Simon Willison's Weblog: cloudflare

DNS Lookup

2026-03-22T19:16:30+00:00

TIL that Cloudflare's 1.1.1.1 DNS service (and 1.1.1.2 and 1.1.1.3, which block malware and malware + adult content respectively) has a CORS-enabled JSON API, so I had Claude Code build me a UI for running DNS queries against all three of those resolvers.
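Here's a minimal sketch of what a query against that JSON API could look like. It is not the code Claude Code wrote for my tool. The hostnames mapped to each resolver, the `/dns-query` path and the `application/dns-json` Accept header are my assumptions based on Cloudflare's DNS over HTTPS documentation, and `buildDnsQueryUrl` and `lookup` are hypothetical helper names.

```javascript
// Assumed mapping from resolver IP to its DNS-over-HTTPS hostname.
const RESOLVERS = {
  '1.1.1.1': 'cloudflare-dns.com',
  '1.1.1.2': 'security.cloudflare-dns.com', // blocks malware
  '1.1.1.3': 'family.cloudflare-dns.com', // blocks malware + adult content
};

// Build the query URL for a record lookup (hypothetical helper).
function buildDnsQueryUrl(resolver, name, type = 'A') {
  const host = RESOLVERS[resolver];
  if (!host) throw new Error(`Unknown resolver: ${resolver}`);
  const url = new URL(`https://${host}/dns-query`);
  url.searchParams.set('name', name);
  url.searchParams.set('type', type);
  return url.toString();
}

// Run the query from a browser or Node 18+; CORS is what makes the
// browser case work. Returns the Answer array, or [] if there is none.
async function lookup(resolver, name, type = 'A') {
  const response = await fetch(buildDnsQueryUrl(resolver, name, type), {
    headers: { Accept: 'application/dns-json' },
  });
  if (!response.ok) throw new Error(`DNS query failed: ${response.status}`);
  const data = await response.json();
  return data.Answer ?? [];
}
```

Calling `lookup` once per resolver for the same name is enough to compare how the three of them answer.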

Tags: dns, cors, cloudflare

tldraw issue: Move tests to closed source repo

2026-02-25T21:06:53+00:00

It's become very apparent over the past few months that a comprehensive test suite is enough to build a completely fresh implementation of any open source library from scratch, potentially in a different language.

This has worrying implications for open source projects with commercial business models. Here's an example of a response: tldraw, the outstanding collaborative drawing library (see previous coverage), are moving their test suite to a private repository - apparently in response to Cloudflare's project to port Next.js to use Vite in a week using AI.

They also filed a joke issue, now closed, to Translate source code to Traditional Chinese:

The current tldraw codebase is in English, making it easy for external AI coding agents to replicate. It is imperative that we defend our intellectual property.

Worth noting that tldraw aren't technically open source - their custom license requires a commercial license if you want to use it in "production environments".

Update: Well this is embarrassing, it turns out the issue I linked to about removing the tests was a joke as well:

Sorry folks, this issue was more of a joke (am I allowed to do that?) but I'll keep the issue open since there's some discussion here. Writing from mobile

moving our tests into another repo would complicate and slow down our development, and speed for us is more important than ever

more canvas better, I know for sure that our decisions have inspired other products and that's fine and good

tldraw itself may eventually be a vibe coded alternative to tldraw

the value is in the ability to produce new and good product decisions for users / customers, however you choose to create the code

Via @steveruizok

Tags: open-source, cloudflare, ai-ethics

Adding dynamic features to an aggressively cached website

2026-01-28T22:10:08+00:00

My blog uses aggressive caching: it sits behind Cloudflare with a 15 minute cache header, which guarantees it can survive even the largest traffic spike to any given page. I've recently added a couple of dynamic features that work in spite of that full-page caching. Here's how those work.

Edit links that are visible only to me

This is a Django site and I manage it through the Django admin.

I have four types of content - entries, link posts (aka blogmarks), quotations and notes. Each of those has a different model and hence a different Django admin area.

I wanted an "edit" link on the public pages that was only visible to me.

The button looks like this:

I solved conditional display of this button with localStorage. I have a tiny bit of JavaScript which checks to see if the localStorage key ADMIN is set and, if it is, displays an edit link based on a data attribute:

document.addEventListener('DOMContentLoaded', () => {
  if (window.localStorage.getItem('ADMIN')) {
    document.querySelectorAll('.edit-page-link').forEach(el => {
      const url = el.getAttribute('data-admin-url');
      if (url) {
        const a = document.createElement('a');
        a.href = url;
        a.className = 'edit-link';
        a.innerHTML = '<svg>...</svg> Edit';
        el.appendChild(a);
        el.style.display = 'block';
      }
    });
  }
});

If you want to see my edit links you can run this snippet of JavaScript:

localStorage.setItem('ADMIN', '1');

My Django admin dashboard has a custom checkbox I can click to turn this option on and off in my own browser:

Those admin edit links are a very simple pattern. A more interesting one is a feature I added recently for navigating randomly within a tag.

Here's an animated GIF showing those random tag navigations in action (try it here):

On any of my blog's tag pages you can click the "Random" button to bounce to a random post with that tag. That random button then persists in the header of the page and you can click it to continue bouncing to random items in that same tag.

A post can have multiple tags, so there needs to be a little bit of persistent magic to remember which tag you are navigating and display the relevant button in the header.

Once again, this uses localStorage. Any click to a random button records both the tag and the current timestamp to the random_tag key in localStorage before redirecting the user to the /random/name-of-tag/ page, which selects a random post and redirects them there.

Any time a new page loads, JavaScript checks if that random_tag key has a value that was recorded within the past 5 seconds. If so, that random button is appended to the header.

This means that, provided the page loads within 5 seconds of the user clicking the button, the random tag navigation will persist on the page.

You can see the code for that here.
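As a rough illustration of the 5 second check described above, here is a minimal sketch. It is not the code running on my site: storing the value as JSON, the `tag` and `timestamp` field names and both function names are assumptions I've made for the example. The storage object and the clock are passed in so the logic can be exercised outside a browser.

```javascript
const RANDOM_TAG_KEY = 'random_tag';
const MAX_AGE_MS = 5000; // the 5 second window

// Called when a Random button is clicked, before redirecting to /random/<tag>/.
function recordRandomTagClick(storage, tag, now = Date.now()) {
  storage.setItem(RANDOM_TAG_KEY, JSON.stringify({ tag, timestamp: now }));
}

// Called on page load: returns the tag if the click was recent enough, else null.
function getActiveRandomTag(storage, now = Date.now()) {
  const raw = storage.getItem(RANDOM_TAG_KEY);
  if (!raw) return null;
  try {
    const { tag, timestamp } = JSON.parse(raw);
    if (typeof tag !== 'string' || typeof timestamp !== 'number') return null;
    return now - timestamp <= MAX_AGE_MS ? tag : null;
  } catch (err) {
    return null; // malformed value: behave as if nothing was stored
  }
}
```

In a browser you would pass `window.localStorage` as `storage` and append the header button whenever `getActiveRandomTag` returns a tag.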

And the prompts

I built the random tag feature entirely using Claude Code for web, prompted from my iPhone. I started with the /random/TAG/ endpoint (full transcript):

Build /random/TAG/ - a page which picks a random post (could be an entry or blogmark or note or quote) that has that tag and sends a 302 redirect to it, marked as no-cache so Cloudflare does not cache it

Use a union to build a list of every content type (a string representing the table out of the four types) and primary key for every item tagged with that tag, then order by random and return the first one

Then inflate the type and ID into an object and load it and redirect to the URL

Include tests - it should work by setting up a tag with one of each of the content types and then running in a loop calling that endpoint until it has either returned one of each of the four types or it hits 1000 loops at which point fail with an error

Then:

I do not like that solution, some of my tags have thousands of items

Can we do something clever with a CTE?

Here's the something clever with a CTE solution we ended up with.

For the "Random post" button (transcript):

Look at most recent commit, then modify the /tags/xxx/ page to have a "Random post" button which looks good and links to the /random/xxx/ page

Then:

Put it before not after the feed icon. It should only display if a tag has more than 5 posts

And finally, the localStorage implementation that persists a random tag button in the header (transcript):

Review the last two commits. Make it so clicking the Random button on a tag page sets a localStorage value for random_tag with that tag and a timestamp. On any other page view that uses the base item template add JS that checks for that localStorage value and makes sure the timestamp is within 5 seconds. If it is within 5 seconds it adds a "Random name-of-tag" button to the little top navigation bar, styled like the original Random button, which bumps the localStorage timestamp and then sends the user to /random/name-of-tag/ when they click it. In this way clicking "Random" on a tag page will send the user into an experience where they can keep clicking to keep surfing randomly in that topic.

Tags: caching, django, javascript, localstorage, ai, cloudflare, generative-ai, llms, ai-assisted-programming

Using Claude in Chrome to navigate out the Cloudflare dashboard

2025-12-22T16:10:30+00:00

I just had my first success using a browser agent - in this case the Claude in Chrome extension - to solve an actual problem.

A while ago I set things up so anything served from the https://static.simonwillison.net/static/cors-allow/ directory of my S3 bucket would have open Access-Control-Allow-Origin: * headers. This is useful for hosting files online that can be loaded into web applications hosted on other domains.

Problem is I couldn't remember how I did it! I initially thought it was an S3 setting, but it turns out S3 lets you set CORS at the bucket-level but not for individual prefixes.

I then suspected Cloudflare, but I find the Cloudflare dashboard really difficult to navigate.

So I decided to give Claude in Chrome a go. I installed and enabled the extension (you then have to click the little puzzle icon and click "pin" next to Claude for the icon to appear; I had to ask Claude itself for help figuring that out), signed into Cloudflare, opened the Claude panel and prompted:

I'm trying to figure out how come all pages under http://static.simonwillison.net/static/cors/ have an open CORS policy, I think I set that up through Cloudflare but I can't figure out where

Off it went. It took 1m45s to find exactly what I needed.

Claude's conclusion:

Found it! Your open CORS policy for the /static/cors/ directory on static.simonwillison.net is configured through Cloudflare Transform Rules, specifically a Response Header Transform Rule named static.simonwillis.net/static/cors-allow/*

There's no "share transcript" option but I used copy and paste and two gnarly Claude Code sessions (one, two) to turn it into an HTML transcript which you can take a look at here.

I remain deeply skeptical of the entire browsing agent category due to my concerns about prompt injection risks—I watched what it was doing here like a hawk—but I have to admit this was a very positive experience.

Tags: anthropic, claude, browser-agents, cors, ai, llms, generative-ai, chrome, cloudflare, prompt-injection, ai-agents

Quoting Matthew Prince

2025-11-19T08:02:36+00:00

Cloudflare's network began experiencing significant failures to deliver core network traffic [...] triggered by a change to one of our database systems' permissions which caused the database to output multiple entries into a “feature file” used by our Bot Management system. That feature file, in turn, doubled in size. The larger-than-expected feature file was then propagated to all the machines that make up our network. [...] The software had a limit on the size of the feature file that was below its doubled size. That caused the software to fail. [...]

This resulted in the following panic which in turn resulted in a 5xx error:

thread fl2_worker_thread panicked: called Result::unwrap() on an Err value

— Matthew Prince, Cloudflare outage on November 18, 2025, see also this comment

Tags: scaling, rust, cloudflare, postmortem

Video: Building a tool to copy-paste share terminal sessions using Claude Code for web

2025-10-23T04:14:08+00:00

This afternoon I was manually converting a terminal session into a shared HTML file for the umpteenth time when I decided to reduce the friction by building a custom tool for it - and on the spur of the moment I fired up Descript to record the process. The result is this new 11 minute YouTube video showing my workflow for vibe-coding simple tools from start to finish.

The initial problem

The problem I wanted to solve involves sharing my Claude Code CLI sessions - and the more general problem of sharing interesting things that happen in my terminal.

A while back I discovered (using my vibe-coded clipboard inspector) that copying and pasting from the macOS terminal populates a rich text clipboard format which preserves the colors and general formatting of the terminal output.

The problem is that format looks like this:

{\rtf1\ansi\ansicpg1252\cocoartf2859
\cocoatextscaling0\cocoaplatform0{\fonttbl\f0\fnil\fcharset0 Monaco;}
{\colortbl;\red255\green255\blue255;\red242\green242\blue242;\red0\green0\blue0;\red204\green98\blue70;
\red0\green0\blue0;\red97\green97\blue97;\red102\green102\blue102;\red255\

This struck me as the kind of thing an LLM might be able to write code to parse, so I had ChatGPT take a crack at it and then later rewrote it from scratch with Claude Sonnet 4.5. The result was this rtf-to-html tool which lets you paste in rich formatted text and gives you reasonably solid HTML that you can share elsewhere.

To share that HTML I've started habitually pasting it into a GitHub Gist and then taking advantage of gistpreview.github.io, a neat little unofficial tool that accepts ?GIST_ID and displays the gist content as a standalone HTML page... which means you can link to rendered HTML that's stored in a gist.

So my process was:

Copy terminal output
Paste into rtf-to-html
Copy resulting HTML
Paste that into a new GitHub Gist
Grab that Gist's ID
Share the link to gistpreview.github.io/?GIST_ID

Not too much hassle, but frustratingly manual if you're doing it several times a day.
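The last three of those manual steps boil down to one API call and a bit of string formatting. Here's a minimal sketch under a few assumptions: `buildGistRequest` and `gistPreviewUrl` are hypothetical names, the `index.html` filename matches what I ask for in the prompt later in this post, and creating the Gist as secret (`public: false`) is my choice for the example. The request shape follows GitHub's "create a gist" REST endpoint.

```javascript
// Describe the request that creates a Gist holding the HTML (hypothetical helper).
function buildGistRequest(html, token) {
  return {
    url: 'https://api.github.com/gists',
    options: {
      method: 'POST',
      headers: {
        Accept: 'application/vnd.github+json',
        Authorization: `Bearer ${token}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        public: false, // assumption: a secret Gist is enough for sharing by link
        files: { 'index.html': { content: html } },
      }),
    },
  };
}

// Turn a Gist ID into a shareable preview link.
function gistPreviewUrl(gistId) {
  return `https://gistpreview.github.io/?${encodeURIComponent(gistId)}`;
}

// Usage, in a browser or Node 18+:
//   const { url, options } = buildGistRequest(html, token);
//   const gist = await (await fetch(url, options)).json();
//   console.log(gistPreviewUrl(gist.id));
```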

The desired solution

Ideally I want a tool where I can do this:

Copy terminal output
Paste into a new tool
Click a button and get a gistpreview link to share

I decided to get Claude Code for web to build the entire thing.

The prompt

Here's the full prompt I used on claude.ai/code, pointed at my simonw/tools repo, to build the tool:

Build a new tool called terminal-to-html which lets the user copy RTF directly from their terminal and paste it into a paste area, it then produces the HTML version of that in a textarea with a copy button, below is a button that says "Save this to a Gist", and below that is a full preview. It will be very similar to the existing rtf-to-html.html tool but it doesn't show the raw RTF and it has that Save this to a Gist button

That button should do the same trick that openai-audio-output.html does, with the same use of localStorage and the same flow to get users signed in with a token if they are not already

So click the button, it asks the user to sign in if necessary, then it saves that HTML to a Gist in a file called index.html, gets back the Gist ID and shows the user the URL https://gistpreview.github.io/?6d778a8f9c4c2c005a189ff308c3bc47 - but with their gist ID in it

They can see the URL, they can click it (do not use target="_blank") and there is also a "Copy URL" button to copy it to their clipboard

Make the UI mobile friendly but also have it be courier green-text-on-black themed to reflect what it does

If the user pastes and the pasted data is available as HTML but not as RTF skip the RTF step and process the HTML directly

If the user pastes and it's only available as plain text then generate HTML that is just an open <pre> tag and their text and a closing </pre> tag

It's quite a long prompt - it took me several minutes to type! But it covered the functionality I wanted in enough detail that I was pretty confident Claude would be able to build it.
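The last two paragraphs of that prompt describe a fallback chain: RTF first, then HTML, then plain text. A minimal sketch of that decision might look like the following. This is not what Claude produced: `pickPasteFormat` and `escapeHtml` are hypothetical names, and escaping the plain text before wrapping it in `<pre>` is an addition of mine that the prompt did not ask for.

```javascript
// Escape characters that would otherwise be interpreted as markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Decide how to handle a paste. `clipboardData` is the DataTransfer object
// from a paste event; getData() returns '' when a format is not available.
function pickPasteFormat(clipboardData) {
  const rtf = clipboardData.getData('text/rtf');
  if (rtf) return { kind: 'rtf', data: rtf }; // still needs RTF-to-HTML conversion
  const html = clipboardData.getData('text/html');
  if (html) return { kind: 'html', data: html }; // skip the RTF step
  const text = clipboardData.getData('text/plain');
  return { kind: 'text', data: `<pre>${escapeHtml(text)}</pre>` };
}

// Usage in a browser:
//   area.addEventListener('paste', (event) => {
//     const result = pickPasteFormat(event.clipboardData);
//   });
```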

Combining previous tools

I'm using one key technique in this prompt: I'm referencing existing tools in the same repo and telling Claude to imitate their functionality.

I first wrote about this trick last March in Running OCR against PDFs and images directly in your browser, where I described how a snippet of code that used PDF.js and another snippet that used Tesseract.js was enough for Claude 3 Opus to build me this working PDF OCR tool. That was actually the tool that kicked off my tools.simonwillison.net collection in the first place, which has since grown to 139 and counting.

Here I'm telling Claude that I want the RTF to HTML functionality of rtf-to-html.html combined with the Gist saving functionality of openai-audio-output.html.

That one has quite a bit going on. It uses the OpenAI audio API to generate audio output from a text prompt, which is returned by that API as base64-encoded data in JSON.

Then it offers the user a button to save that JSON to a Gist, which gives the snippet a URL.

Another tool I wrote, gpt-4o-audio-player.html, can then accept that Gist ID in the URL and will fetch the JSON data and make the audio playable in the browser. Here's an example.

The trickiest part of this is API tokens. I've built tools in the past that require users to paste in a GitHub Personal Access Token (PAT) (which I then store in localStorage in their browser - I don't want other people's authentication credentials anywhere near my own servers). But that's a bit fiddly.

Instead, I figured out the minimal Cloudflare worker necessary to implement the server-side portion of GitHub's authentication flow. That code lives here and means that any of the HTML+JavaScript tools in my collection can implement a GitHub authentication flow if they need to save Gists.

But I don't have to tell the model any of that! I can just say "do the same trick that openai-audio-output.html does" and Claude Code will work the rest out for itself.

The result

Here's what the resulting app looks like after I've pasted in some terminal output from Claude Code CLI:

It's exactly what I asked for, and the green-on-black terminal aesthetic is spot on too.

Quoting Kenton Varda

2025-09-05T16:43:13+00:00

After struggling for years trying to figure out why people think [Cloudflare] Durable Objects are complicated, I'm increasingly convinced that it's just that they sound complicated.

Feels like we can solve 90% of it by renaming DurableObject to StatefulWorker?

It's just a worker that has state. And because it has state, it also has to have a name, so that you can route to the specific worker that has the state you care about. There may be a sqlite database attached, there may be a container attached. Those are just part of the state.

— Kenton Varda

Tags: sqlite, cloudflare, kenton-varda

Cloudflare Radar: AI Insights

2025-09-01T17:06:56+00:00

Cloudflare launched this dashboard back in February, incorporating traffic analysis from Cloudflare's network along with insights from their popular 1.1.1.1 DNS service.

I found this chart particularly interesting, showing which documented AI crawlers are most active collecting training data - led by GPTBot, ClaudeBot and Meta-ExternalAgent:

Cloudflare's DNS data also hints at the popularity of different services. ChatGPT holds the first place, which is unsurprising - but second place is a hotly contested race between Claude and Perplexity and #4/#5/#6 is contested by GitHub Copilot, Perplexity, and Codeium/Windsurf.

Google Gemini comes in 7th, though since this is DNS based I imagine this is undercounting instances of Gemini on google.com as opposed to gemini.google.com.

Via Hacker News

Tags: crawling, dns, ai, cloudflare, generative-ai, llms

ChatGPT agent's user-agent

2025-08-04T22:49:25+00:00

I was exploring how ChatGPT agent works today. I learned some interesting things about how it exposes its identity through HTTP headers, then made a huge blunder in thinking it was leaking its URLs to Bingbot and Yandex... but it turned out that was a Cloudflare feature that had nothing to do with ChatGPT.

ChatGPT agent is the recently released (and confusingly named) ChatGPT feature that provides browser automation combined with terminal access as a feature of ChatGPT - replacing their previous Operator research preview which is scheduled for deprecation on August 31st.

Investigating ChatGPT agent's user-agent

I decided to dig into how it works by creating a logged web URL endpoint using django-http-debug. Then I told ChatGPT agent mode to explore that new page:

My logging captured these request headers:

Via: 1.1 heroku-router
Host: simonwillison.net
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
Cf-Ray: 96a0f289adcb8e8e-SEA
Cookie: cf_clearance=zzV8W...
Server: Heroku
Cdn-Loop: cloudflare; loops=1
Priority: u=0, i
Sec-Ch-Ua: "Not)A;Brand";v="8", "Chromium";v="138"
Signature: sig1=:1AxfqHocTf693inKKMQ7NRoHoWAZ9d/vY4D/FO0+MqdFBy0HEH3ZIRv1c3hyiTrzCvquqDC8eYl1ojcPYOSpCQ==:
Cf-Visitor: {"scheme":"https"}
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36
Cf-Ipcountry: US
X-Request-Id: 45ef5be4-ead3-99d5-f018-13c4a55864d3
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: none
Sec-Fetch-User: ?1
Accept-Encoding: gzip, br
Accept-Language: en-US,en;q=0.9
Signature-Agent: "https://chatgpt.com"
Signature-Input: sig1=("@authority" "@method" "@path" "signature-agent");created=1754340838;keyid="otMqcjr17mGyruktGvJU8oojQTSMHlVm7uO-lrcqbdg";expires=1754344438;nonce="_8jbGwfLcgt_vUeiZQdWvfyIeh9FmlthEXElL-O2Rq5zydBYWivw4R3sV9PV-zGwZ2OEGr3T2Pmeo2NzmboMeQ";tag="web-bot-auth";alg="ed25519"
X-Forwarded-For: 2a09:bac5:665f:1541::21e:154, 172.71.147.183
X-Request-Start: 1754340840059
Cf-Connecting-Ip: 2a09:bac5:665f:1541::21e:154
Sec-Ch-Ua-Mobile: ?0
X-Forwarded-Port: 80
X-Forwarded-Proto: http
Sec-Ch-Ua-Platform: "Linux"
Upgrade-Insecure-Requests: 1

That Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36 user-agent header is the one used by the most recent Chrome on macOS - which is a little odd here as the Sec-Ch-Ua-Platform: "Linux" header indicates that the agent browser runs on Linux.

At first glance it looks like ChatGPT is being dishonest here by not including its bot identity in the user-agent header. I thought for a moment it might be reflecting my own user-agent, but I'm using Firefox on macOS and it identified itself as Chrome.

Then I spotted this header:

Signature-Agent: "https://chatgpt.com"

Which is accompanied by a much more complex header called Signature-Input:

Signature-Input: sig1=("@authority" "@method" "@path" "signature-agent");created=1754340838;keyid="otMqcjr17mGyruktGvJU8oojQTSMHlVm7uO-lrcqbdg";expires=1754344438;nonce="_8jbGwfLcgt_vUeiZQdWvfyIeh9FmlthEXElL-O2Rq5zydBYWivw4R3sV9PV-zGwZ2OEGr3T2Pmeo2NzmboMeQ";tag="web-bot-auth";alg="ed25519"

And a Signature header too.

These turn out to come from a relatively new web standard: RFC 9421 HTTP Message Signatures, published February 2024.

The purpose of HTTP Message Signatures is to allow clients to include signed data about their request in a way that cannot be tampered with by intermediaries. The signature uses a public key that's provided by the following well-known endpoint:

https://chatgpt.com/.well-known/http-message-signatures-directory

Add it all together and we now have a rock-solid way to identify traffic from ChatGPT agent: look for the Signature-Agent: "https://chatgpt.com" header and confirm its value by checking the signature in the Signature-Input and Signature headers.
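As a partial illustration, here is a sketch that parses the Signature-Input header shown above and runs some cheap pre-checks. It deliberately stops short of the important part: it does not verify the Ed25519 signature against the published key, so on its own it proves nothing about who sent the request. The function names are hypothetical, the regular expressions only handle the shape of header seen here (not the full structured field syntax), and I'm assuming header names arrive lowercased.

```javascript
// Parse a header like: sig1=("@authority" "@method");created=1;keyid="abc"
function parseSignatureInput(value) {
  const match = /^([A-Za-z0-9_-]+)=\(([^)]*)\)(.*)$/.exec(value.trim());
  if (!match) return null;
  const [, label, componentList, paramString] = match;
  // Covered components are the quoted strings inside the parentheses.
  const components = (componentList.match(/"[^"]*"/g) || []).map(s => s.slice(1, -1));
  const params = {};
  const paramPattern = /;([a-z0-9_-]+)=(?:"([^"]*)"|(\d+))/g;
  let m;
  while ((m = paramPattern.exec(paramString)) !== null) {
    params[m[1]] = m[2] !== undefined ? m[2] : Number(m[3]);
  }
  return { label, components, params };
}

// Pre-check only: right agent, right tag, and inside the validity window.
function looksLikeChatGPTAgent(headers, nowSeconds = Math.floor(Date.now() / 1000)) {
  const agent = (headers['signature-agent'] || '').replace(/^"|"$/g, '');
  const parsed = parseSignatureInput(headers['signature-input'] || '');
  if (agent !== 'https://chatgpt.com' || !parsed) return false;
  const { created, expires, tag } = parsed.params;
  return tag === 'web-bot-auth' && created <= nowSeconds && nowSeconds < expires;
}
```

A request that passes this check still needs its Signature header verified using the key identified by `keyid` before it can be trusted.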

And then came Bingbot and Yandex

Just over a minute after it captured that request, my logging endpoint got another request:

Via: 1.1 heroku-router
From: bingbot(at)microsoft.com
Host: simonwillison.net
Accept: */*
Cf-Ray: 96a0f4671d1fc3c6-SEA
Server: Heroku
Cdn-Loop: cloudflare; loops=1
Cf-Visitor: {"scheme":"https"}
User-Agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36
Cf-Ipcountry: US
X-Request-Id: 6214f5dc-a4ea-5390-1beb-f2d26eac5d01
Accept-Encoding: gzip, br
X-Forwarded-For: 207.46.13.9, 172.71.150.252
X-Request-Start: 1754340916429
Cf-Connecting-Ip: 207.46.13.9
X-Forwarded-Port: 80
X-Forwarded-Proto: http

I pasted 207.46.13.9 into Microsoft's Verify Bingbot tool (after solving a particularly taxing CAPTCHA) and it confirmed that this was indeed a request from Bingbot.

I set up a second URL to confirm... and this time got a visit from Yandex!

Via: 1.1 heroku-router
From: support@search.yandex.ru
Host: simonwillison.net
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Cf-Ray: 96a16390d8f6f3a7-DME
Server: Heroku
Cdn-Loop: cloudflare; loops=1
Cf-Visitor: {"scheme":"https"}
User-Agent: Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)
Cf-Ipcountry: RU
X-Request-Id: 3cdcbdba-f629-0d29-b453-61644da43c6c
Accept-Encoding: gzip, br
X-Forwarded-For: 213.180.203.138, 172.71.184.65
X-Request-Start: 1754345469921
Cf-Connecting-Ip: 213.180.203.138
X-Forwarded-Port: 80
X-Forwarded-Proto: http

Yandex suggest a reverse DNS lookup to verify, so I ran this command:

dig -x 213.180.203.138 +short

And got back:

213-180-203-138.spider.yandex.com.

Which confirms that this is indeed a Yandex crawler.
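The same check can be scripted. This is a minimal sketch rather than anything I actually ran: the list of Yandex hostname suffixes is an assumption based on my reading of their documentation, the function names are hypothetical, and the forward-confirmation step (resolving the hostname back to the IP) goes beyond the single `dig` command above. It only handles IPv4.

```javascript
// Assumed suffixes for genuine Yandex crawler hostnames.
const YANDEX_SUFFIXES = ['.yandex.ru', '.yandex.net', '.yandex.com'];

// True if the hostname ends with one of the suffixes (a trailing dot is ignored).
function hostnameMatches(hostname, suffixes) {
  const normalized = hostname.toLowerCase().replace(/\.$/, '');
  return suffixes.some(suffix => normalized.endsWith(suffix));
}

// Reverse-resolve the IP, then confirm a matching hostname resolves back to it.
// `resolver` needs reverse() and resolve4() methods, as on Node's dns.promises.
async function verifyCrawlerIp(ip, suffixes, resolver) {
  const hostnames = await resolver.reverse(ip);
  for (const hostname of hostnames) {
    if (!hostnameMatches(hostname, suffixes)) continue;
    const addresses = await resolver.resolve4(hostname);
    if (addresses.includes(ip)) return true;
  }
  return false;
}

// Usage in Node:
//   const dns = require('node:dns').promises;
//   verifyCrawlerIp('213.180.203.138', YANDEX_SUFFIXES, dns).then(console.log);
```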

I tried a third experiment to be sure... and got hits from both Bingbot and YandexBot.

It was Cloudflare Crawler Hints, not ChatGPT

So I wrote up and posted about my discovery... and Jatan Loya asked:

do you have crawler hints enabled in cf?

And yeah, it turned out I did. I spotted this in my caching configuration page (and it looks like I must have turned it on myself at some point in the past):

Here's the Cloudflare documentation for that feature.

I deleted my posts on Twitter and Bluesky (since you can't edit those and I didn't want the misinformation to continue to spread) and edited my post on Mastodon, then updated this entry with the real reason this had happened.

I also changed the URL of this entry as it turned out Twitter and Bluesky were caching my social media preview for the previous one, which included the incorrect information in the title.

Original "So what's going on here?" section from my post

Here's a section of my original post with my theories about what was going on before learning about Cloudflare Crawler Hints.

So what's going on here?

There are quite a few different moving parts here.

I'm using Firefox on macOS with the 1Password and Readwise Highlighter extensions installed and active. Since I didn't visit the debug pages at all with my own browser I don't think any of these are relevant to these results.
ChatGPT agent makes just a single request to my debug URL ...
... which is proxied through both Cloudflare and Heroku.
Within about a minute, I get hits from one or both of Bingbot and Yandex.

Presumably ChatGPT agent itself is running behind at least one proxy - I would expect OpenAI to keep a close eye on that traffic to ensure it doesn't get abused.

I'm guessing that infrastructure is hosted by Microsoft Azure, though the OpenAI Sub-processor List lists Microsoft Corporation, CoreWeave Inc, Oracle Cloud Platform and Google Cloud Platform under the "Cloud infrastructure" section, so it could be any of those.

Since the page is served over HTTPS my guess is that any intermediary proxies should be unable to see the path component of the URL, making the mystery of how Bingbot and Yandex saw the URL even more intriguing.

Tags: bing, privacy, search-engines, user-agents, ai, cloudflare, generative-ai, chatgpt, llms, browser-agents, retractions

TIL: Rate limiting by IP using Cloudflare's rate limiting rules

2025-07-03T21:16:51+00:00

My blog started timing out on some requests a few days ago, and it turned out there were misbehaving crawlers that were spidering my /search/ page even though it's restricted by robots.txt.

I run this site behind Cloudflare and it turns out Cloudflare's WAF (Web Application Firewall) has a rate limiting tool that I could use to restrict requests to /search/* by a specific IP to a maximum of 5 every 10 seconds.
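To make the "5 every 10 seconds" rule concrete, here is a small fixed-window counter that behaves that way. It is purely illustrative: I don't know how Cloudflare implements its rate limiting internally, and `createRateLimiter` is a hypothetical name. The rule itself is configured in the Cloudflare dashboard, not in code like this.

```javascript
// Returns an allow(ip) function enforcing `limit` requests per `windowMs` per IP.
function createRateLimiter({ limit = 5, windowMs = 10000 } = {}) {
  const windows = new Map(); // ip -> { start, count }
  return function allow(ip, now = Date.now()) {
    const entry = windows.get(ip);
    // Start a fresh window if this IP is new or its window has elapsed.
    if (!entry || now - entry.start >= windowMs) {
      windows.set(ip, { start: now, count: 1 });
      return true;
    }
    entry.count += 1;
    return entry.count <= limit;
  };
}
```

Each IP gets its own counter, so one misbehaving crawler hitting /search/ does not affect anyone else.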

Tags: rate-limiting, security, cloudflare, til

New sandboxes from Cloudflare and Vercel

2025-06-26T01:41:32+00:00

Two interesting new products for running code in a sandbox today.

Cloudflare launched their Containers product in open beta, and added a new Sandbox library for Cloudflare Workers that can run commands in a "secure, container-based environment":

import { getSandbox } from "@cloudflare/sandbox";
const sandbox = getSandbox(env.Sandbox, "my-sandbox");
const output = sandbox.exec("ls", ["-la"]);

Vercel shipped a similar feature, introduced in Run untrusted code with Vercel Sandbox, which enables code that looks like this:

import { Sandbox } from "@vercel/sandbox";

const sandbox = await Sandbox.create();
await sandbox.writeFiles([
  { path: "script.js", stream: Buffer.from(result.text) },
]);
await sandbox.runCommand({
  cmd: "node",
  args: ["script.js"],
  stdout: process.stdout,
  stderr: process.stderr,
});

In both cases a major intended use-case is safely executing code that has been created by an LLM.

Tags: vercel, cloudflare, generative-ai, ai, llms, sandboxing

Cloudflare Project Galileo

2025-06-16T19:13:48+00:00

I only just heard about this Cloudflare initiative, though it's been around for more than a decade:

If you are an organization working in human rights, civil society, journalism, or democracy, you can apply for Project Galileo to get free cyber security protection from Cloudflare.

It's effectively free denial-of-service protection for vulnerable targets among civil rights and public interest groups.

Last week they published Celebrating 11 years of Project Galileo’s global impact with some noteworthy numbers:

Journalists and news organizations experienced the highest volume of attacks, with over 97 billion requests blocked as potential threats across 315 different organizations. [...]

Cloudflare onboarded the Belarusian Investigative Center, an independent journalism organization, on September 27, 2024, while it was already under attack. A major application-layer DDoS attack followed on September 28, generating over 28 billion requests in a single day.

Tags: denial-of-service, journalism, security, cloudflare

Quoting Kenton Varda

2025-06-02T18:52:01+00:00

It took me a few days to build the library [cloudflare/workers-oauth-provider] with AI.

I estimate it would have taken a few weeks, maybe months to write by hand.

That said, this is a pretty ideal use case: implementing a well-known standard on a well-known platform with a clear API spec.

In my attempts to make changes to the Workers Runtime itself using AI, I've generally not felt like it saved much time. Though, people who don't know the codebase as well as I do have reported it helped them a lot.

I have found AI incredibly useful when I jump into other people's complex codebases, that I'm not familiar with. I now feel like I'm comfortable doing that, since AI can help me find my way around very quickly, whereas previously I generally shied away from jumping in and would instead try to get someone on the team to make whatever change I needed.

— Kenton Varda, in a Hacker News comment

Tags: ai, cloudflare, generative-ai, llms, ai-assisted-programming, kenton-varda

llm-prices.com

2025-05-07T20:15:48+00:00

I've been maintaining a simple LLM pricing calculator since October last year. I finally decided to split it out to its own domain name (previously it was hosted at tools.simonwillison.net/llm-prices), running on Cloudflare Pages.

The site runs out of my simonw/llm-prices GitHub repository. I ported the history of the old llm-prices.html file using a vibe-coded bash script that I forgot to save anywhere.

I rarely use AI-generated imagery in my own projects, but for this one I found an excellent reason to use GPT-4o image outputs... to generate the favicon! I dropped a screenshot of the site into ChatGPT (o4-mini-high in this case) and asked for the following:

design a bunch of options for favicons for this site in a single image, white background

I liked the top right one, so I cropped it in Pixelmator and made a 32x32 version. Here's what it looks like in my browser:

I added a new feature just now: the state of the calculator is now reflected in the #fragment-hash URL of the page, which means you can link to your previous calculations.

I implemented that feature using the new gemini-2.5-pro-preview-05-06, since that model boasts improved front-end coding abilities. It did a pretty great job - here's how I prompted it:

llm -m gemini-2.5-pro-preview-05-06 -f https://www.llm-prices.com/ -s 'modify this code so that the state of the page is reflected in the fragmenth hash URL - I want to capture the values filling out the form fields and also the current sort order of the table. These should be respected when the page first loads too. Update them using replaceHistory, no need to enable the back button.'

Here's the transcript and the commit updating the tool, plus an example link showing the new feature in action (and calculating the cost for that Gemini 2.5 Pro prompt at 16.8224 cents, after fixing the calculation.)
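The fragment-hash approach itself is simple enough to sketch. This is not the code Gemini wrote for me: the function names and the example field names are hypothetical. The browser calls it relies on, `URLSearchParams` and `history.replaceState`, are standard.

```javascript
// Serialize a flat object of form values into a "#key=value&..." fragment.
function encodeState(state) {
  const params = new URLSearchParams();
  for (const [key, value] of Object.entries(state)) {
    if (value !== '' && value != null) params.set(key, String(value));
  }
  return '#' + params.toString();
}

// Parse a fragment back into an object of strings.
function decodeState(hash) {
  return Object.fromEntries(new URLSearchParams(hash.replace(/^#/, '')));
}

// In a browser:
//   on input change: history.replaceState(null, '', encodeState(currentState));
//   on page load:    const saved = decodeState(location.hash);
```

Using `replaceState` rather than `pushState` means the URL updates without adding a back-button entry for every keystroke.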

Tags: favicons, projects, ai, cloudflare, generative-ai, llms, ai-assisted-programming, gemini, llm-pricing, text-to-image, vibe-coding

Note on 18th April 2025

2025-04-18T23:59:01+00:00

It frustrates me when support sites for online services fail to link to the things they are talking about. Cloudflare's Find zone and account IDs page for example provides a four step process for finding my account ID that starts at the root of their dashboard, including a screenshot of where I should click.

In Cloudflare's case it's harder to link to the correct dashboard page because the URL differs for different users, but that shouldn't be a show-stopper for getting this to work. Set up dash.cloudflare.com/redirects/find-account-id and link to that!

... I just noticed they do have a mechanism like that which they use elsewhere. On the R2 authentication page they link to:

https://dash.cloudflare.com/?to=/:account/r2/api-tokens

The "find account ID" flow presumably can't do the same thing because there is no single page displaying that information - it's shown in a sidebar on the page for each of your Cloudflare domains.

Tags: urls, usability, cloudflare

OpenTimes

2025-03-17T22:49:59+00:00

OpenTimes

Spectacular new open geospatial project by Dan Snow:

OpenTimes is a database of pre-computed, point-to-point travel times between United States Census geographies. It lets you download bulk travel time data for free and with no limits.

Here's what I get for travel times by car from El Granada, California:

The technical details are fascinating:

The entire OpenTimes backend is just static Parquet files on Cloudflare's R2. There's no RDBMS or running service, just files and a CDN. The whole thing costs about $10/month to host and costs nothing to serve. In my opinion, this is a great way to serve infrequently updated, large public datasets at low cost (as long as you partition the files correctly).

Sure enough, R2 pricing charges "based on the total volume of data stored" - $0.015 / GB-month for standard storage, then $0.36 / million requests for "Class B" operations which include reads. They charge nothing for outbound bandwidth.
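As a back-of-envelope check, here is a small Python sketch of that pricing using only the two rates quoted above. It ignores any free allowances and other operation classes, and the 600 GB / 1 million reads figures are made-up inputs, not numbers from the OpenTimes project.

```python
STORAGE_USD_PER_GB_MONTH = 0.015   # standard storage rate quoted above
CLASS_B_USD_PER_MILLION = 0.36     # "Class B" (read) operations rate quoted above


def r2_monthly_cost(gb_stored, class_b_requests):
    """Estimate a monthly bill from storage volume and read requests only."""
    storage = gb_stored * STORAGE_USD_PER_GB_MONTH
    reads = class_b_requests / 1_000_000 * CLASS_B_USD_PER_MILLION
    return storage + reads  # outbound bandwidth adds nothing


# Hypothetical workload: 600 GB stored, 1 million reads in a month
estimate = r2_monthly_cost(600, 1_000_000)
```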

All travel times were calculated by pre-building the inputs (OSM, OSRM networks) and then distributing the compute over hundreds of GitHub Actions jobs. This worked shockingly well for this specific workload (and was also completely free).

Here's a GitHub Actions run of the calculate-times.yaml workflow which uses a matrix to run 255 jobs!

Relevant YAML:

  matrix:
    year: ${{ fromJSON(needs.setup-jobs.outputs.years) }}
    state: ${{ fromJSON(needs.setup-jobs.outputs.states) }}

Where those JSON files were created by the previous step, which reads in the year and state values from this params.yaml file.
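A GitHub Actions matrix creates one job for every combination of its values. A minimal Python sketch of that expansion, using placeholder year and state values rather than the contents of the real params.yaml:

```python
from itertools import product


def expand_matrix(matrix):
    """Return one dict per combination of the matrix values."""
    keys = list(matrix)
    combos = product(*(matrix[key] for key in keys))
    return [dict(zip(keys, combo)) for combo in combos]


# Placeholder values, for illustration only
jobs = expand_matrix({"year": ["2023", "2024"], "state": ["06", "17", "36"]})
```

Two years and three states give six jobs, so the job count grows multiplicatively as either list gets longer.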

The query layer uses a single DuckDB database file with views that point to static Parquet files via HTTP. This lets you query a table with hundreds of billions of records after downloading just the ~5MB pointer file.

This is a really creative use of DuckDB's feature that lets you run queries against large data from a laptop using HTTP range queries to avoid downloading the whole thing.

The README shows how to use that from R and Python - I got this working in the duckdb client (brew install duckdb):

INSTALL httpfs;
LOAD httpfs;
ATTACH 'https://data.opentimes.org/databases/0.0.1.duckdb' AS opentimes;

SELECT origin_id, destination_id, duration_sec
  FROM opentimes.public.times
  WHERE version = '0.0.1'
      AND mode = 'car'
      AND year = '2024'
      AND geography = 'tract'
      AND state = '17'
      AND origin_id LIKE '17031%' limit 10;

In answer to a question about adding public transit times Dan said:

In the next year or so maybe. The biggest obstacles to adding public transit are:

Collecting all the necessary scheduling data (e.g. GTFS feeds) for every transit system in the country. Not insurmountable since there are services that do this currently.

Finding a routing engine that can compute nation-scale travel time matrices quickly. Currently, the two fastest open-source engines I've tried (OSRM and Valhalla) don't support public transit for matrix calculations and the engines that do support public transit (R5, OpenTripPlanner, etc.) are too slow.

GTFS is a popular CSV-based format for sharing transit schedules - here's an official list of available feed directories.

This whole project feels to me like a great example of the baked data architectural pattern in action.

Via Hacker News

Tags: census, geospatial, open-data, openstreetmap, cloudflare, parquet, github-actions, baked-data, duckdb, http-range-requests

OpenAI WebRTC Audio demo

2024-12-17T23:50:12+00:00

OpenAI WebRTC Audio demo

OpenAI announced a bunch of API features today, including a brand new WebRTC API for setting up a two-way audio conversation with their models.

They tweeted this opaque code example:

async function createRealtimeSession(inStream, outEl, token) {
  const pc = new RTCPeerConnection();
  pc.ontrack = e => outEl.srcObject = e.streams[0];
  pc.addTrack(inStream.getTracks()[0]);
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  const headers = { Authorization: `Bearer ${token}`, 'Content-Type': 'application/sdp' };
  const opts = { method: 'POST', body: offer.sdp, headers };
  const resp = await fetch('https://api.openai.com/v1/realtime', opts);
  await pc.setRemoteDescription({ type: 'answer', sdp: await resp.text() });
  return pc;
}

So I pasted that into Claude and had it build me this interactive demo for trying out the new API.

My demo uses an OpenAI key directly, but the most interesting aspect of the new WebRTC mechanism is its support for ephemeral tokens.

This solves a major problem with their previous realtime API: in order to connect to their endpoint you need to provide an API key, but that meant making that key visible to anyone who uses your application. The only secure way to handle this was to roll a full server-side proxy for their WebSocket API, just so you could hide your API key in your own server. cloudflare/openai-workers-relay is an example implementation of that pattern.

Ephemeral tokens solve that by letting you make a server-side call to request an ephemeral token which will only allow a connection to be initiated to their WebRTC endpoint for the next 60 seconds. The user's browser then starts the connection, which will last for up to 30 minutes.

Tags: api, audio, security, tools, ai, cloudflare, openai, generative-ai, llms, ai-assisted-programming, claude, multi-modal-output

GitHub OAuth for a static site using Cloudflare Workers

2024-11-29T18:13:18+00:00

GitHub OAuth for a static site using Cloudflare Workers

Here's a TIL covering a Thanksgiving AI-assisted programming project. I wanted to add OAuth against GitHub to some of the projects on my tools.simonwillison.net site in order to implement "Save to Gist".

That site is entirely statically hosted by GitHub Pages, but OAuth has a required server-side component: there's a client_secret involved that should never be included in client-side code.

Since I serve the site from behind Cloudflare I realized that a minimal Cloudflare Workers script may be enough to plug the gap. I got Claude on my phone to build me a prototype and then pasted that (still on my phone) into a new Cloudflare Worker and it worked!

... almost. On later closer inspection of the code it was missing error handling... and then someone pointed out it was vulnerable to a login CSRF attack thanks to failure to check the state= parameter. I worked with Claude to fix those too.
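For reference, the state= check amounts to generating an unguessable value before redirecting to GitHub, remembering it for that browser, and refusing the callback unless the same value comes back. This is a generic Python sketch of that check, not the Worker code from the TIL; the function names are hypothetical.

```python
import hmac
import secrets


def new_state():
    """Create an unguessable value to send as the state= parameter."""
    return secrets.token_urlsafe(32)


def state_is_valid(stored, returned):
    """True only if the callback's state matches the one we issued."""
    if not stored or not returned:
        return False
    # Constant-time comparison of the two values as bytes
    return hmac.compare_digest(stored.encode(), returned.encode())


issued = new_state()
```

The stored value would typically live in a cookie or session tied to the browser that started the login.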

Useful reminder here that pasting AI-generated code around on a mobile phone isn't necessarily the best environment to encourage a thorough code review!

Tags: csrf, github, oauth, projects, security, tools, ai, cloudflare, generative-ai, llms, ai-assisted-programming

Zero-latency SQLite storage in every Durable Object

2024-10-13T22:26:49+00:00

Zero-latency SQLite storage in every Durable Object

Kenton Varda introduces the next iteration of Cloudflare's Durable Object platform, which recently upgraded from a key/value store to a full relational system based on SQLite.

For useful background on the first version of Durable Objects take a look at Cloudflare's durable multiplayer moat by Paul Butler, who digs into its popularity for building WebSocket-based realtime collaborative applications.

The new SQLite-backed Durable Objects is a fascinating piece of distributed system design, which advocates for a really interesting way to architect a large scale application.

The key idea behind Durable Objects is to colocate application logic with the data it operates on. A Durable Object comprises code that executes on the same physical host as the SQLite database that it uses, resulting in blazingly fast read and write performance.

How could this work at scale?

A single object is inherently limited in throughput since it runs on a single thread of a single machine. To handle more traffic, you create more objects. This is easiest when different objects can handle different logical units of state (like different documents, different users, or different "shards" of a database), where each unit of state has low enough traffic to be handled by a single object

Kenton presents the example of a flight booking system, where each flight can map to a dedicated Durable Object with its own SQLite database - thousands of fresh databases per airline per day.

Each DO has a unique name, and Cloudflare's network then handles routing requests to that object wherever it might live on their global network.
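To make the "one database per logical unit" idea concrete, here is a local Python analogy using the standard library's sqlite3 module. It is not the Durable Objects API: the registry class, table layout and flight names are all invented for illustration.

```python
import sqlite3


class ObjectRegistry:
    """Maps each unique name to its own private in-memory SQLite database."""

    def __init__(self):
        self._databases = {}

    def get(self, name):
        # Create the database the first time a name is requested
        if name not in self._databases:
            db = sqlite3.connect(":memory:")
            db.execute("CREATE TABLE seats (seat TEXT PRIMARY KEY, passenger TEXT)")
            self._databases[name] = db
        return self._databases[name]


registry = ObjectRegistry()
registry.get("UA123-2024-10-13").execute("INSERT INTO seats VALUES ('12A', 'Ada')")
booked = registry.get("UA123-2024-10-13").execute("SELECT COUNT(*) FROM seats").fetchone()[0]
other = registry.get("BA456-2024-10-13").execute("SELECT COUNT(*) FROM seats").fetchone()[0]
```

Each flight sees only its own rows, which is what lets load be spread by creating more objects.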

The technical details are fascinating. Inspired by Litestream, each DO constantly streams a sequence of WAL entries to object storage - batched every 16MB or every ten seconds. This also enables point-in-time recovery for up to 30 days through replaying those logged transactions.

To ensure durability within that ten second window, writes are also forwarded to five replicas in separate nearby data centers as soon as they commit, and the write is only acknowledged once three of them have confirmed it.
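The two rules described above (flush a batch at 16MB or ten seconds, acknowledge a write after three of five replicas confirm) can be written down as a pair of predicates. This Python sketch only restates those thresholds; it says nothing about how Cloudflare implements them, and treating 16MB as 16 * 1024 * 1024 bytes is an assumption.

```python
BATCH_BYTES = 16 * 1024 * 1024  # assumption: "16MB" read as 16 MiB
BATCH_SECONDS = 10
REPLICAS = 5
QUORUM = 3


def should_flush(pending_bytes, seconds_since_flush):
    """Flush the WAL batch when either the size or the time limit is hit."""
    return pending_bytes >= BATCH_BYTES or seconds_since_flush >= BATCH_SECONDS


def write_acknowledged(confirmations):
    """A write is acknowledged once a quorum of replicas has confirmed it."""
    return confirmations >= QUORUM
```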

The JavaScript API design is interesting too: it's blocking rather than async, because the whole point of the design is to provide fast single threaded persistence operations:

let docs = sql.exec(`
  SELECT title, authorId FROM documents
  ORDER BY lastModified DESC
  LIMIT 100
`).toArray();

for (let doc of docs) {
  doc.authorName = sql.exec(
    "SELECT name FROM users WHERE id = ?",
    doc.authorId).one().name;
}

This example of theirs deliberately exhibits the N+1 query pattern, because that's something SQLite is uniquely well suited to handling.
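The same pattern can be tried locally with Python's sqlite3 module, where each query is an in-process function call rather than a network round trip. This sketch mirrors the query shapes from the example above; the table contents are invented.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row
db.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE documents (title TEXT, authorId INTEGER, lastModified INTEGER);
    INSERT INTO users VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO documents VALUES ('Notes', 1, 10), ('Plan', 2, 20);
""")

# One query for the list of documents...
docs = [dict(row) for row in db.execute(
    "SELECT title, authorId FROM documents ORDER BY lastModified DESC LIMIT 100"
)]

# ...then one extra query per document (the "N" in N+1)
for doc in docs:
    doc["authorName"] = db.execute(
        "SELECT name FROM users WHERE id = ?", (doc["authorId"],)
    ).fetchone()["name"]
```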

The system underlying Durable Objects is called Storage Relay Service, and it's been powering Cloudflare's existing-but-different D1 SQLite system for over a year.

I was curious as to where the objects are created. According to this (via Hacker News):

Durable Objects do not currently change locations after they are created. By default, a Durable Object is instantiated in a data center close to where the initial get() request is made. [...] To manually create Durable Objects in another location, provide an optional locationHint parameter to get().

And in a footnote:

Dynamic relocation of existing Durable Objects is planned for the future.

where.durableobjects.live is a neat site that tracks where in the Cloudflare network DOs are created - I just visited it and it said:

This page tracks where new Durable Objects are created; for example, when you loaded this page from Half Moon Bay, a worker in San Jose, California, United States (SJC) created a durable object in San Jose, California, United States (SJC).

Via lobste.rs

Tags: scaling, sqlite, websockets, software-architecture, cloudflare, litestream, kenton-varda

Bringing Python to Workers using Pyodide and WebAssembly

2024-04-02T16:09:57+00:00

Bringing Python to Workers using Pyodide and WebAssembly

Cloudflare Workers is Cloudflare’s serverless hosting tool for deploying server-side functions to edge locations in their CDN.

They just released Python support, accompanied by an extremely thorough technical explanation of how they got that to work. The details are fascinating.

Workers runs on V8 isolates, and the new Python support was implemented using Pyodide (CPython compiled to WebAssembly) running inside V8.

Getting this to work performantly and ergonomically took a huge amount of work.

There are too many details in here to effectively summarize, but my favorite detail is this one:

“We scan the Worker’s code for import statements, execute them, and then take a snapshot of the Worker’s WebAssembly linear memory. Effectively, we perform the expensive work of importing packages at deploy time, rather than at runtime.”
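The scanning step in that quote can be approximated with Python's own ast module. This is a guess at the general shape of such a scan, not Cloudflare's implementation, and it covers only finding the imports, not executing them or snapshotting memory. The sample Worker source is invented.

```python
import ast


def imported_modules(source):
    """Return the sorted top-level module names imported anywhere in source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports such as "from . import x"
            modules.add(node.module.split(".")[0])
    return sorted(modules)


# Invented example source, for illustration only
worker_source = (
    "import numpy as np\n"
    "from fastapi import FastAPI\n"
    "\n"
    "def on_fetch(request):\n"
    "    import json\n"
    "    return json.dumps({})\n"
)
found = imported_modules(worker_source)
```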

Via Hacker News

Tags: python, serverless, cloudflare, webassembly, pyodide

Weeknotes: Page caching and custom templates for Datasette Cloud

2024-01-07T20:45:11+00:00

My main development focus this week has been adding public page caching to Datasette Cloud, and exploring what custom template support might look like for that service.

Datasette Cloud primarily provides private "spaces" for teams to collaborate on data. A team can invite additional members, upload CSV files, use the API to ingest data, run enrichments, share private comments and browse and query the data together.

The overall goal is to help teams find stories in their data.

Originally I planned Datasette Cloud as an exclusively private collaboration space, but with hindsight this was a mistake. Datasette has been a tool for publishing data right from the start, and Datasette Cloud users quickly started asking for ways to share their data with the world.

I started with a plugin for this, datasette-public, allowing tables to be selectively made visible to unauthenticated users.

This raised a couple of challenges though. First, I worry about sudden spikes of traffic. Each Datasette Cloud user gets their own dedicated Fly container to ensure performance issues are isolated and don't affect other users, but I still don't like the idea of a big public traffic spike taking down a user's site.

Secondly, some users expressed interest in customizing the display of their public Datasette instance. The open source Datasette application has extensive support for this, but allowing users to run arbitrary HTML and JavaScript on a hosted service is a major risk for XSS holes.

This week I've been exploring a way to address both of these issues.

Full page caching for unauthorized users

I've used this trick multiple times through my career - at Lanyrd, at Eventbrite and even for my own personal blog. If a user is signed out, serve them pages through a simple full-page cache - something like Varnish. Set a short TTL on that cache - maybe as short as 15s - such that cached content doesn't have time to go stale.

Good caches include support for dog-pile prevention, also known as request coalescing. If 10 requests come in for the same page at exactly the same moment, the cache bundles them together and makes just a single request to the backend, then serves the result to all 10 waiting clients.
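Here is a minimal Python sketch of the short-TTL part of that pattern. It is an in-process toy, not Varnish or Cloudflare, and it does not implement request coalescing, which needs concurrency handling. The clock is injectable so the expiry can be demonstrated without sleeping.

```python
import time


class TTLPageCache:
    """Caches whole pages by path for a short, fixed time-to-live."""

    def __init__(self, fetch, ttl_seconds=15, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # path -> (expires_at, body)

    def get(self, path):
        now = self._clock()
        entry = self._entries.get(path)
        if entry and entry[0] > now:
            return entry[1]  # still fresh: no backend hit
        body = self._fetch(path)
        self._entries[path] = (now + self._ttl, body)
        return body


backend_calls = []
fake_now = [0.0]


def backend(path):
    backend_calls.append(path)
    return "<html>" + path + "</html>"


cache = TTLPageCache(backend, ttl_seconds=15, clock=lambda: fake_now[0])
cache.get("/")
cache.get("/")       # served from the cache
fake_now[0] = 16.0
cache.get("/")       # TTL has passed, so the backend is hit again
```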

How to implement this for Datasette Cloud? My current plan is to use a separate domain - .datasette.site - for the publicly visible pages of each site. So simon.datasette.cloud (my personal Datasette Cloud space) would have simon.datasette.site as its public domain.

I got this working as a proof-of-concept this week. I actually got it working twice: I figured out how to run a dedicated Varnish instance on Fly, and then I realized that Cloudflare also now offer wildcard DNS support so I tried that out too.

I have both mechanisms up and running at the moment, on two separate domains. I'll likely go with the Cloudflare option to reduce the number of moving parts I'm responsible for myself, but having both means I can compare them to see which one is likely to work best.

Custom templates based on host

The other reason I decided to explore *.datasette.site was the security issue I mentioned earlier.

XSS attacks, where malicious JavaScript executes on a trusted domain, are a major security risk.

I plan to explore additional layers of protection against these such as CSP headers, but my general rule is to NEVER allow even a chance of untrusted JavaScript executing on a domain where authenticated users are able to perform privileged actions.

My current plan is to have *.datasette.site work as an entirely cookie-free domain. Any functionality that requires authentication will be handled by the privileged *.datasette.cloud domain instead.

This means I can allow users to provide their own custom templates for their public Datasette instance, without worrying that any mistakes in those templates could lead to a security breach elsewhere within the service.

There was just one catch: this meant I needed Datasette to be able to use different templates depending on the host that the content was being served on.

After wasting a bunch of time trying to get this to work through monkey-patching, I realized the solution was to add a new plugin hook. jinja2_environment_from_request(datasette, request, env) is now implemented on main and should be out in a new alpha release pretty soon. The documentation for that hook includes an example that hints at how I'm using it for Datasette Cloud.

Fun further applications of this pattern

I'm wary of adding features to Datasette that only serve Datasette Cloud. In this case, I realized that the new plugin hook opens up some interesting possibilities for other users of Datasette.

I run a bunch of projects on top of Datasette myself - til.simonwillison.net and www.niche-museums.com are two examples of my sites that are actually templated Datasette instances.

Currently, those sites are hosted separately - which means I'm paying to run Datasette multiple times.

With the ability to serve different templates based on host, I've realized I could instead serve a single Datasette instance for multiple sites, each with their own custom templates.

Taking advantage of CNAMEs - or even wildcard DNS - means I could run a whole family of weird personal projects on a single instance without any incremental cost for each new project!

Releases

datasette-upgrade 0.1a0 - 2024-01-06
Upgrade Datasette instance configuration to handle new features

TILs

GitHub Actions, Issues and Pages to build a daily planner - 2024-01-02

Tags: caching, security, varnish, xss, datasette, cloudflare, weeknotes, datasette-cloud

Cloudflare does not consider vary values in caching decisions

2023-11-20T05:08:52+00:00

Cloudflare does not consider vary values in caching decisions

Here’s the spot in Cloudflare’s documentation where they hide a crucially important detail:

“Cloudflare does not consider vary values in caching decisions. Nevertheless, vary values are respected when Vary for images is configured and when the vary header is vary: accept-encoding.”

This means you can’t deploy an application that uses content negotiation via the Accept header behind the Cloudflare CDN—for example serving JSON or HTML for the same URL depending on the incoming Accept header. If you do, Cloudflare may serve cached JSON to an HTML client or vice-versa.
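The failure mode is easy to reproduce with a toy cache in Python. This sketch models only the cache key: one cache keys on the URL alone, the other also keys on the Accept header. It does not model Cloudflare's actual behavior beyond the documented point that Vary is not part of the caching decision.

```python
def respond(accept):
    """A tiny origin that negotiates JSON or HTML from the Accept header."""
    if "application/json" in accept:
        return '{"title": "Home"}'
    return "<h1>Home</h1>"


class ToyCache:
    def __init__(self, vary_on_accept):
        self.vary_on_accept = vary_on_accept
        self.store = {}

    def get(self, url, accept):
        key = (url, accept) if self.vary_on_accept else (url,)
        if key not in self.store:
            self.store[key] = respond(accept)
        return self.store[key]


url_only = ToyCache(vary_on_accept=False)
url_only.get("/", "application/json")
wrong = url_only.get("/", "text/html")    # the cached JSON comes back

varying = ToyCache(vary_on_accept=True)
varying.get("/", "application/json")
right = varying.get("/", "text/html")     # HTML, as the client asked
```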

There’s an exception for image files, which Cloudflare added support for in September 2021 (for Pro accounts only) in order to support formats such as WebP which may not have full support across all browsers.

Tags: caching, http, cloudflare

Analytics: Hacker News v.s. a tweet from Elon Musk

2023-02-17T22:11:44+00:00

My post Bing: “I will not harm you unless you harm me first” really took off.

It sat at the top of Hacker News for a full day, and is currently the 18th most popular post of all time on that site.

And then this happened:

Might need a bit more polish …https://t.co/rGYCxoBVeA
- Elon Musk (@elonmusk) February 15, 2023

Given recent changes made to the Twitter algorithm, a lot of people saw that. Twitter currently reports 30.4M views of that tweet.

A bunch of people asked me how much of that converted into page views. So let's dive in!

Headline figures

Here's my Plausible dashboard for that post over the past few days:

Overall numbers: 959k unique visitors, 1.1M page views.

Top sources of traffic:

Twitter: 721k
Direct / None: 132k (this includes traffic from Mastodon)
Hacker News: 49.5k
Facebook: 13.4k
Reddit: 8.3k
Google: 7.8k
tldrnewsletter: 6k
LinkedIn: 5.4k

If we assume the vast majority of the Twitter traffic was from Elon (which seems reasonable) that's 721k / 30.4M = roughly a 2.37% click through rate.
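The click-through estimate is visitors divided by views, which is quick to check in Python using the two figures reported above:

```python
tweet_views = 30_400_000      # views reported by Twitter
twitter_visitors = 721_000    # visitors Plausible attributes to Twitter

click_through_rate = twitter_visitors / tweet_views
formatted = f"{click_through_rate:.2%}"
```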

Notable that sticking at the top of Hacker News for a day really does drive an enormous amount of traffic - about 7% of the traffic you get from the second most followed account on Twitter (looks like Barack Obama is still number one).

More detailed analytics via Plausible and Cloudflare

I mainly use Plausible for my site's analytics. I really like them: they're privacy-focused, open source (though I use their hosted version) and show me exactly the subset of data I want to see. Most importantly, they don't set cookies.

My site also runs behind Cloudflare, which also provides analytics. I don't pay for the upgraded analytics, but it turns out you can still get some pretty detailed numbers out of them - especially if you're willing to dig around in the browser DevTools.

Plausible offers an "export" button, so I used that... and got a zip file with a bunch of CSVs in it. Here they are in a GitHub repo.

Cloudflare - at least for the free tier - doesn't have a detailed export. But... under the hood the Cloudflare web application uses their GraphQL API to retrieve stats for display, and with a bit of digging you can get numbers out that way.

I extracted this 3.2MB JSON file using the Cloudflare API.

Loading it into Datasette

I wrote this script to load the data I had extracted into SQLite database files, and then deployed them to Vercel using Datasette.

You can explore the result here: https://i-will-not-harm-you-unless-you-harm-me-first.vercel.app/

Here's page views according to Plausible over the time period in question:

It looks to me like the timezone for that data is Pacific Time.

This page shows page views count according to Cloudflare, by hour.

This data is in UTC, where 7pm UTC corresponds to 11am Pacific.

These numbers should differ, because Plausible uses JavaScript to track analytics while Cloudflare is server-side, plus Plausible is filtered to just hits to the specific page while Cloudflare is showing all hits to any page on my site.

There are plenty more ways to slice and dice the data in Datasette:

Unique visitors over time according to Plausible
Uniques over time according to Cloudflare
Full data for those traffic sources from Plausible
Plausible device breakdown - 778,678 mobile, 101,216 desktop, 47,781 laptop (not sure how it distinguishes between desktop and laptop though), 16,967 tablet.
Percentage of cached requests over time according to Cloudflare using a custom SQL query - this was around 40% before the Elon tweet, then jumped up to over 90% and stayed there, thankfully!

I've long been a fan of full-page HTTP caching as protection against surprise traffic events - it's a pattern I've implemented in the past using Varnish and Fastly, and I've been using it on my blog via Cloudflare for several years.

It definitely paid off this time!

Tags: analytics, bing, hacker-news, twitter, datasette, cloudflare

Wildebeest

2023-01-23T00:03:30+00:00

Wildebeest

New project from Cloudflare, first quietly unveiled three weeks ago: “Wildebeest is an ActivityPub and Mastodon-compatible server”. It’s built using a flurry of Cloudflare-specific technology, including Workers, Pages and their SQLite-based D1 database.

Via @simon

Tags: sqlite, cloudflare, mastodon, activitypub

Stringing together several free tiers to host an application with zero cost using fly.io, Litestream and Cloudflare

2022-10-07T17:47:34+00:00

Stringing together several free tiers to host an application with zero cost using fly.io, Litestream and Cloudflare

Alexander Dahl provides a detailed description (and code) for his current preferred free hosting solution for small sites: SQLite (and a Go application) running on Fly’s free tier, with the database replicated up to Cloudflare’s R2 object storage (again on a free tier) by Litestream.

Tags: hosting, sqlite, cloudflare, fly, litestream

1.1.1.1/purge-cache

2021-12-06T23:15:08+00:00

1.1.1.1/purge-cache

Cloudflare’s 1.1.1.1 DNS service has a tool that anyone can use to flush a specific DNS entry from their cache—could be useful for assisting rollouts of new DNS configurations.

Via isclever on Hacker News

Tags: dns, cloudflare

New HTTP standards for caching on the modern web

2021-10-21T22:40:50+00:00

New HTTP standards for caching on the modern web

Cache-Status is a new HTTP header (RFC from August 2021) designed to provide better debugging information about which caches were involved in serving a request—“Cache-Status: Nginx; hit, Cloudflare; fwd=stale; fwd-status=304; collapsed; ttl=300” for example indicates that Nginx served a cache hit, then Cloudflare had a stale cached version so it revalidated from Nginx, got a 304 not modified, collapsed multiple requests (dogpile prevention) and plans to serve the new cached value for the next five minutes. Also described is $Target-Cache-Control: which allows different CDNs to respond to different headers and is already supported by Cloudflare and Akamai (Cloudflare-CDN-Cache-Control: and Akamai-Cache-Control:).
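To make the structure of that example value concrete, here is a deliberately simplified Python parser: members are separated by commas, parameters by semicolons, and a bare parameter is treated as a boolean flag. It is a sketch for simple values like the one above, not a full structured-field parser, and it would mishandle quoted strings containing commas or semicolons.

```python
def parse_cache_status(header):
    """Parse a simple Cache-Status value into one dict per cache."""
    caches = []
    for member in header.split(","):
        name, *params = [part.strip() for part in member.split(";")]
        entry = {"cache": name}
        for param in params:
            key, _, value = param.partition("=")
            entry[key] = value if value else True  # bare key means a flag
        caches.append(entry)
    return caches


example = "Nginx; hit, Cloudflare; fwd=stale; fwd-status=304; collapsed; ttl=300"
parsed = parse_cache_status(example)
```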

Via Hacker News

Tags: caching, dogpile, http, cloudflare

Details of the Cloudflare outage on July 2, 2019

2019-07-12T17:36:25+00:00

Details of the Cloudflare outage on July 2, 2019

Best retrospective I’ve read in a long time. The outage was caused by a backtracking regex rule that was added to the Web Application Firewall project, which rolls out globally and skips most of Cloudflare’s regular gradual rollout process (delightfully animal themed, named DOG for the dogfooding PoP that their employees use, PIG for the Guinea Pig PoPs reserved for free customers, then Canary for the final step) so that they can deploy counter-measures to newly discovered vulnerabilities as quickly as possible. The real value in the retro is that it provides an extremely deep insight into how Cloudflare organize, test and manage their changes. Really interesting stuff.

Via Hacker News

Tags: operations, regular-expressions, cloudflare, postmortem

The Now CDN

2018-07-12T03:34:06+00:00

The Now CDN

Huge announcement from Zeit Now today: all .now.sh deployments are now served through the Cloudflare CDN, which means they benefit from 150 worldwide CDN locations that obey HTTP caching headers. This is particularly relevant for Datasette, since it serves far-future cache headers by default and uses Cloudflare-compatible HTTP/2 push hints to accelerate 302 redirects. This means that both the “datasette publish now” CLI command and the Datasette Publish web app will now result in Cloudflare-accelerated deployments.

Via @zeithq

Tags: cdn, performance, zeit-now, datasette, cloudflare

Everyone can now run JavaScript on Cloudflare with Workers

2018-03-13T16:36:53+00:00

Everyone can now run JavaScript on Cloudflare with Workers

This is such a brilliant piece of software design: Cloudflare took the service workers spec and used it as the basis for their edge-executed JavaScript feature. This means you can run server-side JavaScript in hundreds of edge locations worldwide, applying custom dynamic logic (including additional async cached fetch() calls) with only around 1ms of additional overhead. The pricing model is a steal: $0.50 per million requests with a $5/month minimum.
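As a quick worked example of that 2018 pricing, here is a Python sketch that reads the "$5/month minimum" as a floor on the monthly bill. That reading is an assumption, and current Workers pricing may differ.

```python
USD_PER_MILLION_REQUESTS = 0.50
MONTHLY_MINIMUM_USD = 5.00


def workers_monthly_cost(requests):
    """Usage-based charge, but never less than the monthly minimum."""
    usage = requests / 1_000_000 * USD_PER_MILLION_REQUESTS
    return max(MONTHLY_MINIMUM_USD, usage)


small_site = workers_monthly_cost(1_000_000)    # minimum applies
busy_site = workers_monthly_cost(50_000_000)    # usage exceeds the minimum
```

Under this reading the minimum covers the first 10 million requests each month.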

Tags: cdn, javascript, cloudflare, serviceworkers
