<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: fly</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/fly.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2026-05-21T18:21:07+00:00</updated><author><name>Simon Willison</name></author><entry><title>datasette-agent-sprites 0.1a0</title><link href="https://simonwillison.net/2026/May/21/datasette-agent-sprites/#atom-tag" rel="alternate"/><published>2026-05-21T18:21:07+00:00</published><updated>2026-05-21T18:21:07+00:00</updated><id>https://simonwillison.net/2026/May/21/datasette-agent-sprites/#atom-tag</id><summary type="html">
    
        &lt;p&gt;&lt;strong&gt;Release:&lt;/strong&gt; &lt;a href="https://github.com/datasette/datasette-agent-sprites/releases/tag/0.1a0"&gt;datasette-agent-sprites 0.1a0&lt;/a&gt;&lt;/p&gt;
        &lt;p&gt;A Datasette Agent plugin for running commands in a &lt;a href="https://sprites.dev"&gt;Fly Sprites&lt;/a&gt; sandbox.&lt;/p&gt;
    
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/sandboxing"&gt;sandboxing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette-agent"&gt;datasette-agent&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="sandboxing"/><category term="datasette"/><category term="fly"/><category term="datasette-agent"/></entry><entry><title>An Update on Heroku</title><link href="https://simonwillison.net/2026/Feb/6/an-update-on-heroku/#atom-tag" rel="alternate"/><published>2026-02-06T18:44:21+00:00</published><updated>2026-02-06T18:44:21+00:00</updated><id>https://simonwillison.net/2026/Feb/6/an-update-on-heroku/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.heroku.com/blog/an-update-on-heroku/"&gt;An Update on Heroku&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
An ominous headline to see on the official Heroku blog and yes, it's bad news.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Today, Heroku is transitioning to a sustaining engineering model focused on stability, security, reliability, and support. Heroku remains an actively supported, production-ready platform, with an emphasis on maintaining quality and operational excellence rather than introducing new features. We know changes like this can raise questions, and we want to be clear about what this means for customers.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Based on context I'm guessing a "sustaining engineering model" (this definitely isn't a widely used industry term) means that they'll keep the lights on and that's it.&lt;/p&gt;
&lt;p&gt;This is a very frustrating piece of corporate communication. "We want to be clear about what this means for customers" - then proceeds to &lt;em&gt;not be clear&lt;/em&gt; about what this means for customers.&lt;/p&gt;
&lt;p&gt;Why are they doing this? Here's their explanation:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We’re focusing our product and engineering investments on areas where we can deliver the greatest long-term customer value, including helping organizations build and deploy enterprise-grade AI in a secure and trusted way.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;My blog is the only project I have left running on Heroku. I guess I'd better migrate it away (probably to Fly) before Salesforce lose interest completely.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/salesforce"&gt;salesforce&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/heroku"&gt;heroku&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;&lt;/p&gt;



</summary><category term="salesforce"/><category term="heroku"/><category term="fly"/></entry><entry><title>Introducing Deno Sandbox</title><link href="https://simonwillison.net/2026/Feb/3/introducing-deno-sandbox/#atom-tag" rel="alternate"/><published>2026-02-03T22:44:50+00:00</published><updated>2026-02-03T22:44:50+00:00</updated><id>https://simonwillison.net/2026/Feb/3/introducing-deno-sandbox/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://deno.com/blog/introducing-deno-sandbox"&gt;Introducing Deno Sandbox&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Here's a new hosted sandbox product from the Deno team. It's actually unrelated to Deno itself - this is part of their Deno Deploy SaaS platform. As such, you don't even need to use JavaScript to access it - you can create and execute code in a hosted sandbox using their &lt;a href="https://pypi.org/project/deno-sandbox/"&gt;deno-sandbox&lt;/a&gt; Python library like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;&lt;span class="pl-k"&gt;export&lt;/span&gt; DENO_DEPLOY_TOKEN=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;... API token ...&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
uv run --with deno-sandbox python&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;deno_sandbox&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-v"&gt;DenoDeploy&lt;/span&gt;

&lt;span class="pl-s1"&gt;sdk&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-en"&gt;DenoDeploy&lt;/span&gt;()

&lt;span class="pl-k"&gt;with&lt;/span&gt; &lt;span class="pl-s1"&gt;sdk&lt;/span&gt;.&lt;span class="pl-c1"&gt;sandbox&lt;/span&gt;.&lt;span class="pl-c1"&gt;create&lt;/span&gt;() &lt;span class="pl-k"&gt;as&lt;/span&gt; &lt;span class="pl-s1"&gt;sb&lt;/span&gt;:
    &lt;span class="pl-c"&gt;# Run a shell command&lt;/span&gt;
    &lt;span class="pl-s1"&gt;process&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;sb&lt;/span&gt;.&lt;span class="pl-c1"&gt;spawn&lt;/span&gt;(
        &lt;span class="pl-s"&gt;"echo"&lt;/span&gt;, &lt;span class="pl-s1"&gt;args&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;[&lt;span class="pl-s"&gt;"Hello from the sandbox!"&lt;/span&gt;]
    )
    &lt;span class="pl-s1"&gt;process&lt;/span&gt;.&lt;span class="pl-c1"&gt;wait&lt;/span&gt;()
    &lt;span class="pl-c"&gt;# Write and read files&lt;/span&gt;
    &lt;span class="pl-s1"&gt;sb&lt;/span&gt;.&lt;span class="pl-c1"&gt;fs&lt;/span&gt;.&lt;span class="pl-c1"&gt;write_text_file&lt;/span&gt;(
        &lt;span class="pl-s"&gt;"/tmp/example.txt"&lt;/span&gt;, &lt;span class="pl-s"&gt;"Hello, World!"&lt;/span&gt;
    )
    &lt;span class="pl-en"&gt;print&lt;/span&gt;(&lt;span class="pl-s1"&gt;sb&lt;/span&gt;.&lt;span class="pl-c1"&gt;fs&lt;/span&gt;.&lt;span class="pl-c1"&gt;read_text_file&lt;/span&gt;(
        &lt;span class="pl-s"&gt;"/tmp/example.txt"&lt;/span&gt;
    ))&lt;/pre&gt;
&lt;p&gt;There’s a JavaScript client library as well. The underlying API isn’t documented yet but appears &lt;a href="https://tools.simonwillison.net/zip-wheel-explorer?package=deno-sandbox#deno_sandbox/sandbox.py--L187"&gt;to use WebSockets&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;There’s a lot to like about this system. Sandboxe instances can have up to 4GB of RAM, get 2 vCPUs, 10GB of ephemeral storage, can mount persistent volumes and can use snapshots to boot pre-configured custom images quickly. Sessions can last up to 30 minutes and are billed by CPU time, GB-h of memory and volume storage usage.&lt;/p&gt;
&lt;p&gt;When you create a sandbox you can configure network domains it’s allowed to access.&lt;/p&gt;
&lt;p&gt;My favorite feature is the way it handles API secrets.&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;with&lt;/span&gt; &lt;span class="pl-s1"&gt;sdk&lt;/span&gt;.&lt;span class="pl-c1"&gt;sandboxes&lt;/span&gt;.&lt;span class="pl-c1"&gt;create&lt;/span&gt;(
    &lt;span class="pl-s1"&gt;allowNet&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;[&lt;span class="pl-s"&gt;"api.openai.com"&lt;/span&gt;],
    &lt;span class="pl-s1"&gt;secrets&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;{
        &lt;span class="pl-s"&gt;"OPENAI_API_KEY"&lt;/span&gt;: {
            &lt;span class="pl-s"&gt;"hosts"&lt;/span&gt;: [&lt;span class="pl-s"&gt;"api.openai.com"&lt;/span&gt;],
            &lt;span class="pl-s"&gt;"value"&lt;/span&gt;: &lt;span class="pl-s1"&gt;os&lt;/span&gt;.&lt;span class="pl-c1"&gt;environ&lt;/span&gt;.&lt;span class="pl-c1"&gt;get&lt;/span&gt;(&lt;span class="pl-s"&gt;"OPENAI_API_KEY"&lt;/span&gt;),
        }
    },
) &lt;span class="pl-k"&gt;as&lt;/span&gt; &lt;span class="pl-s1"&gt;sandbox&lt;/span&gt;:
    &lt;span class="pl-c"&gt;# ... $OPENAI_API_KEY is available&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;Within the container that &lt;code&gt;$OPENAI_API_KEY&lt;/code&gt; value is set to something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;DENO_SECRET_PLACEHOLDER_b14043a2f578cba...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Outbound API calls to &lt;code&gt;api.openai.com&lt;/code&gt; run through a proxy which is aware of those placeholders and replaces them with the original secret.&lt;/p&gt;
&lt;p&gt;In this way the secret itself is not available to code within the sandbox, which limits the ability for malicious code (e.g. from a prompt injection) to exfiltrate those secrets.&lt;/p&gt;
&lt;p&gt;From &lt;a href="https://news.ycombinator.com/item?id=46874097#46874959"&gt;a comment on Hacker News&lt;/a&gt; I learned that Fly have a project called &lt;a href="https://github.com/superfly/tokenizer"&gt;tokenizer&lt;/a&gt; that implements the same pattern. Adding this to my list of tricks to use with sandoxed environments!

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=46874097"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sandboxing"&gt;sandboxing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/deno"&gt;deno&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;&lt;/p&gt;



</summary><category term="python"/><category term="sandboxing"/><category term="security"/><category term="deno"/><category term="fly"/></entry><entry><title>The Design &amp; Implementation of Sprites</title><link href="https://simonwillison.net/2026/Jan/15/the-design-implementation-of-sprites/#atom-tag" rel="alternate"/><published>2026-01-15T16:08:27+00:00</published><updated>2026-01-15T16:08:27+00:00</updated><id>https://simonwillison.net/2026/Jan/15/the-design-implementation-of-sprites/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://fly.io/blog/design-and-implementation/"&gt;The Design &amp;amp; Implementation of Sprites&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
I &lt;a href="https://simonwillison.net/2026/Jan/9/sprites-dev/"&gt;wrote about Sprites last week&lt;/a&gt;. Here's Thomas Ptacek from Fly with the insider details on how they work under the hood.&lt;/p&gt;
&lt;p&gt;I like this framing of them as "disposable computers":&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Sprites are ball-point disposable computers. Whatever mark you mean to make, we’ve rigged it so you’re never more than a second or two away from having a Sprite to do it with.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I've noticed that new Fly Machines can take a while (up to around a minute) to provision. Sprites solve that by keeping warm pools of unused machines in multiple regions, which is enabled by them all using the same container:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Now, today, under the hood, Sprites are still Fly Machines. But they all run from a standard container. Every physical worker knows exactly what container the next Sprite is going to start with, so it’s easy for us to keep pools of “empty” Sprites standing by. The result: a Sprite create doesn’t have any heavy lifting to do; it’s basically just doing the stuff we do when we start a Fly Machine.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The most interesting detail is how the persistence layer works. Sprites only charge you for data you have written that differs from the base image and provide ~300ms checkpointing and restores - it turns out that's power by a custom filesystem on top of S3-compatible storage coordinated by Litestream-replicated local SQLite metadata:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We still exploit NVMe, but not as the root of storage. Instead, it’s a read-through cache for a blob on object storage. S3-compatible object stores are the most trustworthy storage technology we have. I can feel my blood pressure dropping just typing the words “Sprites are backed by object storage.” [...]&lt;/p&gt;
&lt;p&gt;The Sprite storage stack is organized around the JuiceFS model (in fact, we currently use a very hacked-up JuiceFS, with a rewritten SQLite metadata backend). It works by splitting storage into data (“chunks”) and metadata (a map of where the “chunks” are). Data chunks live on object stores; metadata lives in fast local storage. In our case, that metadata store is &lt;a href="https://litestream.io"&gt;kept durable with Litestream&lt;/a&gt;. Nothing depends on local storage.&lt;/p&gt;
&lt;/blockquote&gt;

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://twitter.com/tqbf/status/2011823480673624434"&gt;@tqbf&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/sandboxing"&gt;sandboxing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/thomas-ptacek"&gt;thomas-ptacek&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/software-architecture"&gt;software-architecture&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/litestream"&gt;litestream&lt;/a&gt;&lt;/p&gt;



</summary><category term="sandboxing"/><category term="sqlite"/><category term="thomas-ptacek"/><category term="software-architecture"/><category term="fly"/><category term="litestream"/></entry><entry><title>Fly's new Sprites.dev addresses both developer sandboxes and API sandboxes at the same time</title><link href="https://simonwillison.net/2026/Jan/9/sprites-dev/#atom-tag" rel="alternate"/><published>2026-01-09T23:57:12+00:00</published><updated>2026-01-09T23:57:12+00:00</updated><id>https://simonwillison.net/2026/Jan/9/sprites-dev/#atom-tag</id><summary type="html">
    &lt;p&gt;New from Fly.io today: &lt;a href="https://sprites.dev"&gt;Sprites.dev&lt;/a&gt;. Here's their &lt;a href="https://fly.io/blog/code-and-let-live/"&gt;blog post&lt;/a&gt; and &lt;a href="https://www.youtube.com/watch?v=7BfTLlwO4hw"&gt;YouTube demo&lt;/a&gt;. It's an interesting new product that's quite difficult to explain - Fly call it "Stateful sandbox environments with checkpoint &amp;amp; restore" but I see it as hitting two of my current favorite problems: a safe development environment for running coding agents &lt;em&gt;and&lt;/em&gt; an API for running untrusted code in a secure sandbox.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Disclosure: Fly sponsor some of my work. They did not ask me to write about Sprites and I didn't get preview access prior to the launch. My enthusiasm here is genuine.&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2026/Jan/9/sprites-dev/#developer-sandboxes"&gt;Developer sandboxes&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2026/Jan/9/sprites-dev/#storage-and-checkpoints"&gt;Storage and checkpoints&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2026/Jan/9/sprites-dev/#really-clever-use-of-claude-skills"&gt;Really clever use of Claude Skills&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2026/Jan/9/sprites-dev/#a-sandbox-api"&gt;A sandbox API&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2026/Jan/9/sprites-dev/#scale-to-zero-billing"&gt;Scale-to-zero billing&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2026/Jan/9/sprites-dev/#two-of-my-favorite-problems-at-once"&gt;Two of my favorite problems at once&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id="developer-sandboxes"&gt;Developer sandboxes&lt;/h4&gt;
&lt;p&gt;I predicted earlier this week that &lt;a href="https://simonwillison.net/2026/Jan/8/llm-predictions-for-2026/#1-year-a-challenger-disaster-for-coding-agent-security"&gt;"we’re due a Challenger disaster with respect to coding agent security"&lt;/a&gt; due to the terrifying way most of us are using coding agents like Claude Code and Codex CLI. Running them in &lt;code&gt;--dangerously-skip-permissions&lt;/code&gt; mode (aka YOLO mode, where the agent acts without constantly seeking approval first) unlocks so much more power, but also means that a mistake or a malicious prompt injection can cause all sorts of damage to your system and data.&lt;/p&gt;
&lt;p&gt;The safe way to run YOLO mode is in a robust sandbox, where the worst thing that can happen is the sandbox gets messed up and you have to throw it away and get another one.&lt;/p&gt;
&lt;p&gt;That's the first problem Sprites solves:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;curl https://sprites.dev/install.sh &lt;span class="pl-k"&gt;|&lt;/span&gt; bash

sprite login
sprite create my-dev-environment
sprite console -s my-dev-environment&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;That's all it takes to get SSH connected to a fresh environment, running in an ~8GB RAM, 8 CPU server. And... Claude Code and Codex and Gemini CLI and Python 3.13 and Node.js 22.20 and a bunch of other tools are already installed.&lt;/p&gt;
&lt;p&gt;The first time you run &lt;code&gt;claude&lt;/code&gt; it neatly signs you in to your existing account with Anthropic. The Sprites VM is persistent so future runs of &lt;code&gt;sprite console -s&lt;/code&gt; will get you back to where you were before.&lt;/p&gt;
&lt;p&gt;... and it automatically sets up port forwarding, so you can run a localhost server on your Sprite and access it from &lt;code&gt;localhost:8080&lt;/code&gt; on your machine.&lt;/p&gt;
&lt;p&gt;There's also a command you can run to assign a public URL to your Sprite, so anyone else can access it if they know the secret URL.&lt;/p&gt;
&lt;h4 id="storage-and-checkpoints"&gt;Storage and checkpoints&lt;/h4&gt;
&lt;p&gt;In &lt;a href="https://fly.io/blog/code-and-let-live/"&gt;the blog post&lt;/a&gt; Kurt Mackey argues that ephemeral, disposable sandboxes are not the best fit for coding agents:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The state of the art in agent isolation is a read-only sandbox. At Fly.io, we’ve been selling that story for years, and we’re calling it: ephemeral sandboxes are obsolete. Stop killing your sandboxes every time you use them. [...]&lt;/p&gt;
&lt;p&gt;If you force an agent to, it’ll work around containerization and do work . But you’re not helping the agent in any way by doing that. They don’t want containers. They don’t want “sandboxes”. They want computers.&lt;/p&gt;
&lt;p&gt;[...] with an actual computer, Claude doesn’t have to rebuild my entire development environment every time I pick up a PR.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Each Sprite gets a proper filesystem which persists in between sessions, even while the Sprite itself shuts down after inactivity. It sounds like they're doing some clever filesystem tricks here, I'm looking forward to learning more about those in the future.&lt;/p&gt;
&lt;p&gt;There are some clues on &lt;a href="https://sprites.dev/"&gt;the homepage&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;You read and write to fast, directly attached NVMe storage. Your data then gets written to durable, external object storage. [...]&lt;/p&gt;
&lt;p&gt;You don't pay for allocated filesystem space, just the blocks you write. And it's all TRIM friendly, so your bill goes down when you delete things.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The really clever feature is checkpoints. You (or your coding agent) can trigger a checkpoint which takes around 300ms. This captures the entire disk state and can then be rolled back to later.&lt;/p&gt;
&lt;p&gt;For more on how that works, run this in a Sprite:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;cat /.sprite/docs/agent-context.md
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here's the relevant section:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;## Checkpoints
- Point-in-time checkpoints and restores available
- Copy-on-write implementation for storage efficiency
- Last 5 checkpoints mounted at `/.sprite/checkpoints`
- Checkpoints capture only the writable overlay, not the base image
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Or run this to see the &lt;code&gt;--help&lt;/code&gt; for the command used to manage them:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;sprite-env checkpoints --help&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Which looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sprite-env checkpoints - Manage environment checkpoints

USAGE:
    sprite-env checkpoints &amp;lt;subcommand&amp;gt; [options]

SUBCOMMANDS:
    list [--history &amp;lt;ver&amp;gt;]  List all checkpoints (optionally filter by history version)
    get &amp;lt;id&amp;gt;                Get checkpoint details (e.g., v0, v1, v2)
    create                  Create a new checkpoint (auto-versioned)
    restore &amp;lt;id&amp;gt;            Restore from a checkpoint (e.g., v1)

NOTE:
    Checkpoints are versioned as v0, v1, v2, etc.
    Restore returns immediately and triggers an async restore that restarts the environment.
    The last 5 checkpoints are mounted at /.sprite/checkpoints for direct file access.

EXAMPLES:
    sprite-env checkpoints list
    sprite-env checkpoints list --history v1.2.3
    sprite-env checkpoints get v2
    sprite-env checkpoints create
    sprite-env checkpoints restore v1
&lt;/code&gt;&lt;/pre&gt;
&lt;h4 id="really-clever-use-of-claude-skills"&gt;Really clever use of Claude Skills&lt;/h4&gt;
&lt;p&gt;I'm &lt;a href="https://simonwillison.net/2025/Oct/16/claude-skills/"&gt;a big fan of Skills&lt;/a&gt;, the mechanism whereby Claude Code (and increasingly other agents too) can be given additional capabilities by describing them in Markdown files in a specific directory structure.&lt;/p&gt;
&lt;p&gt;In a smart piece of design, Sprites uses pre-installed skills to teach Claude how Sprites itself works. This means you can ask Claude on the machine how to do things like open up ports and it will talk you through the process.&lt;/p&gt;
&lt;p&gt;There's all sorts of interesting stuff in the &lt;code&gt;/.sprite&lt;/code&gt; folder on that machine - digging in there is a great way to learn more about how Sprites works.&lt;/p&gt;
&lt;h4 id="a-sandbox-api"&gt;A sandbox API&lt;/h4&gt;
&lt;p&gt;Also from my predictions post earlier this week: &lt;a href="https://simonwillison.net/2026/Jan/8/llm-predictions-for-2026/#1-year-we-re-finally-going-to-solve-sandboxing"&gt;"We’re finally going to solve sandboxing"&lt;/a&gt;. I am obsessed with this problem: I want to be able to run untrusted code safely, both on my personal devices and in the context of web services I'm building for other people to use.&lt;/p&gt;
&lt;p&gt;I have &lt;em&gt;so many things&lt;/em&gt; I want to build that depend on being able to take untrusted code - from users or from LLMs or from LLMs-driven-by-users - and run that code in a sandbox where I can be confident that the blast radius if something goes wrong is tightly contained.&lt;/p&gt;
&lt;p&gt;Sprites offers a clean &lt;a href="https://sprites.dev/api"&gt;JSON API&lt;/a&gt; for doing exactly that, plus client libraries in &lt;a href="https://github.com/superfly/sprites-go"&gt;Go&lt;/a&gt; and &lt;a href="https://github.com/superfly/sprites-js"&gt;TypeScript&lt;/a&gt; and coming-soon &lt;a href="https://github.com/superfly/sprites-py"&gt;Python&lt;/a&gt; and &lt;a href="https://github.com/superfly/sprites-ex"&gt;Elixir&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;From their quick start:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Create a new sprite
curl -X PUT https://api.sprites.dev/v1/sprites/my-sprite \
-H "Authorization: Bearer $SPRITES_TOKEN"

# Execute a command
curl -X POST https://api.sprites.dev/v1/sprites/my-sprite/exec \
-H "Authorization: Bearer $SPRITES_TOKEN" \
-d '{"command": "echo hello"}'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can also checkpoint and rollback via the API, so you can get your environment exactly how you like it, checkpoint it, run a bunch of untrusted code, then roll back to the clean checkpoint when you're done.&lt;/p&gt;
&lt;p&gt;Managing network access is an important part of maintaining a good sandbox. The Sprites API lets you &lt;a href="https://sprites.dev/api/sprites/policies"&gt;configure network access policies&lt;/a&gt; using a DNS-based allow/deny list like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;curl -X POST \
  &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;https://api.sprites.dev/v1/sprites/{name}/policy/network&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; \
  -H &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Authorization: Bearer &lt;span class="pl-smi"&gt;$SPRITES_TOKEN&lt;/span&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; \
  -H &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Content-Type: application/json&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; \
  -d &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;{&lt;/span&gt;
&lt;span class="pl-s"&gt;    "rules": [&lt;/span&gt;
&lt;span class="pl-s"&gt;      {&lt;/span&gt;
&lt;span class="pl-s"&gt;        "action": "allow",&lt;/span&gt;
&lt;span class="pl-s"&gt;        "domain": "github.com"&lt;/span&gt;
&lt;span class="pl-s"&gt;      },&lt;/span&gt;
&lt;span class="pl-s"&gt;      {&lt;/span&gt;
&lt;span class="pl-s"&gt;        "action": "allow",&lt;/span&gt;
&lt;span class="pl-s"&gt;        "domain": "*.npmjs.org"&lt;/span&gt;
&lt;span class="pl-s"&gt;      }&lt;/span&gt;
&lt;span class="pl-s"&gt;    ]&lt;/span&gt;
&lt;span class="pl-s"&gt;  }&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h4 id="scale-to-zero-billing"&gt;Scale-to-zero billing&lt;/h4&gt;
&lt;p&gt;Sprites have scale-to-zero baked into the architecture. They go to sleep after 30 seconds of inactivity, wake up quickly when needed and bill you for just the CPU hours, RAM hours and GB-hours of storage you use while the Sprite is awake.&lt;/p&gt;
&lt;p&gt;Fly &lt;a href="https://sprites.dev/#billing"&gt;estimate&lt;/a&gt; a 4 hour intensive coding session as costing around 46 cents, and a low traffic web app with 30 hours of wake time per month at ~$4.&lt;/p&gt;
&lt;p&gt;(I calculate that a web app that consumes all 8 CPUs and all 8GBs of RAM 24/7 for a month would cost ((7 cents * 8 * 24 * 30) + (4.375 cents * 8 * 24 * 30)) / 100  = $655.2 per month, so don't necessarily use these as your primary web hosting solution for an app that soaks up all available CPU and RAM!)&lt;/p&gt;
&lt;h4 id="two-of-my-favorite-problems-at-once"&gt;Two of my favorite problems at once&lt;/h4&gt;
&lt;p&gt;I was hopeful that Fly would enter the developer-friendly sandbox API market, especially given other entrants from companies like &lt;a href="https://sandbox.cloudflare.com/"&gt;Cloudflare&lt;/a&gt; and &lt;a href="https://modal.com/docs/guide/sandboxes"&gt;Modal&lt;/a&gt; and &lt;a href="https://e2b.dev/"&gt;E2B&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I did not expect that they'd tackle the developer sandbox problem at the same time, and with the same product!&lt;/p&gt;
&lt;p&gt;My one concern here is that it makes the product itself a little harder to explain.&lt;/p&gt;
&lt;p&gt;I'm already spinning up some prototypes of sandbox-adjacent things I've always wanted to build, and early signs are very promising. I'll write more about these as they turn into useful projects.&lt;/p&gt;


&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: Here's some &lt;a href="https://news.ycombinator.com/item?id=46557825#46560748"&gt;additional colour&lt;/a&gt; from Thomas Ptacek on Hacker News:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This has been in the works for quite awhile here. We put a long bet on "slow create fast start/stop" --- which is a really interesting and useful shape for execution environments --- but it didn't make sense to sandboxers, so "fast create" has been the White Whale at Fly.io for over a year.&lt;/p&gt;
&lt;/blockquote&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/sandboxing"&gt;sandboxing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/thomas-ptacek"&gt;thomas-ptacek&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="sandboxing"/><category term="thomas-ptacek"/><category term="ai"/><category term="fly"/><category term="coding-agents"/></entry><entry><title>How I automate my Substack newsletter with content from my blog</title><link href="https://simonwillison.net/2025/Nov/19/how-i-automate-my-substack-newsletter/#atom-tag" rel="alternate"/><published>2025-11-19T22:00:34+00:00</published><updated>2025-11-19T22:00:34+00:00</updated><id>https://simonwillison.net/2025/Nov/19/how-i-automate-my-substack-newsletter/#atom-tag</id><summary type="html">
    &lt;p&gt;I sent out &lt;a href="https://simonw.substack.com/p/trying-out-gemini-3-pro-with-audio"&gt;my weekly-ish Substack newsletter&lt;/a&gt; this morning and took the opportunity to record &lt;a href="https://www.youtube.com/watch?v=BoPZltKDM-s"&gt;a YouTube video&lt;/a&gt; demonstrating my process and describing the different components that make it work. There's a &lt;em&gt;lot&lt;/em&gt; of digital duct tape involved, taking the content from Django+Heroku+PostgreSQL to GitHub Actions to SQLite+Datasette+Fly.io to JavaScript+Observable and finally to Substack.&lt;/p&gt;

&lt;p&gt;&lt;lite-youtube videoid="BoPZltKDM-s" js-api="js-api"
  title="How I automate my Substack newsletter with content from my blog"
  playlabel="Play: How I automate my Substack newsletter with content from my blog"
&gt; &lt;/lite-youtube&gt;&lt;/p&gt;

&lt;p&gt;The core process is the same as I described &lt;a href="https://simonwillison.net/2023/Apr/4/substack-observable/"&gt;back in 2023&lt;/a&gt;. I have an Observable notebook called &lt;a href="https://observablehq.com/@simonw/blog-to-newsletter"&gt;blog-to-newsletter&lt;/a&gt; which fetches content from my blog's database, filters out anything that has been in the newsletter before, formats what's left as HTML and offers a big "Copy rich text newsletter to clipboard" button.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/copy-to-newsletter.jpg" alt="Screenshot of the interface. An item in a list says 9080: Trying out Gemini 3 Pro with audio transcription and a new pelican benchmark. A huge button reads Copy rich text newsletter to clipboard - below is a smaller button that says Copy just the links/quotes/TILs. A Last X days slider is set to 2. There are checkboxes for SKip content sent in prior newsletters and only include post content prior to the cutoff comment." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;I click that button, paste the result into the Substack editor, tweak a few things and hit send. The whole process usually takes just a few minutes.&lt;/p&gt;
&lt;p&gt;I make very minor edits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I set the title and the subheading for the newsletter. This is often a direct copy of the title of the featured blog post.&lt;/li&gt;
&lt;li&gt;Substack turns YouTube URLs into embeds, which often isn't what I want - especially if I have a YouTube URL inside a code example.&lt;/li&gt;
&lt;li&gt;Blocks of preformatted text often have an extra blank line at the end, which I remove.&lt;/li&gt;
&lt;li&gt;Occasionally I'll make a content edit - removing a piece of content that doesn't fit the newsletter, or fixing a time reference like "yesterday" that doesn't make sense any more.&lt;/li&gt;
&lt;li&gt;I pick the featured image for the newsletter and add some tags.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That's the whole process!&lt;/p&gt;
&lt;h4 id="the-observable-notebook"&gt;The Observable notebook&lt;/h4&gt;
&lt;p&gt;The most important cell in the Observable notebook is this one:&lt;/p&gt;
&lt;div class="highlight highlight-source-js"&gt;&lt;pre&gt;&lt;span class="pl-s1"&gt;raw_content&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;
  &lt;span class="pl-c1"&gt;return&lt;/span&gt; &lt;span class="pl-s1"&gt;await&lt;/span&gt; &lt;span class="pl-kos"&gt;(&lt;/span&gt;
    &lt;span class="pl-k"&gt;await&lt;/span&gt; &lt;span class="pl-en"&gt;fetch&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;
      &lt;span class="pl-s"&gt;`https://datasette.simonwillison.net/simonwillisonblog.json?sql=&lt;span class="pl-s1"&gt;&lt;span class="pl-kos"&gt;${&lt;/span&gt;&lt;span class="pl-en"&gt;encodeURIComponent&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;span class="pl-s1"&gt;        &lt;span class="pl-s1"&gt;sql&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;span class="pl-s1"&gt;      &lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/span&gt;&amp;amp;_shape=array&amp;amp;numdays=&lt;span class="pl-s1"&gt;&lt;span class="pl-kos"&gt;${&lt;/span&gt;&lt;span class="pl-s1"&gt;numDays&lt;/span&gt;&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/span&gt;`&lt;/span&gt;
    &lt;span class="pl-kos"&gt;)&lt;/span&gt;
  &lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;json&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This uses the JavaScript &lt;code&gt;fetch()&lt;/code&gt; function to pull data from my blog's Datasette instance, using a very complex SQL query that is composed elsewhere in the notebook.&lt;/p&gt;
&lt;p&gt;Here's a link to &lt;a href="https://datasette.simonwillison.net/simonwillisonblog?sql=with+content+as+%28%0D%0A++select%0D%0A++++id%2C%0D%0A++++%27entry%27+as+type%2C%0D%0A++++title%2C%0D%0A++++created%2C%0D%0A++++slug%2C%0D%0A++++%27%3Ch3%3E%3Ca+href%3D%22%27+%7C%7C+%27https%3A%2F%2Fsimonwillison.net%2F%27+%7C%7C+strftime%28%27%25Y%2F%27%2C+created%29%0D%0A++++++%7C%7C+substr%28%27JanFebMarAprMayJunJulAugSepOctNovDec%27%2C+%28strftime%28%27%25m%27%2C+created%29+-+1%29+*+3+%2B+1%2C+3%29+%0D%0A++++++%7C%7C+%27%2F%27+%7C%7C+cast%28strftime%28%27%25d%27%2C+created%29+as+integer%29+%7C%7C+%27%2F%27+%7C%7C+slug+%7C%7C+%27%2F%27+%7C%7C+%27%22%3E%27+%0D%0A++++++%7C%7C+title+%7C%7C+%27%3C%2Fa%3E+-+%27+%7C%7C+date%28created%29+%7C%7C+%27%3C%2Fh3%3E%27+%7C%7C+body%0D%0A++++++as+html%2C%0D%0A++++%27null%27+as+json%2C%0D%0A++++%27%27+as+external_url%0D%0A++from+blog_entry%0D%0A++union+all%0D%0A++select%0D%0A++++id%2C%0D%0A++++%27blogmark%27+as+type%2C%0D%0A++++link_title%2C%0D%0A++++created%2C%0D%0A++++slug%2C%0D%0A++++%27%3Cp%3E%3Cstrong%3ELink%3C%2Fstrong%3E+%27+%7C%7C+date%28created%29+%7C%7C+%27+%3Ca+href%3D%22%27%7C%7C+link_url+%7C%7C+%27%22%3E%27%0D%0A++++++%7C%7C+link_title+%7C%7C+%27%3C%2Fa%3E%3A%3C%2Fp%3E%3Cp%3E%27+%7C%7C+%27+%27+%7C%7C+replace%28commentary%2C+%27%0D%0A%27%2C+%27%3Cbr%3E%27%29+%7C%7C+%27%3C%2Fp%3E%27%0D%0A++++++as+html%2C%0D%0A++++json_object%28%0D%0A++++++%27created%27%2C+date%28created%29%2C%0D%0A++++++%27link_url%27%2C+link_url%2C%0D%0A++++++%27link_title%27%2C+link_title%2C%0D%0A++++++%27commentary%27%2C+commentary%2C%0D%0A++++++%27use_markdown%27%2C+use_markdown%0D%0A++++%29+as+json%2C%0D%0A++link_url+as+external_url%0D%0A++from+blog_blogmark%0D%0A++union+all%0D%0A++select%0D%0A++++id%2C%0D%0A++++%27quotation%27+as+type%2C%0D%0A++++source%2C%0D%0A++++created%2C%0D%0A++++slug%2C%0D%0A++++%27%3Cstrong%3Equote%3C%2Fstrong%3E+%27+%7C%7C+date%28created%29+%7C%7C%0D%0A++++%27%3Cblockquote%3E%3Cp%3E%3Cem%3E%27+%7C%7C%0D%0A++++replace%28quotation%2C+%27%0D%0A%27%2C+%27%3Cbr%3E%27%29+%7C%7C+%0D%0A++++%27%3C%2Fem%3E%3C%2Fp%3E%3C%2Fblockquote%3E%3Cp%3E%3Ca+href%3D%22%27+%7C%7C%0D%0A++++coalesce%28source_url%2C+%27%23%27%29+%7C%7C+%27%22%3E%27+%7C%7C+source+%7C%7C+%27%3C%2Fa%3E%27+%7C%7C%0D%0A++++case+%0D%0A++++++++when+nullif%28trim%28context%29%2C+%27%27%29+is+not+null+%0D%0A++++++++then+%27%2C+%27+%7C%7C+context+%0D%0A++++++++else+%27%27+%0D%0A++++end+%7C%7C%0D%0A++++%27%3C%2Fp%3E%27+as+html%2C%0D%0A++++%27null%27+as+json%2C%0D%0A++++source_url+as+external_url%0D%0A++from+blog_quotation%0D%0A++union+all%0D%0A++select%0D%0A++++id%2C%0D%0A++++%27note%27+as+type%2C%0D%0A++++case%0D%0A++++++when+title+is+not+null+and+title+%3C%3E+%27%27+then+title%0D%0A++++++else+%27Note+on+%27+%7C%7C+date%28created%29%0D%0A++++end%2C%0D%0A++++created%2C%0D%0A++++slug%2C%0D%0A++++%27No+HTML%27%2C%0D%0A++++json_object%28%0D%0A++++++%27created%27%2C+date%28created%29%2C%0D%0A++++++%27link_url%27%2C+%27https%3A%2F%2Fsimonwillison.net%2F%27+%7C%7C+strftime%28%27%25Y%2F%27%2C+created%29%0D%0A++++++%7C%7C+substr%28%27JanFebMarAprMayJunJulAugSepOctNovDec%27%2C+%28strftime%28%27%25m%27%2C+created%29+-+1%29+*+3+%2B+1%2C+3%29+%0D%0A++++++%7C%7C+%27%2F%27+%7C%7C+cast%28strftime%28%27%25d%27%2C+created%29+as+integer%29+%7C%7C+%27%2F%27+%7C%7C+slug+%7C%7C+%27%2F%27%2C%0D%0A++++++%27link_title%27%2C+%27%27%2C%0D%0A++++++%27commentary%27%2C+body%2C%0D%0A++++++%27use_markdown%27%2C+1%0D%0A++++%29%2C%0D%0A++++%27%27+as+external_url%0D%0A++from+blog_note%0D%0A++union+all%0D%0A++select%0D%0A++++rowid%2C%0D%0A++++%27til%27+as+type%2C%0D%0A++++title%2C%0D%0A++++created%2C%0D%0A++++%27null%27+as+slug%2C%0D%0A++++%27%3Cp%3E%3Cstrong%3ETIL%3C%2Fstrong%3E+%27+%7C%7C+date%28created%29+%7C%7C+%27+%3Ca+href%3D%22%27%7C%7C+%27https%3A%2F%2Ftil.simonwillison.net%2F%27+%7C%7C+topic+%7C%7C+%27%2F%27+%7C%7C+slug+%7C%7C+%27%22%3E%27+%7C%7C+title+%7C%7C+%27%3C%2Fa%3E%3A%27+%7C%7C+%27+%27+%7C%7C+substr%28html%2C+1%2C+instr%28html%2C+%27%3C%2Fp%3E%27%29+-+1%29+%7C%7C+%27+%26%238230%3B%3C%2Fp%3E%27+as+html%2C%0D%0A++++%27null%27+as+json%2C%0D%0A++++%27https%3A%2F%2Ftil.simonwillison.net%2F%27+%7C%7C+topic+%7C%7C+%27%2F%27+%7C%7C+slug+as+external_url%0D%0A++from+til%0D%0A%29%2C%0D%0Acollected+as+%28%0D%0A++select%0D%0A++++id%2C%0D%0A++++type%2C%0D%0A++++title%2C%0D%0A++++case%0D%0A++++++when+type+%3D+%27til%27%0D%0A++++++then+external_url%0D%0A++++++else+%27https%3A%2F%2Fsimonwillison.net%2F%27+%7C%7C+strftime%28%27%25Y%2F%27%2C+created%29%0D%0A++++++%7C%7C+substr%28%27JanFebMarAprMayJunJulAugSepOctNovDec%27%2C+%28strftime%28%27%25m%27%2C+created%29+-+1%29+*+3+%2B+1%2C+3%29+%7C%7C+%0D%0A++++++%27%2F%27+%7C%7C+cast%28strftime%28%27%25d%27%2C+created%29+as+integer%29+%7C%7C+%27%2F%27+%7C%7C+slug+%7C%7C+%27%2F%27%0D%0A++++++end+as+url%2C%0D%0A++++created%2C%0D%0A++++html%2C%0D%0A++++json%2C%0D%0A++++external_url%2C%0D%0A++++case%0D%0A++++++when+type+%3D+%27entry%27+then+%28%0D%0A++++++++select+json_group_array%28tag%29%0D%0A++++++++from+blog_tag%0D%0A++++++++join+blog_entry_tags+on+blog_tag.id+%3D+blog_entry_tags.tag_id%0D%0A++++++++where+blog_entry_tags.entry_id+%3D+content.id%0D%0A++++++%29%0D%0A++++++when+type+%3D+%27blogmark%27+then+%28%0D%0A++++++++select+json_group_array%28tag%29%0D%0A++++++++from+blog_tag%0D%0A++++++++join+blog_blogmark_tags+on+blog_tag.id+%3D+blog_blogmark_tags.tag_id%0D%0A++++++++where+blog_blogmark_tags.blogmark_id+%3D+content.id%0D%0A++++++%29%0D%0A++++++when+type+%3D+%27quotation%27+then+%28%0D%0A++++++++select+json_group_array%28tag%29%0D%0A++++++++from+blog_tag%0D%0A++++++++join+blog_quotation_tags+on+blog_tag.id+%3D+blog_quotation_tags.tag_id%0D%0A++++++++where+blog_quotation_tags.quotation_id+%3D+content.id%0D%0A++++++%29%0D%0A++++++else+%27%5B%5D%27%0D%0A++++end+as+tags%0D%0A++from+content%0D%0A++where+created+%3E%3D+date%28%27now%27%2C+%27-%27+%7C%7C+%3Anumdays+%7C%7C+%27+days%27%29+++%0D%0A++order+by+created+desc%0D%0A%29%0D%0Aselect+id%2C+type%2C+title%2C+url%2C+created%2C+html%2C+json%2C+external_url%2C+tags%0D%0Afrom+collected+%0D%0Aorder+by+%0D%0A++case+type+%0D%0A++++when+%27entry%27+then+0+%0D%0A++++else+1+%0D%0A++end%2C%0D%0A++case+type+%0D%0A++++when+%27entry%27+then+created+%0D%0A++++else+-strftime%28%27%25s%27%2C+created%29+%0D%0A++end+desc%3B&amp;amp;numdays=7"&gt;see and execute that query&lt;/a&gt; directly in Datasette. It's 143 lines of convoluted SQL that assembles most of the HTML for the newsletter using SQLite string concatenation! An illustrative snippet:&lt;/p&gt;
&lt;div class="highlight highlight-source-sql"&gt;&lt;pre&gt;with content &lt;span class="pl-k"&gt;as&lt;/span&gt; (
  &lt;span class="pl-k"&gt;select&lt;/span&gt;
    id,
    &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;entry&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;as&lt;/span&gt; type,
    title,
    created,
    slug,
    &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&amp;lt;h3&amp;gt;&amp;lt;a href="&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;https://simonwillison.net/&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; strftime(&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;%Y/&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;, created)
      &lt;span class="pl-k"&gt;||&lt;/span&gt; substr(&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;JanFebMarAprMayJunJulAugSepOctNovDec&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;, (strftime(&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;%m&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;, created) &lt;span class="pl-k"&gt;-&lt;/span&gt; &lt;span class="pl-c1"&gt;1&lt;/span&gt;) &lt;span class="pl-k"&gt;*&lt;/span&gt; &lt;span class="pl-c1"&gt;3&lt;/span&gt; &lt;span class="pl-k"&gt;+&lt;/span&gt; &lt;span class="pl-c1"&gt;1&lt;/span&gt;, &lt;span class="pl-c1"&gt;3&lt;/span&gt;) 
      &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;/&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; cast(strftime(&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;%d&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;, created) &lt;span class="pl-k"&gt;as&lt;/span&gt; &lt;span class="pl-k"&gt;integer&lt;/span&gt;) &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;/&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; slug &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;/&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;"&amp;gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; 
      &lt;span class="pl-k"&gt;||&lt;/span&gt; title &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&amp;lt;/a&amp;gt; - &lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-k"&gt;date&lt;/span&gt;(created) &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&amp;lt;/h3&amp;gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; body
      &lt;span class="pl-k"&gt;as&lt;/span&gt; html,
    &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;null&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;as&lt;/span&gt; json,
    &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;as&lt;/span&gt; external_url
  &lt;span class="pl-k"&gt;from&lt;/span&gt; blog_entry
  &lt;span class="pl-k"&gt;union all&lt;/span&gt;
  &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; ...&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;My blog's URLs look like &lt;code&gt;/2025/Nov/18/gemini-3/&lt;/code&gt; - this SQL constructs that three letter month abbreviation from the month number using a substring operation.&lt;/p&gt;
&lt;p&gt;This is a &lt;em&gt;terrible&lt;/em&gt; way to assemble HTML, but I've stuck with it because it amuses me.&lt;/p&gt;
&lt;p&gt;The rest of the Observable notebook takes that data, filters out anything that links to content mentioned in the previous newsletters and composes it into a block of HTML that can be copied using that big button.&lt;/p&gt;
&lt;p&gt;Here's the recipe it uses to turn HTML into rich text content on a clipboard suitable for Substack. I can't remember how I figured this out but it's very effective:&lt;/p&gt;
&lt;div class="highlight highlight-source-js"&gt;&lt;pre&gt;&lt;span class="pl-v"&gt;Object&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;assign&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;
  &lt;span class="pl-en"&gt;html&lt;/span&gt;&lt;span class="pl-s"&gt;`&lt;span class="pl-kos"&gt;&amp;lt;&lt;/span&gt;&lt;span class="pl-ent"&gt;button&lt;/span&gt; &lt;span class="pl-c1"&gt;style&lt;/span&gt;="&lt;span class="pl-s"&gt;font-size: 1.4em; padding: 0.3em 1em; font-weight: bold;&lt;/span&gt;"&lt;span class="pl-kos"&gt;&amp;gt;&lt;/span&gt;Copy rich text newsletter to clipboard`&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
  &lt;span class="pl-kos"&gt;{&lt;/span&gt;
    &lt;span class="pl-en"&gt;onclick&lt;/span&gt;: &lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-c1"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;
      &lt;span class="pl-k"&gt;const&lt;/span&gt; &lt;span class="pl-s1"&gt;htmlContent&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;newsletterHTML&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
      &lt;span class="pl-c"&gt;// Create a temporary element to hold the HTML content&lt;/span&gt;
      &lt;span class="pl-k"&gt;const&lt;/span&gt; &lt;span class="pl-s1"&gt;tempElement&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-smi"&gt;document&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;createElement&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;"div"&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
      &lt;span class="pl-s1"&gt;tempElement&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;innerHTML&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;htmlContent&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
      &lt;span class="pl-smi"&gt;document&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;body&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;appendChild&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;tempElement&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
      &lt;span class="pl-c"&gt;// Select the HTML content&lt;/span&gt;
      &lt;span class="pl-k"&gt;const&lt;/span&gt; &lt;span class="pl-s1"&gt;range&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-smi"&gt;document&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;createRange&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
      &lt;span class="pl-s1"&gt;range&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;selectNode&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;tempElement&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
      &lt;span class="pl-c"&gt;// Copy the selected HTML content to the clipboard&lt;/span&gt;
      &lt;span class="pl-k"&gt;const&lt;/span&gt; &lt;span class="pl-s1"&gt;selection&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-smi"&gt;window&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;getSelection&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
      &lt;span class="pl-s1"&gt;selection&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;removeAllRanges&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
      &lt;span class="pl-s1"&gt;selection&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;addRange&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;range&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
      &lt;span class="pl-smi"&gt;document&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;execCommand&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;"copy"&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
      &lt;span class="pl-s1"&gt;selection&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;removeAllRanges&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
      &lt;span class="pl-smi"&gt;document&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;body&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;removeChild&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;tempElement&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
    &lt;span class="pl-kos"&gt;}&lt;/span&gt;
  &lt;span class="pl-kos"&gt;}&lt;/span&gt;
&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h4 id="from-django-postgresql-to-datasette-sqlite"&gt;From Django+Postgresql to Datasette+SQLite&lt;/h4&gt;
&lt;p&gt;My blog itself is a Django application hosted on Heroku, with data stored in Heroku PostgreSQL. Here's &lt;a href="https://github.com/simonw/simonwillisonblog"&gt;the source code for that Django application&lt;/a&gt;. I use the Django admin as my CMS.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://datasette.io/"&gt;Datasette&lt;/a&gt; provides a JSON API over a SQLite database... which means something needs to convert that PostgreSQL database into a SQLite database that Datasette can use.&lt;/p&gt;
&lt;p&gt;My system for doing that lives in the &lt;a href="https://github.com/simonw/simonwillisonblog-backup"&gt;simonw/simonwillisonblog-backup&lt;/a&gt; GitHub repository. It uses GitHub Actions on a schedule that executes every two hours, fetching the latest data from PostgreSQL and converting that to SQLite.&lt;/p&gt;
&lt;p&gt;My &lt;a href="https://github.com/simonw/db-to-sqlite"&gt;db-to-sqlite&lt;/a&gt; tool is responsible for that conversion. I call it &lt;a href="https://github.com/simonw/simonwillisonblog-backup/blob/dc5b9df272134ce051a5280b4de6d4daa9b2a9fc/.github/workflows/backup.yml#L44-L62"&gt;like this&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;db-to-sqlite \
  &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;$(&lt;/span&gt;heroku config:get DATABASE_URL -a simonwillisonblog &lt;span class="pl-k"&gt;|&lt;/span&gt; sed s/postgres:/postgresql+psycopg2:/&lt;span class="pl-pds"&gt;)&lt;/span&gt;&lt;/span&gt; \
  simonwillisonblog.db \
  --table auth_permission \
  --table auth_user \
  --table blog_blogmark \
  --table blog_blogmark_tags \
  --table blog_entry \
  --table blog_entry_tags \
  --table blog_quotation \
  --table blog_quotation_tags \
  --table blog_note \
  --table blog_note_tags \
  --table blog_tag \
  --table blog_previoustagname \
  --table blog_series \
  --table django_content_type \
  --table redirects_redirect&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;That &lt;code&gt;heroku config:get DATABASE_URL&lt;/code&gt; command uses Heroku credentials in an environment variable to fetch the database connection URL for my blog's PostgreSQL database (and fixes a small difference in the URL scheme).&lt;/p&gt;
&lt;p&gt;&lt;code&gt;db-to-sqlite&lt;/code&gt; can then export that data and write it to a SQLite database file called &lt;code&gt;simonwillisonblog.db&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;--table&lt;/code&gt; options specify the tables that should be included in the export.&lt;/p&gt;
&lt;p&gt;The repository does more than just that conversion: it also exports the resulting data to JSON files that live in the repository, which gives me a &lt;a href="https://github.com/simonw/simonwillisonblog-backup/commits/main/simonwillisonblog"&gt;commit history&lt;/a&gt; of changes I make to my content. This is a cheap way to get a revision history of my blog content without having to mess around with detailed history tracking inside the Django application itself.&lt;/p&gt;
&lt;p&gt;At the &lt;a href="https://github.com/simonw/simonwillisonblog-backup/blob/dc5b9df272134ce051a5280b4de6d4daa9b2a9fc/.github/workflows/backup.yml#L200-L204"&gt;end of my GitHub Actions workflow&lt;/a&gt; is this code that publishes the resulting database to Datasette running on &lt;a href="https://fly.io/"&gt;Fly.io&lt;/a&gt; using the &lt;a href="https://datasette.io/plugins/datasette-publish-fly"&gt;datasette publish fly&lt;/a&gt; plugin:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;datasette publish fly simonwillisonblog.db \
  -m metadata.yml \
  --app simonwillisonblog-backup \
  --branch 1.0a2 \
  --extra-options &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;--setting sql_time_limit_ms 15000 --setting truncate_cells_html 10000 --setting allow_facet off&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; \
  --install datasette-block-robots \
  &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; ... more plugins&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;As you can see, there are a lot of moving parts! Surprisingly it all mostly just works - I rarely have to intervene in the process, and the cost of those different components is pleasantly low.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/blogging"&gt;blogging&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/django"&gt;django&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/javascript"&gt;javascript&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/postgresql"&gt;postgresql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/youtube"&gt;youtube&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/heroku"&gt;heroku&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/observable"&gt;observable&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-actions"&gt;github-actions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/newsletter"&gt;newsletter&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/substack"&gt;substack&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/site-upgrades"&gt;site-upgrades&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="blogging"/><category term="django"/><category term="javascript"/><category term="postgresql"/><category term="sql"/><category term="sqlite"/><category term="youtube"/><category term="heroku"/><category term="datasette"/><category term="observable"/><category term="github-actions"/><category term="fly"/><category term="newsletter"/><category term="substack"/><category term="site-upgrades"/></entry><entry><title>You should write an agent</title><link href="https://simonwillison.net/2025/Nov/7/you-should-write-an-agent/#atom-tag" rel="alternate"/><published>2025-11-07T04:40:12+00:00</published><updated>2025-11-07T04:40:12+00:00</updated><id>https://simonwillison.net/2025/Nov/7/you-should-write-an-agent/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://fly.io/blog/everyone-write-an-agent/"&gt;You should write an agent&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Thomas Ptacek on the Fly blog:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Agents are the most surprising programming experience I’ve had in my career. Not because I’m awed by the magnitude of their powers — I like them, but I don’t like-like them. It’s because of how easy it was to get one up on its legs, and how much I learned doing that.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I think he's right: hooking up a simple agentic loop that prompts an LLM and runs a tool for it any time it request one really is the new "hello world" of AI engineering.

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=45840088"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/thomas-ptacek"&gt;thomas-ptacek&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-agents"&gt;ai-agents&lt;/a&gt;&lt;/p&gt;



</summary><category term="thomas-ptacek"/><category term="ai"/><category term="fly"/><category term="generative-ai"/><category term="llms"/><category term="ai-agents"/></entry><entry><title>Litestream v0.5.0 is Here</title><link href="https://simonwillison.net/2025/Oct/3/litestream/#atom-tag" rel="alternate"/><published>2025-10-03T15:10:21+00:00</published><updated>2025-10-03T15:10:21+00:00</updated><id>https://simonwillison.net/2025/Oct/3/litestream/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://fly.io/blog/litestream-v050-is-here/"&gt;Litestream v0.5.0 is Here&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
I've been running &lt;a href="https://litestream.io"&gt;Litestream&lt;/a&gt; to backup SQLite databases in production for a couple of years now without incident. The new version has been a long time coming - Ben Johnson took &lt;a href="https://simonwillison.net/2022/Sep/21/introducing-litefs/"&gt;a detour&lt;/a&gt; into the FUSE-based &lt;a href="https://github.com/superfly/litefs"&gt;LiteFS&lt;/a&gt; before deciding that the single binary Litestream approach is more popular - and Litestream 0.5 just landed with this very detailed blog posts describing the improved architecture.&lt;/p&gt;
&lt;p&gt;SQLite stores data in pages - 4096 (by default) byte blocks of data. Litestream replicates modified pages to a backup location - usually object storage like S3.&lt;/p&gt;
&lt;p&gt;Most SQLite tables have an auto-incrementing primary key, which is used to decide which page the row's data should be stored in. This means sequential inserts to a small table are sent to the same page, which caused previous Litestream to replicate many slightly different copies of that page block in succession.&lt;/p&gt;
&lt;p&gt;The new LTX format - borrowed from LiteFS - addresses that by adding compaction, which Ben describes as follows:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We can use LTX compaction to compress a bunch of LTX files into a single file with no duplicated pages. And Litestream now uses this capability to create a hierarchy of compactions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;at Level 1, we compact all the changes in a 30-second time window&lt;/li&gt;
&lt;li&gt;at Level 2, all the Level 1 files in a 5-minute window&lt;/li&gt;
&lt;li&gt;at Level 3, all the Level 2’s over an hour.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Net result: we can restore a SQLite database to any point in time, &lt;em&gt;using only a dozen or so files on average&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I'm most looking forward to trying out the feature that isn't quite landed yet: read-replicas, implemented using a SQLite &lt;a href="https://www.sqlite.org/vfs.html"&gt;VFS extension&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The next major feature we’re building out is a Litestream VFS for read replicas. This will let you instantly spin up a copy of the database and immediately read pages from S3 while the rest of the database is hydrating in the background.&lt;/p&gt;
&lt;/blockquote&gt;

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=45453936"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/litestream"&gt;litestream&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ben-johnson"&gt;ben-johnson&lt;/a&gt;&lt;/p&gt;



</summary><category term="sqlite"/><category term="fly"/><category term="litestream"/><category term="ben-johnson"/></entry><entry><title>Phoenix.new is Fly's entry into the prompt-driven app development space</title><link href="https://simonwillison.net/2025/Jun/23/phoenix-new/#atom-tag" rel="alternate"/><published>2025-06-23T18:42:02+00:00</published><updated>2025-06-23T18:42:02+00:00</updated><id>https://simonwillison.net/2025/Jun/23/phoenix-new/#atom-tag</id><summary type="html">
    &lt;p&gt;Here's a fascinating new entrant into the AI-assisted-programming / coding-agents space by &lt;a href="https://fly.io/"&gt;Fly.io&lt;/a&gt;, introduced on their blog in &lt;a href="https://fly.io/blog/phoenix-new-the-remote-ai-runtime/"&gt;Phoenix.new – The Remote AI Runtime for Phoenix&lt;/a&gt;: describe an app in a prompt, get a full Phoenix application, backed by SQLite and running on Fly's hosting platform. The &lt;a href="https://www.youtube.com/watch?v=du7GmWGUM5Y"&gt;official Phoenix.new YouTube launch video&lt;/a&gt; is a good way to get a sense for what this does.&lt;/p&gt;
&lt;h4 id="background-on-phoenix-and-elixir-and-fly"&gt;Background on Phoenix and Elixir and Fly&lt;/h4&gt;
&lt;p&gt;First, some background. &lt;a href="https://www.phoenixframework.org/"&gt;Phoenix&lt;/a&gt; is an open source web framework for Elixir, the Ruby-like language that compiles to Erlang's BEAM bytecode and runs on top of the highly concurrent Erlang runtime. The signature feature of the framework is &lt;a href="https://github.com/phoenixframework/phoenix_live_view/blob/main/README.md#feature-highlights"&gt;Phoenix LiveView&lt;/a&gt;, a toolkit for building realtime interfaces through streaming diffs to server-side HTML over a WebSocket connection.&lt;/p&gt;
&lt;p&gt;Phoenix was created by Chris McCord 11 years ago, and Chris joined Fly nearly four years ago. &lt;a href="http://phoenix.new/"&gt;Phoenix.new&lt;/a&gt; is his latest project.&lt;/p&gt;
&lt;p&gt;Phoenix LiveView is a really great fit for Fly's geographically distributed application serving infrastructure. Fly co-founder Kurt Mackey &lt;a href="https://fly.io/blog/low-latency-liveview/"&gt;wrote about that&lt;/a&gt; in April 2021, before they had hired Chris, describing how LiveView benefits from low latency by "moving app processes close to users" - something Fly has been designed to help with from the start.&lt;/p&gt;
&lt;p&gt;There's one major challenge though: Elixir is still a niche programming language, which means the number of people out there who are ready to spin up a new Phoenix app has always been artificially limited.&lt;/p&gt;
&lt;p&gt;Fly's solution? Get LLMs to shave that learning curve down to &lt;em&gt;almost nothing&lt;/em&gt;.&lt;/p&gt;
&lt;h4 id="prompt-driven-application-development-with-phoenix-new"&gt;Prompt-driven application development with Phoenix.new&lt;/h4&gt;
&lt;p&gt;Phoenix.new is a prompt-driven application development platform. You describe what you want to build, then watch as an LLM-powered coding agent writes, tests and iterates on code to help achieve that goal. It's in the same broad category as &lt;a href="https://lovable.dev/"&gt;Lovable&lt;/a&gt;, &lt;a href="https://v0.dev/"&gt;v0.dev&lt;/a&gt; and and &lt;a href="https://townie.val.run/"&gt;Val Town's Townie&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;One of the most important problems to solve with coding agents is to give them a robust sandbox where they can run code without breaking things outside of that space. Fly, at their heart, are a sandboxing company - their &lt;a href="https://fly.io/docs/machines/"&gt;Fly Machines&lt;/a&gt; product makes it trivial to spin up a new sandboxed VM in just a few seconds. I'm building &lt;a href="https://www.datasette.cloud/"&gt;Datasette Cloud&lt;/a&gt; on Fly for exactly that reason.&lt;/p&gt;
&lt;p&gt;I tried out Phoenix.new with the following starter prompt:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;A notebook application. Notes are rich text, using a nice visual markdown editors. The first line of a note becomes the title. Notes have a URL that is a 6 character random text string - thise can be used to link notes to each other by dropping in an @ sign and that text string. A hash and then text becomes a tag, clicking a tag goes to a page with that tag on. Notes are all in SQLite FTS so they can be searched with a search tool.&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Watching it work was &lt;em&gt;fascinating&lt;/em&gt;. It provides a full IDE-style environment in the browser, running a dedicated container and with a forwarded port to allow you to interact with the running application &lt;em&gt;while it is being built&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/phoenix-new.jpg" alt="Browser IDE interface showing a notebook app development project with a left sidebar containing a chat session that starts &amp;quot;What would you like to build?&amp;quot; with options like &amp;quot;todo list&amp;quot;, &amp;quot;chat app&amp;quot;, &amp;quot;product landing page&amp;quot;, then a main editor displaying plan.md file with detailed development steps including &amp;quot;Generate a Phoenix LiveView project called 'notebook'&amp;quot; and &amp;quot;Start the server so you can view the app as we build it&amp;quot;. The bottom has a terminal showing &amp;quot;All dependencies are up to date&amp;quot;, &amp;quot;Generated notebook app&amp;quot;, &amp;quot;Done in 241ms&amp;quot;." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;It spun up a &lt;code&gt;plan.md&lt;/code&gt;, planned out the approach and started working through the code. Most interestingly, it constantly tested the application it was building using a testing tool that looked similar to my own &lt;a href="https://shot-scraper.datasette.io/"&gt;shot-scraper&lt;/a&gt;. Here's an example of the output from that:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I can see the issue! The page is still showing the "Welcome to your Notebook" message instead of the form. Let me wait a moment for the JavaScript to execute and then check if the form appeared.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;$ web http://localhost:4000 --js "setTimeout(() =&amp;gt; { document.querySelector('button[phx-click=\"start_creating\"]').click(); }, 500); setTimeout(() =&amp;gt; { console.log('Form visible?', !!document.querySelector('textarea')); console.log('Creating note state:', document.body.innerHTML.includes('Create New Note')); }, 2000);"&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I'm used to coding agents that write and execute their own tests, but this is the most impressive example I've seen of a coding agent that actively interacts with and tests the live application while it is iterating on the code.&lt;/p&gt;
&lt;p&gt;Most interestingly of all, since LiveView is a realtime framework, I could interact directly with the application at the same time as the machine was writing and testing it, seeing any changes it made instantly in my browser!&lt;/p&gt;
&lt;p&gt;How did it do? It got there in the end, but with hindsight my initial prompt was a little too complex: I had to rescue it after it got stuck in an error loop trying to implement SQLite full-text search. Just like a project you build yourself it turns out to work much better if you start with the simplest possible version of the application and then iterate on it to add additional features.&lt;/p&gt;
&lt;p&gt;One of my favorite details is how Git integration works. Phoenix.new commits constantly as it works, and a menu option for "Clone X to your local computer" then gives you a command that looks like this:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;git clone "https://phoenix.new/git/WENQLj...big-random-string...VHFW/$RANDOM/notebook"&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Run that locally to get a full copy of the repo! I ran the following to push it to a new repository in my GitHub account:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;git remote add github https://github.com/simonw/phoenix-new-notebook.git
git push -u github main&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can see the code (and the &lt;a href="https://github.com/simonw/phoenix-new-notebook/commits"&gt;commit history&lt;/a&gt;) in my &lt;a href="https://github.com/simonw/phoenix-new-notebook"&gt;simonw/phoenix-new-notebook&lt;/a&gt; repo.&lt;/p&gt;
&lt;h4 id="how-much-did-i-learn-"&gt;How much did I learn?&lt;/h4&gt;
&lt;p&gt;My initial experiments with Phoenix.new were very much &lt;a href="https://simonwillison.net/2025/Mar/19/vibe-coding/"&gt;vibe coding&lt;/a&gt; - I interacted with the resulting application but didn't pay a great deal of attention to the code that was being written, despite it being presented to me in an IDE that made it very easy to review what was going on.&lt;/p&gt;
&lt;p&gt;As a result, I didn't learn much about the low-level syntax details of how Phoenix and Elixir work. I did however get a strong feel for the &lt;em&gt;shape&lt;/em&gt; of Elixir and Phoenix at a high level as the code danced past my eye.&lt;/p&gt;
&lt;p&gt;It turns out having an LLM write an application in front of you is a great way to start building understanding of what a framework can do.&lt;/p&gt;
&lt;p&gt;It’s almost like watching a YouTube livestream of an experienced developer speed running building an application, except that app is exactly what you asked them to build and you can interject and send them in a new direction at any moment.&lt;/p&gt;
&lt;h4 id="expanding-beyond-elixir-and-phoenix"&gt;Expanding beyond Elixir and Phoenix&lt;/h4&gt;
&lt;p&gt;Chris's announcement included this note:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;At this point you might be wondering – can I just ask it to build a Rails app? Or an Expo React Native app? Or Svelte? Or Go?&lt;/p&gt;
&lt;p&gt;Yes, you can.&lt;/p&gt;
&lt;p&gt;Our system prompt is tuned for Phoenix today, but all languages you care about are already installed. We’re still figuring out where to take this, but adding new languages and frameworks definitely ranks highly in my plans.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The browser-based IDE includes a terminal, and I checked and &lt;code&gt;python3&lt;/code&gt; and &lt;code&gt;python3 -m pip install datasette&lt;/code&gt; work there already.&lt;/p&gt;
&lt;p&gt;If Fly do evolve this into a framework-agnostic tool for building web apps they'll probably need to rebrand it from Phoenix.new to something a bit more generic!&lt;/p&gt;
&lt;p&gt;Phoenix.new is currently priced as a $20/month subscription. Val Town recently switched the pricing for their similar Townie assistant from a subscription to to &lt;a href="https://blog.val.town/townie-credits"&gt;pay per use&lt;/a&gt;, presumably because for many users this kind of tool is something they would only use intermittently, although given the capabilities of Phoenix.new it's possible it could become a monthly driver for people, especially as it expands out to cover more frameworks.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;&lt;em&gt;Fly sponsor some of our work on Datasette Cloud (see &lt;a href="https://simonwillison.net/about/#disclosures"&gt;disclosures&lt;/a&gt;), but this article is not sponsored content and Fly did not request or review this post.&lt;/em&gt;&lt;/small&gt;&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/erlang"&gt;erlang&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-agents"&gt;ai-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vibe-coding"&gt;vibe-coding&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-to-app"&gt;prompt-to-app&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="erlang"/><category term="sqlite"/><category term="ai"/><category term="fly"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="ai-agents"/><category term="vibe-coding"/><category term="coding-agents"/><category term="prompt-to-app"/></entry><entry><title>django-simple-deploy</title><link href="https://simonwillison.net/2025/May/17/django-simple-deploy/#atom-tag" rel="alternate"/><published>2025-05-17T12:49:52+00:00</published><updated>2025-05-17T12:49:52+00:00</updated><id>https://simonwillison.net/2025/May/17/django-simple-deploy/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://django-simple-deploy.readthedocs.io/"&gt;django-simple-deploy&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Eric Matthes presented a lightning talk about this project at PyCon US this morning. "Django has a deploy command now". You can run it like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;pip install django-simple-deploy[fly_io]
# Add django_simple_deploy to INSTALLED_APPS.
python manage.py deploy --automate-all
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It's plugin-based (&lt;a href="https://github.com/django-simple-deploy/django-simple-deploy/issues/313"&gt;inspired by Datasette!&lt;/a&gt;) and the project has stable plugins for three hosting platforms: &lt;a href="https://github.com/django-simple-deploy/dsd-flyio"&gt;dsd-flyio&lt;/a&gt;, &lt;a href="https://github.com/django-simple-deploy/dsd-heroku"&gt;dsd-heroku&lt;/a&gt; and &lt;a href="https://github.com/django-simple-deploy/dsd-platformsh"&gt;dsd-platformsh&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Currently in development: &lt;a href="https://github.com/django-simple-deploy/dsd-vps"&gt;dsd-vps&lt;/a&gt; - a plugin that should work with any VPS provider, using &lt;a href="https://www.paramiko.org/"&gt;Paramiko&lt;/a&gt; to connect to a newly created instance and &lt;a href="https://github.com/django-simple-deploy/dsd-vps/blob/a372fc7b7fd31cd2ad3cf22d68b9c9fecb65d17a/dsd_vps/utils.py"&gt;run all of the commands&lt;/a&gt; needed to start serving a Django application.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/django"&gt;django&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/paramiko"&gt;paramiko&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/heroku"&gt;heroku&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;&lt;/p&gt;



</summary><category term="django"/><category term="paramiko"/><category term="plugins"/><category term="python"/><category term="heroku"/><category term="datasette"/><category term="fly"/></entry><entry><title>Fly: We're Cutting L40S Prices In Half</title><link href="https://simonwillison.net/2024/Aug/16/fly-were-cutting-l40s-prices-in-half/#atom-tag" rel="alternate"/><published>2024-08-16T04:44:04+00:00</published><updated>2024-08-16T04:44:04+00:00</updated><id>https://simonwillison.net/2024/Aug/16/fly-were-cutting-l40s-prices-in-half/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half/"&gt;Fly: We&amp;#x27;re Cutting L40S Prices In Half&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Interesting insider notes from &lt;a href="https://fly.io/"&gt;Fly.io&lt;/a&gt; on customer demand for GPUs:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;If you had asked us in 2023 what the biggest GPU problem we could solve was, we’d have said “selling fractional A100 slices”. [...] We guessed wrong, and spent a lot of time working out how to maximize the amount of GPU power we could deliver to a single Fly Machine. Users surprised us. By a wide margin, the most popular GPU in our inventory is the A10. &lt;/p&gt;
&lt;p&gt;[…] If you’re trying to do something GPU-accelerated in response to an HTTP request, the right combination of GPU, instance RAM, fast object storage for datasets and model parameters, and networking is much more important than getting your hands on an H100.&lt;/p&gt;
&lt;/blockquote&gt;

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=41261902"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpus"&gt;gpus&lt;/a&gt;&lt;/p&gt;



</summary><category term="fly"/><category term="gpus"/></entry><entry><title>Making Machines Move</title><link href="https://simonwillison.net/2024/Jul/30/making-machines-move/#atom-tag" rel="alternate"/><published>2024-07-30T21:45:32+00:00</published><updated>2024-07-30T21:45:32+00:00</updated><id>https://simonwillison.net/2024/Jul/30/making-machines-move/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://fly.io/blog/machine-migrations/"&gt;Making Machines Move&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Another deep technical dive into Fly.io infrastructure from Thomas Ptacek, this time describing how they can quickly boot up an instance with a persistent volume on a new host (for things like zero-downtime deploys) using a block-level cloning operation, so the new instance gets a volume that becomes accessible instantly, serving proxied blocks of data until the new volume has been completely migrated from the old host.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ops"&gt;ops&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/thomas-ptacek"&gt;thomas-ptacek&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/zero-downtime"&gt;zero-downtime&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;&lt;/p&gt;



</summary><category term="ops"/><category term="thomas-ptacek"/><category term="zero-downtime"/><category term="fly"/></entry><entry><title>GPUs on Fly.io are available to everyone!</title><link href="https://simonwillison.net/2024/Feb/14/gpus-on-flyio-are-available-to-everyone/#atom-tag" rel="alternate"/><published>2024-02-14T04:28:23+00:00</published><updated>2024-02-14T04:28:23+00:00</updated><id>https://simonwillison.net/2024/Feb/14/gpus-on-flyio-are-available-to-everyone/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://fly.io/blog/gpu-ga/"&gt;GPUs on Fly.io are available to everyone!&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
We’ve been experimenting with GPUs on Fly for a few months for Datasette Cloud. They’re well documented and quite easy to use—any example Python code you find that uses NVIDIA CUDA stuff generally Just Works. Most interestingly of all, Fly GPUs can scale to zero—so while they cost $2.50/hr for a A100 40G (VRAM) and $3.50/hr for a A100 80G you can configure them to stop running when the machine runs out of things to do.&lt;/p&gt;

&lt;p&gt;We’ve successfully used them to run Whisper and to experiment with running various Llama 2 LLMs as well.&lt;/p&gt;

&lt;p&gt;To look forward to: “We are working on getting some lower-cost A10 GPUs in the next few weeks”.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette-cloud"&gt;datasette-cloud&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/whisper"&gt;whisper&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nvidia"&gt;nvidia&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpus"&gt;gpus&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="datasette-cloud"/><category term="fly"/><category term="generative-ai"/><category term="whisper"/><category term="llms"/><category term="nvidia"/><category term="gpus"/></entry><entry><title>Macaroons Escalated Quickly</title><link href="https://simonwillison.net/2024/Jan/31/macaroons-escalated-quickly/#atom-tag" rel="alternate"/><published>2024-01-31T16:57:23+00:00</published><updated>2024-01-31T16:57:23+00:00</updated><id>https://simonwillison.net/2024/Jan/31/macaroons-escalated-quickly/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://fly.io/blog/macaroons-escalated-quickly/"&gt;Macaroons Escalated Quickly&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Thomas Ptacek’s follow-up on Macaroon tokens, based on a two year project to implement them at Fly.io. The way they let end users calculate new signed tokens with additional limitations applied to them (“caveats” in Macaroon terminology) is fascinating, and allows for some very creative solutions.

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=39205676"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/apis"&gt;apis&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/thomas-ptacek"&gt;thomas-ptacek&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;&lt;/p&gt;



</summary><category term="apis"/><category term="security"/><category term="thomas-ptacek"/><category term="fly"/></entry><entry><title>Introducing datasette-write-ui: a Datasette plugin for editing, inserting, and deleting rows</title><link href="https://simonwillison.net/2023/Aug/16/datasette-write-ui/#atom-tag" rel="alternate"/><published>2023-08-16T01:48:07+00:00</published><updated>2023-08-16T01:48:07+00:00</updated><id>https://simonwillison.net/2023/Aug/16/datasette-write-ui/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.datasette.cloud/blog/2023/datasette-write-ui/"&gt;Introducing datasette-write-ui: a Datasette plugin for editing, inserting, and deleting rows&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Alex García is working with me on Datasette Cloud for the next few months, graciously sponsored by Fly. We will be working in public, releasing open source code and documenting how to build a multi-tenant SaaS product using Fly Machines.&lt;/p&gt;

&lt;p&gt;Alex’s first project is datasette-write-ui, a plugin that finally lets you directly edit data stored inside Datasette. Alex wrote about the plugin on our new Datasette Cloud blog.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette-cloud"&gt;datasette-cloud&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/alex-garcia"&gt;alex-garcia&lt;/a&gt;&lt;/p&gt;



</summary><category term="plugins"/><category term="datasette"/><category term="datasette-cloud"/><category term="fly"/><category term="alex-garcia"/></entry><entry><title>Vector Search</title><link href="https://simonwillison.net/2023/Jun/2/vector-search/#atom-tag" rel="alternate"/><published>2023-06-02T05:02:22+00:00</published><updated>2023-06-02T05:02:22+00:00</updated><id>https://simonwillison.net/2023/Jun/2/vector-search/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://amjith.com/blog/2023/vector_search/"&gt;Vector Search&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Amjith Ramanujam provides a very thorough tutorial on implementing vector similarity search using SentenceTransformers embeddings (all-MiniLM-L6-v2) executed using sqlite-utils, then served via datasette-sqlite-vss and deployed using Fly.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vector-search"&gt;vector-search&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/embeddings"&gt;embeddings&lt;/a&gt;&lt;/p&gt;



</summary><category term="sqlite"/><category term="ai"/><category term="datasette"/><category term="fly"/><category term="vector-search"/><category term="embeddings"/></entry><entry><title>Carving the Scheduler Out of Our Orchestrator</title><link href="https://simonwillison.net/2023/Feb/2/carving-the-scheduler-out-of-our-orchestrator/#atom-tag" rel="alternate"/><published>2023-02-02T21:46:35+00:00</published><updated>2023-02-02T21:46:35+00:00</updated><id>https://simonwillison.net/2023/Feb/2/carving-the-scheduler-out-of-our-orchestrator/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/"&gt;Carving the Scheduler Out of Our Orchestrator&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Thomas Ptacek describes Fly’s new custom-built alternative to Nomad and Kubernetes in detail, including why they eventually needed to build something custom to best serve their platform. In doing so he provides the best explanation I’ve ever seen of what an orchestration system actually does.

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://infosec.exchange/@tqbf/109796131985160831"&gt;@tqbf&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/thomas-ptacek"&gt;thomas-ptacek&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/kubernetes"&gt;kubernetes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;&lt;/p&gt;



</summary><category term="thomas-ptacek"/><category term="kubernetes"/><category term="fly"/></entry><entry><title>Stringing together several free tiers to host an application with zero cost using fly.io, Litestream and Cloudflare</title><link href="https://simonwillison.net/2022/Oct/7/fly-cloudflare/#atom-tag" rel="alternate"/><published>2022-10-07T17:47:34+00:00</published><updated>2022-10-07T17:47:34+00:00</updated><id>https://simonwillison.net/2022/Oct/7/fly-cloudflare/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://blog.dahl.dev/posts/stringing-together-several-free-tiers-to-host-an-application-with-zero-cost/"&gt;Stringing together several free tiers to host an application with zero cost using fly.io, Litestream and Cloudflare&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Alexander Dahl provides a detailed description (and code) for his current preferred free hosting solution for small sites: SQLite (and a Go application) running on Fly’s free tier, with the database replicated up to Cloudflare’s R2 object storage (again on a free tier) by Litestream.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/hosting"&gt;hosting&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cloudflare"&gt;cloudflare&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/litestream"&gt;litestream&lt;/a&gt;&lt;/p&gt;



</summary><category term="hosting"/><category term="sqlite"/><category term="cloudflare"/><category term="fly"/><category term="litestream"/></entry><entry><title>Introducing LiteFS</title><link href="https://simonwillison.net/2022/Sep/21/introducing-litefs/#atom-tag" rel="alternate"/><published>2022-09-21T18:56:42+00:00</published><updated>2022-09-21T18:56:42+00:00</updated><id>https://simonwillison.net/2022/Sep/21/introducing-litefs/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://fly.io/blog/introducing-litefs/"&gt;Introducing LiteFS&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
LiteFS is the new SQLite replication solution from Fly, now ready for beta testing. It’s from the same author as Litestream but has a very different architecture; LiteFS works by implementing a custom FUSE filesystem which spies on SQLite transactions being written to the journal file and forwards them on to other nodes in the cluster, providing full read-replication. The signature Litestream feature of streaming a backup to S3 should be coming within the next few months.

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=32925734"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/databases"&gt;databases&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/replication"&gt;replication&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/litestream"&gt;litestream&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ben-johnson"&gt;ben-johnson&lt;/a&gt;&lt;/p&gt;



</summary><category term="databases"/><category term="replication"/><category term="sqlite"/><category term="fly"/><category term="litestream"/><category term="ben-johnson"/></entry><entry><title>Quoting Thomas Ptacek</title><link href="https://simonwillison.net/2022/Sep/16/thomas-ptacek/#atom-tag" rel="alternate"/><published>2022-09-16T01:49:27+00:00</published><updated>2022-09-16T01:49:27+00:00</updated><id>https://simonwillison.net/2022/Sep/16/thomas-ptacek/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://news.ycombinator.com/item?id=32857811"&gt;&lt;p&gt;[SQLite is] a database that in full-stack culture has been relegated to "unit test database mock" for about 15 years that is (1) surprisingly capable as a SQL engine, (2) the simplest SQL database to get your head around and manage, and (3) can embed directly in literally every application stack, which is especially interesting in latency-sensitive and globally-distributed applications.&lt;/p&gt;
&lt;p&gt;Reason (3) is clearly our ulterior motive here, so we're not disinterested: our model user deploys a full-stack app (Rails, Elixir, Express, whatever) in a bunch of regions around the world, hoping for sub-100ms responses for users in most places around the world. Even within a single data center, repeated queries to SQL servers can blow that budget. Running an in-process SQL server neatly addresses it.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://news.ycombinator.com/item?id=32857811"&gt;Thomas Ptacek&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/thomas-ptacek"&gt;thomas-ptacek&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;&lt;/p&gt;



</summary><category term="sql"/><category term="sqlite"/><category term="thomas-ptacek"/><category term="fly"/></entry><entry><title>Exploring the training data behind Stable Diffusion</title><link href="https://simonwillison.net/2022/Sep/5/laion-aesthetics-weeknotes/#atom-tag" rel="alternate"/><published>2022-09-05T00:18:42+00:00</published><updated>2022-09-05T00:18:42+00:00</updated><id>https://simonwillison.net/2022/Sep/5/laion-aesthetics-weeknotes/#atom-tag</id><summary type="html">
    &lt;p&gt;Two weeks ago, the Stable Diffusion image generation model was &lt;a href="https://stability.ai/blog/stable-diffusion-public-release"&gt;released to the public&lt;/a&gt;. I wrote about this last week, in &lt;a href="https://simonwillison.net/2022/Aug/29/stable-diffusion/"&gt;Stable Diffusion is a really big deal&lt;/a&gt; - a post which has since become one of the top ten results for "stable diffusion" on Google and shown up in all sorts of different places online.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://waxy.org"&gt;Andy Baio&lt;/a&gt; pinged me a week ago on Friday and asked if I'd be interested in collaborating with him on digging into the training data. The Stable Diffusion &lt;a href="https://github.com/CompVis/stable-diffusion/blob/main/Stable_Diffusion_v1_Model_Card.md"&gt;Model Card&lt;/a&gt; provides a detailed description of how the model was trained - primarily on the &lt;a href="https://huggingface.co/datasets/laion/laion2B-en"&gt;LAION 2B-en&lt;/a&gt;) dataset (a subset of &lt;a href="https://laion.ai/blog/laion-5b/"&gt;LAION 5B&lt;/a&gt;), with further emphasis given to images with higher calculated aesthetic scores.&lt;/p&gt;
&lt;p&gt;We ended up deciding to dig into the &lt;a href="https://huggingface.co/datasets/ChristophSchuhmann/improved_aesthetics_6plus"&gt;improved_aesthetics_6plus&lt;/a&gt; subset, which consists of 12 million images with an aesthetics score of 6 or higher.&lt;/p&gt;
&lt;p&gt;This isn't the full training set used for the model, but it's small enough that it fits comfortably in a SQLite database on inexpensive hosting...&lt;/p&gt;
&lt;p&gt;So I built a search engine, powered by &lt;a href="https://datasette.io/"&gt;Datasette&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Update, 20th December 2023:&lt;/strong&gt; This search tool is no longer available.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;You can search for images by keyword using the following interface:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://laion-aesthetic.datasette.io/laion-aesthetic-6pls/images"&gt;laion-aesthetic.datasette.io/laion-aesthetic-6pls/images&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/laion-lemur.jpg" alt="Screenshot of the search interface, showing the results for lemur" style="max-width: 100%" /&gt;&lt;/p&gt;

&lt;p&gt;Or see a breakdown of image counts by the domain they were scraped from &lt;a href="https://laion-aesthetic.datasette.io/laion-aesthetic-6pls/domain?_sort_desc=image_counts"&gt;on this page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The search engine provides access to 12,096,835 rows, and uses SQLite full-text search to power search across their text descriptions.&lt;/p&gt;
&lt;p&gt;Andy used this Datasette instance to conduct a thorough analysis of the underlying training data, which he wrote about in &lt;a href="https://waxy.org/2022/08/exploring-12-million-of-the-images-used-to-train-stable-diffusions-image-generator/"&gt;Exploring 12 Million of the 2.3 Billion Images Used to Train Stable Diffusion’s Image Generator&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This analysis has had a really huge impact! Stories mentioning it made the front page of the websites of both the New York Times and the Washington Post on the same day:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.washingtonpost.com/technology/2022/09/02/midjourney-artificial-intelligence-state-fair-colorado/"&gt;He used AI art from Midjourney to win a fine-arts prize. Did he cheat?&lt;/a&gt; - The Washington Post&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.nytimes.com/2022/09/02/technology/ai-artificial-intelligence-artists.html"&gt;An A.I.-Generated Picture Won an Art Prize. Artists Aren’t Happy.&lt;/a&gt; - The New York Times&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Further afield, we spotted coverage from publications that included:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.spiegel.de/netzwelt/web/stable-diffusion-verstoerend-gute-ki-kunst-und-jeder-kann-mitmischen-a-0bde407d-c0d5-474a-92fc-de2a99c01774"&gt;Diese Software macht Sie zum KI-Künstler&lt;/a&gt; - Der Spiegel. I get quoted in this one (a translated snippet from my blog at least).&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://gigazine.net/news/20220831-exploring-stable-diffusions/"&gt;23億枚もの画像で構成された画像生成AI「Stable Diffusion」のデータセットのうち1200万枚がどこから入手した画像かを調査した結果が公開される&lt;/a&gt; - Gigazine, a long-running (22 years old) Japanese online news magazine.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="how-i-built-the-database"&gt;How I built the database&lt;/h4&gt;
&lt;p&gt;The code for the Datasette instance can be found in &lt;a href="https://github.com/simonw/laion-aesthetic-datasette"&gt;this GitHub repository&lt;/a&gt;. The &lt;a href="https://github.com/simonw/laion-aesthetic-datasette/issues"&gt;issues&lt;/a&gt; in that repo contain a detailed record of the various steps I took to build the database.&lt;/p&gt;
&lt;p&gt;The data subset I loaded into the search engine is &lt;a href="https://huggingface.co/datasets/ChristophSchuhmann/improved_aesthetics_6plus"&gt;published on Hugging Face&lt;/a&gt; by Christoph Schuhmann. It consists of 7 parquet files, each of which are 325MB and stored in a GitHub repo using Git LFS.&lt;/p&gt;
&lt;p&gt;The first step was to fetch that data.&lt;/p&gt;
&lt;p&gt;This was my first time running &lt;code&gt;git lfs&lt;/code&gt; - I had to install it first using:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;brew install git-lfs
git lfs install
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then I cloned the repo and fetched the data like this. Note that to make the actual files available in the directory you need to run both &lt;code&gt;git lfs fetch&lt;/code&gt; and &lt;code&gt;git lfs checkout&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;git clone https://huggingface.co/datasets/ChristophSchuhmann/improved_aesthetics_6plus
&lt;span class="pl-c1"&gt;cd&lt;/span&gt; improved_aesthetics_6plus
git lfs fetch
git lfs checkout&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The result is 7 parquet files. I wanted to load these into SQLite.&lt;/p&gt;
&lt;p&gt;The first solution I found that worked was to use the &lt;a href="https://pypi.org/project/parquet-tools/"&gt; parquet-tools&lt;/a&gt; Python package:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;pipx install parquet-tools
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I could then convert the parquet data to CSV like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;parquet-tools csv train-00002-of-00007-709151a2715d894d.parquet
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This outputs the contents of the file as CSV.&lt;/p&gt;
&lt;p&gt;Since this is a lot of data it made sense to create an empty SQLite table first (with columns with the correct column types) before inserting the data. I did that like so:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;sqlite3 laion-aesthetic-6pls.db &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;CREATE TABLE IF NOT EXISTS images (&lt;/span&gt;
&lt;span class="pl-s"&gt;   [url] TEXT,&lt;/span&gt;
&lt;span class="pl-s"&gt;   [text] TEXT,&lt;/span&gt;
&lt;span class="pl-s"&gt;   [width] INTEGER,&lt;/span&gt;
&lt;span class="pl-s"&gt;   [height] INTEGER,&lt;/span&gt;
&lt;span class="pl-s"&gt;   [similarity] FLOAT,&lt;/span&gt;
&lt;span class="pl-s"&gt;   [punsafe] FLOAT,&lt;/span&gt;
&lt;span class="pl-s"&gt;   [pwatermark] FLOAT,&lt;/span&gt;
&lt;span class="pl-s"&gt;   [aesthetic] FLOAT,&lt;/span&gt;
&lt;span class="pl-s"&gt;   [hash] TEXT,&lt;/span&gt;
&lt;span class="pl-s"&gt;   [__index_level_0__] INTEGER&lt;/span&gt;
&lt;span class="pl-s"&gt;);&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then I used a &lt;code&gt;bash&lt;/code&gt; loop to insert all of the data:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;&lt;span class="pl-k"&gt;for&lt;/span&gt; &lt;span class="pl-smi"&gt;filename&lt;/span&gt; &lt;span class="pl-k"&gt;in&lt;/span&gt; &lt;span class="pl-k"&gt;*&lt;/span&gt;.parquet&lt;span class="pl-k"&gt;;&lt;/span&gt; &lt;span class="pl-k"&gt;do&lt;/span&gt;
    parquet-tools csv &lt;span class="pl-smi"&gt;$filename&lt;/span&gt; &lt;span class="pl-k"&gt;|&lt;/span&gt; sqlite3 -csv laion-aesthetic-6pls.db &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;.import --skip 1 '|cat -' images&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-k"&gt;done&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This uses the &lt;code&gt;sqlite3&lt;/code&gt; tool's &lt;code&gt;.import&lt;/code&gt; mechanism, because it's &lt;a href="https://til.simonwillison.net/sqlite/import-csv"&gt;really fast&lt;/a&gt;. The &lt;code&gt;--skip 1&lt;/code&gt; option is necessary to skip the first line, which is the CSV column names. The &lt;code&gt;'|cat -'&lt;/code&gt; is the idiom used to tell SQLite to read from standard input.&lt;/p&gt;
&lt;p&gt;This did the job! The result was a SQLite database file, about 3.5GB in size.&lt;/p&gt;
&lt;h4&gt;Enabling search&lt;/h4&gt;
&lt;p&gt;To enable SQLite full-text search against the images, I used &lt;a href="https://sqlite-utils.datasette.io/en/stable/cli.html#configuring-full-text-search"&gt;sqlite-utils enable-fts&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sqlite-utils enable-fts laion-aesthetic-6pls.db images text
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This took about a minute and a half to run. The resulting database file was around 3.9GB in size - the full text index didn't add as much to the file size as I had expected.&lt;/p&gt;
&lt;p&gt;Best of all, the search was fast! Most search queries took in the order of 20ms to run. My opinion of SQLite FTS keeps improving the more I use it.&lt;/p&gt;
&lt;h4&gt;Extracting domains with sqlite-utils --functions&lt;/h4&gt;
&lt;p&gt;We knew we wanted to count how many images had been scraped from each domain - but we currently only had the full image URLs:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;https://cdn.idahopotato.com/cache/4075b86c99bc2c46f927f3be5949d161_w310.jpg&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;While walking &lt;a href="https://twitter.com/cleopaws"&gt;Cleo&lt;/a&gt; I had an idea: what if &lt;code&gt;sqlite-utils&lt;/code&gt; made it really easy to register custom SQL functions and use them from the command-line? Then I could use a Python function to extract the domain names.&lt;/p&gt;
&lt;p&gt;This became the impetus for releasing &lt;a href="https://sqlite-utils.datasette.io/en/stable/changelog.html#v3-29"&gt;sqlite-utils 3.29&lt;/a&gt; with a brand new feature: &lt;a href="https://sqlite-utils.datasette.io/en/stable/cli.html#cli-query-functions"&gt;sqlite-utils --functions&lt;/a&gt;, which lets you do exactly that.&lt;/p&gt;
&lt;p&gt;Here's how I used that to extract the domain names from the URLs:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; First, add an empty 'domain' column to the table&lt;/span&gt;
sqlite-utils add-column data.db images domain

&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; Now populate it using a custom SQL function:&lt;/span&gt;
sqlite-utils laion-aesthetic-6pls.db &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;update images set domain = domain(url)&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; \
--functions &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;from urllib.parse import urlparse&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;def domain(url):&lt;/span&gt;
&lt;span class="pl-s"&gt;    return urlparse(url).netloc&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here we are executing this SQL query against the database:&lt;/p&gt;
&lt;div class="highlight highlight-source-sql"&gt;&lt;pre&gt;&lt;span class="pl-k"&gt;update&lt;/span&gt; images &lt;span class="pl-k"&gt;set&lt;/span&gt; domain &lt;span class="pl-k"&gt;=&lt;/span&gt; domain(url)&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Where that &lt;code&gt;domain(url)&lt;/code&gt; function is defined in the Python snippet passed to the &lt;code&gt;--functions&lt;/code&gt; option:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;urllib&lt;/span&gt;.&lt;span class="pl-s1"&gt;parse&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;urlparse&lt;/span&gt;

&lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;domain&lt;/span&gt;(&lt;span class="pl-s1"&gt;url&lt;/span&gt;):
    &lt;span class="pl-k"&gt;return&lt;/span&gt; &lt;span class="pl-en"&gt;urlparse&lt;/span&gt;(&lt;span class="pl-s1"&gt;url&lt;/span&gt;).&lt;span class="pl-s1"&gt;netloc&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;sqlite-utils&lt;/code&gt; runs &lt;code&gt;eval()&lt;/code&gt; against the code in that block, then loops through any callable objects defined by that code (skipping them if their name starts with an underscore) and registers those as custom SQL functions with SQLite.&lt;/p&gt;
&lt;p&gt;I'm really excited about this pattern. I think it makes &lt;code&gt;sqlite-utils&lt;/code&gt; an even more useful tool for running ad-hoc data cleanup and enrichment tasks.&lt;/p&gt;
&lt;h4&gt;Populating the domains table&lt;/h4&gt;
&lt;p&gt;The &lt;code&gt;domain&lt;/code&gt; column in the &lt;code&gt;images&lt;/code&gt; table was now populated, but it was a bit of a verbose column: it duplicated a chunk of text from the existing &lt;code&gt;url&lt;/code&gt;, and was repeated for over 12 million rows.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://sqlite-utils.datasette.io/en/stable/cli.html#extracting-columns-into-a-separate-table"&gt;sqlite-utils extract&lt;/a&gt; command is designed for this exact use-case. It can extract a column from an existing table out into a separate lookup table, reducing the database size by swapping those duplicate text fields for a much smaller integer foreign key column instead.&lt;/p&gt;
&lt;p&gt;I ran that like so:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;sqlite-utils extract laion-aesthetic-6pls.db images domain&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The result was a new &lt;code&gt;domains&lt;/code&gt; table, and a &lt;code&gt;domain_id&lt;/code&gt; column in the &lt;code&gt;images&lt;/code&gt; table that pointed to records there.&lt;/p&gt;
&lt;p&gt;One more step: I didn't want people visiting the site to have to run an expensive group by/count query to see which domains had the most images. So I denormalized that data into the &lt;code&gt;domains&lt;/code&gt; table.&lt;/p&gt;
&lt;p&gt;First I added a new integer column to it, called &lt;code&gt;image_counts&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;sqlite-utils add-column laion-aesthetic-6pls.db domain image_counts integer&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then I populated it with a query like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-sql"&gt;&lt;pre&gt;sqlite&lt;span class="pl-k"&gt;-&lt;/span&gt;utils laion&lt;span class="pl-k"&gt;-&lt;/span&gt;aesthetic&lt;span class="pl-k"&gt;-&lt;/span&gt;&lt;span class="pl-c1"&gt;6pls&lt;/span&gt;.&lt;span class="pl-c1"&gt;db&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;with counts as (&lt;/span&gt;
&lt;span class="pl-s"&gt;  select domain_id, count(*) as c from images group by domain_id&lt;/span&gt;
&lt;span class="pl-s"&gt;)&lt;/span&gt;
&lt;span class="pl-s"&gt;update domain&lt;/span&gt;
&lt;span class="pl-s"&gt;  set image_counts = counts.c&lt;/span&gt;
&lt;span class="pl-s"&gt;  from counts&lt;/span&gt;
&lt;span class="pl-s"&gt;  where id = counts.domain_id&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I first learned to combine CTEs and SQL updates while working &lt;a href="https://til.simonwillison.net/django/migration-using-cte"&gt;with Django migrations&lt;/a&gt; - I was delighted to see the same trick works for SQLite as well.&lt;/p&gt;
&lt;p&gt;You can see the result of this query in the &lt;a href="https://laion-aesthetic.datasette.io/laion-aesthetic-6pls/domain?_sort_desc=image_counts"&gt;domain&lt;/a&gt; table. The first five rows look like this:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;id&lt;/th&gt;
&lt;th&gt;domain&lt;/th&gt;
&lt;th&gt;image_counts&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;td&gt;i.pinimg.com&lt;/td&gt;
&lt;td&gt;1043949&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;render.fineartamerica.com&lt;/td&gt;
&lt;td&gt;601106&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;us.123rf.com&lt;/td&gt;
&lt;td&gt;497244&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;cdn.shopify.com&lt;/td&gt;
&lt;td&gt;241632&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;136&lt;/td&gt;
&lt;td&gt;photos.smugmug.com&lt;/td&gt;
&lt;td&gt;225582&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h4&gt;Doing the same for celebrities, artists, characters&lt;/h4&gt;
&lt;p&gt;We also wanted to provide pre-calculated counts for searches against a number of celebrities, artists and fictional characters - to help give a sense of the kinds of images that were included in the data.&lt;/p&gt;
&lt;p&gt;Andy gathered the ones we wanted to track in &lt;a href="https://docs.google.com/spreadsheets/d/1JLQQ3U6P0d4vDkAGuB8avmXOPIDPeDUdAEZsWTRYpng/edit"&gt;this Google Sheet&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I recently learned how to use the &lt;code&gt;/export?format=csv&lt;/code&gt; endpoint to export a Google Sheet as CSV. I found out that you can use &lt;code&gt;/export?format=csv&amp;amp;gid=1037423923&lt;/code&gt; to target a specific tab in a multi-tabbed sheet.&lt;/p&gt;
&lt;p&gt;So I imported Andy's data into SQLite using the following:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;curl -L &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;https://docs.google.com/spreadsheets/d/1JLQQ3U6P0d4vDkAGuB8avmXOPIDPeDUdAEZsWTRYpng/export?format=csv&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; \
  &lt;span class="pl-k"&gt;|&lt;/span&gt; sqlite-utils insert laion-aesthetic-6pls.db artists - --csv
curl -L &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;https://docs.google.com/spreadsheets/d/1JLQQ3U6P0d4vDkAGuB8avmXOPIDPeDUdAEZsWTRYpng/export?format=csv&amp;amp;gid=1037423923&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; \
  &lt;span class="pl-k"&gt;|&lt;/span&gt; sqlite-utils insert laion-aesthetic-6pls.db celebrities - --csv
curl -L &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;https://docs.google.com/spreadsheets/d/1JLQQ3U6P0d4vDkAGuB8avmXOPIDPeDUdAEZsWTRYpng/export?format=csv&amp;amp;gid=480391249&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; \
  &lt;span class="pl-k"&gt;|&lt;/span&gt; sqlite-utils insert laion-aesthetic-6pls.db characters - --csv&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This gave me &lt;code&gt;artists&lt;/code&gt;, &lt;code&gt;celebrities&lt;/code&gt; and &lt;code&gt;characters&lt;/code&gt; tables.&lt;/p&gt;
&lt;p&gt;The next challenge was to run a search query for each row in each of those tables and return the count of results. After some experimentation I found that this one worked:&lt;/p&gt;
&lt;div class="highlight highlight-source-sql"&gt;&lt;pre&gt;&lt;span class="pl-k"&gt;select&lt;/span&gt; name, (
  &lt;span class="pl-k"&gt;select&lt;/span&gt; &lt;span class="pl-c1"&gt;count&lt;/span&gt;(&lt;span class="pl-k"&gt;*&lt;/span&gt;) &lt;span class="pl-k"&gt;from&lt;/span&gt; images_fts &lt;span class="pl-k"&gt;where&lt;/span&gt; images_fts match &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;"&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; name &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;"&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;
) &lt;span class="pl-k"&gt;as&lt;/span&gt; search_count &lt;span class="pl-k"&gt;from&lt;/span&gt; celebrities &lt;span class="pl-k"&gt;order by&lt;/span&gt; search_count &lt;span class="pl-k"&gt;desc&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Note the &lt;code&gt;match '"' || name || '"'&lt;/code&gt; part - this was necessary to ensure the name was correctly quoted in a way that would avoid names like &lt;code&gt;Dwayne 'The Rock' Johnson&lt;/code&gt; from breaking the search query.&lt;/p&gt;
&lt;p&gt;Now that I had the query I could use that same CTE update trick to populate a counts column in the tables:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;sqlite-utils add-column laion-aesthetic-6pls.db celebrities image_counts integer

sqlite-utils laion-aesthetic-6pls.db &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;$(&lt;/span&gt;cat &lt;span class="pl-s"&gt;&lt;span class="pl-k"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="pl-k"&gt;EOF&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;span class="pl-s"&gt;&lt;span class="pl-s"&gt;with counts as (&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;span class="pl-s"&gt;&lt;span class="pl-s"&gt;  select name,&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;span class="pl-s"&gt;&lt;span class="pl-s"&gt;    (&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;span class="pl-s"&gt;&lt;span class="pl-s"&gt;      select count(*) from images_fts where images_fts match '"' || name || '"'&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;span class="pl-s"&gt;&lt;span class="pl-s"&gt;    ) as search_count&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;span class="pl-s"&gt;&lt;span class="pl-s"&gt;    from celebrities&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;span class="pl-s"&gt;&lt;span class="pl-s"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;span class="pl-s"&gt;&lt;span class="pl-s"&gt;update celebrities&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;span class="pl-s"&gt;&lt;span class="pl-s"&gt;  set image_counts = counts.search_count&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;span class="pl-s"&gt;&lt;span class="pl-s"&gt;  from counts&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;span class="pl-s"&gt;&lt;span class="pl-s"&gt;  where celebrities.name = counts.name&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;span class="pl-s"&gt;&lt;span class="pl-s"&gt;&lt;span class="pl-k"&gt;EOF&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I'm using the &lt;code&gt;cat &amp;lt;&amp;lt;EOF&lt;/code&gt; trick here to avoid having to use shell escaping for the single and double quotes, as described in this TIL: &lt;a href="https://til.simonwillison.net/zsh/argument-heredoc"&gt;Passing command arguments using heredoc syntax&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here are the finished tables: &lt;a href="https://laion-aesthetic.datasette.io/laion-aesthetic-6pls/characters"&gt;characters&lt;/a&gt;, &lt;a href="https://laion-aesthetic.datasette.io/laion-aesthetic-6pls/celebrities"&gt;celebrities&lt;/a&gt;, &lt;a href="https://laion-aesthetic.datasette.io/laion-aesthetic-6pls/artists"&gt;artists&lt;/a&gt;.&lt;/p&gt;
&lt;h4&gt;Deploying it to Fly&lt;/h4&gt;
&lt;p&gt;At just under 4GB the resulting SQLite database was an awkward size. I often deploy ~1GB databases to Google Cloud Run, but this was a bit too large for me to feel comfortable with that. Cloud Run can also get expensive for projects that attract a great deal of traffic.&lt;/p&gt;
&lt;p&gt;I decided to use &lt;a href="https://fly.io/"&gt;Fly&lt;/a&gt; instead. Fly includes support for mountable volumes, which means it's a great fit for these larger database files.&lt;/p&gt;
&lt;p&gt;I wrote about &lt;a href="https://simonwillison.net/2022/Feb/15/fly-volumes/"&gt;Using SQLite and Datasette with Fly Volumes&lt;/a&gt; back in February, when I added support to volumes to the &lt;a href="https://datasette.io/plugins/datasette-publish-fly"&gt;datasette-publish-fly&lt;/a&gt; Datasette plugin.&lt;/p&gt;
&lt;p&gt;This was still the largest database I had ever deployed to Fly, and it took a little bit of work to figure out the best way to handle it.&lt;/p&gt;
&lt;p&gt;In the end, I used the following recipe:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;datasette publish fly \
  --app laion-aesthetic \
  --volume-name datasette \
  --install datasette-json-html \
  --extra-options &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;-i /data/laion-aesthetic-6pls.db --inspect-file /data/inspect.json --setting sql_time_limit_ms 10000 --setting suggest_facets 0 --setting allow_download 0&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; \
  -m metadata.yml&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The first time I ran this I used &lt;code&gt;--create-volume 20&lt;/code&gt; to create a 20GB volume called &lt;code&gt;datasette&lt;/code&gt;. I over-provisioned this so I could run commands like &lt;code&gt;sqlite-utils vacuum&lt;/code&gt;, which need twice the amount of space as is taken up by the database file itself.&lt;/p&gt;
&lt;p&gt;I uploaded the database file itself &lt;a href="https://til.simonwillison.net/fly/scp"&gt;using scp&lt;/a&gt;, and ran &lt;code&gt;fly ssh console -a laion-aesthetic&lt;/code&gt; to SSH in and execute other commands such as &lt;code&gt;datasette inspect  laion-aesthetic-6pls.db &gt; inspect.json&lt;/code&gt; to create the inspect JSON file.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;--extra-options&lt;/code&gt; deserve explanation.&lt;/p&gt;
&lt;p&gt;Normally when you run &lt;code&gt;datasette publish&lt;/code&gt; the file you pass to the command is automatically deployed using &lt;a href="https://docs.datasette.io/en/stable/performance.html?highlight=immutable#immutable-mode"&gt;immutable mode&lt;/a&gt;. This mode is specifically designed for running read-only databases, and uses optimizations like only counting the rows in the table once on startup (or loading the counts from a pre-prepared &lt;code&gt;inspect.json&lt;/code&gt; file).&lt;/p&gt;
&lt;p&gt;I wanted those optimizations for this project. But &lt;code&gt;datasette publish fly&lt;/code&gt; is currently designed with the assumption that any databases you put in the &lt;code&gt;/data&lt;/code&gt; volume are designed to accept writes, and hence shouldn't be opened in immutable mode.&lt;/p&gt;
&lt;p&gt;I ended up coming up with a horrible hack. I add &lt;code&gt;-i /data/laion-aesthetic-6pls.db&lt;/code&gt; to the &lt;code&gt;--extra-options&lt;/code&gt; command to tell Datasette to open the file in immutable mode.&lt;/p&gt;
&lt;p&gt;But this wasn't enough! &lt;code&gt;datasette publish fly&lt;/code&gt; also configures Datasette to automatically open any databases in &lt;code&gt;/data&lt;/code&gt; in read-only mode, so that newly saved database files will be served correctly.&lt;/p&gt;
&lt;p&gt;This meant my instance was loading the same database twice - once in read-only mode and once in immutable mode.&lt;/p&gt;
&lt;p&gt;Rather than fixing the design of &lt;code&gt;datasette-publish-fly&lt;/code&gt;, I went for a cheap workaround. I start Datasette with the following &lt;code&gt;metadata.yml&lt;/code&gt; configuration (simplified):&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;&lt;span class="pl-ent"&gt;databases&lt;/span&gt;:
  &lt;span class="pl-ent"&gt;laion-aesthetic-6pls&lt;/span&gt;:
    &lt;span class="pl-ent"&gt;tables&lt;/span&gt;:
      &lt;span class="pl-ent"&gt;domain&lt;/span&gt;:
        &lt;span class="pl-ent"&gt;label_column&lt;/span&gt;: &lt;span class="pl-s"&gt;domain&lt;/span&gt;
  &lt;span class="pl-ent"&gt;laion-aesthetic-6pls_2&lt;/span&gt;:
    &lt;span class="pl-ent"&gt;allow&lt;/span&gt;: &lt;span class="pl-c1"&gt;false&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This ensures that the &lt;code&gt;laion-aesthetic-6pls&lt;/code&gt; database - the immutable one - is served correctly, and has a label column set for the &lt;code&gt;domain&lt;/code&gt; table too.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;laion-aesthetic-6pls_2&lt;/code&gt; is the second copy of that database, loaded because Datasette spotted it in the &lt;code&gt;/data&lt;/code&gt; directory. Setting &lt;code&gt;allow: false&lt;/code&gt; on it uses Datasette's &lt;a href="https://docs.datasette.io/en/stable/authentication.html#defining-permissions-with-allow-blocks"&gt;permissions framework&lt;/a&gt; to hide that duplicate database from view.&lt;/p&gt;
&lt;p&gt;I'm not proud of these workarounds, and I hope to fix them in the future - but for the moment this is what it took to deploy the project.&lt;/p&gt;
&lt;h4&gt;Scaling it to meet demand&lt;/h4&gt;
&lt;p&gt;I launched the first version of the application on Fly's cheapest instance - 256MB of RAM, costing $1.87/month.&lt;/p&gt;
&lt;p&gt;This worked fine when it was just me and Andy playing with the site, but it started to struggle as traffic started to increase.&lt;/p&gt;
&lt;p&gt;Fly have a "scale app" button which lets you upgrade your instance. I hadn't actually used it before, but I was delighted to find that it worked exactly as expected: I bumped the RAM up to 4GB (not coincidentally the size of the SQLite database file) and the instance restarted within a few seconds with upgraded capacity.&lt;/p&gt;
&lt;p&gt;Fly provide a preconfigured Grafana interface for watching your instances, and it helped me feel confident that the resized instance was happily dealing with the traffic.&lt;/p&gt;
&lt;p&gt;I plan to dial back down to a cheaper instance once interest in the project starts to fade.&lt;/p&gt;
&lt;h4&gt;Got a problem? Throw a search engine at it&lt;/h4&gt;
&lt;p&gt;This is the third time I've used Datasette to build a search engine in the past three weeks! My other two recent projects are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://scotrail.datasette.io/"&gt;scotrail.datasette.io&lt;/a&gt;, described in &lt;a href="https://simonwillison.net/2022/Aug/21/scotrail/"&gt;Analyzing ScotRail audio announcements with Datasette—from prototype to production&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://archive.sfmicrosociety.org"&gt;archive.sfmicrosociety.org&lt;/a&gt;, described in &lt;a href="https://simonwillison.net/2022/Aug/25/sfms-archive/"&gt;Building a searchable archive for the San Francisco Microscopical Society&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The ability to spin up a full search engine for anything that you can stuff into a SQLite database table (which it turns out is almost everything) is a really powerful ability. I plan to write a &lt;a href="https://datasette.io/tutorials"&gt;Datasette tutorial&lt;/a&gt; about this in the future.&lt;/p&gt;
&lt;h4&gt;Releases this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-render-image-tags"&gt;datasette-render-image-tags&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette-render-image-tags/releases/tag/0.1"&gt;0.1&lt;/a&gt; - 2022-09-04
&lt;br /&gt;Turn any URLs ending in .jpg/.png/.gif into img tags with width 200&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-sitemap"&gt;datasette-sitemap&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette-sitemap/releases/tag/1.0"&gt;1.0&lt;/a&gt; - (&lt;a href="https://github.com/simonw/datasette-sitemap/releases"&gt;3 releases total&lt;/a&gt;) - 2022-08-30
&lt;br /&gt;Generate sitemap.xml for Datasette sites&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-block-robots"&gt;datasette-block-robots&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette-block-robots/releases/tag/1.1"&gt;1.1&lt;/a&gt; - (&lt;a href="https://github.com/simonw/datasette-block-robots/releases"&gt;6 releases total&lt;/a&gt;) - 2022-08-30
&lt;br /&gt;Datasette plugin that blocks robots and crawlers using robots.txt&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/sqlite-utils"&gt;sqlite-utils&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/sqlite-utils/releases/tag/3.29"&gt;3.29&lt;/a&gt; - (&lt;a href="https://github.com/simonw/sqlite-utils/releases"&gt;103 releases total&lt;/a&gt;) - 2022-08-28
&lt;br /&gt;Python CLI utility and library for manipulating SQLite databases&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;TIL this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/sqlite/vacum-disk-full"&gt;SQLite VACUUM: database or disk is full&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/fly/scp"&gt;How to scp files to and from Fly&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/twitter/birdwatch-sqlite"&gt;Loading Twitter Birdwatch into SQLite for analysis with Datasette&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/andy-baio"&gt;andy-baio&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ethics"&gt;ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/search"&gt;search&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/parquet"&gt;parquet&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/weeknotes"&gt;weeknotes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite-utils"&gt;sqlite-utils&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/stable-diffusion"&gt;stable-diffusion&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/laion"&gt;laion&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/training-data"&gt;training-data&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/text-to-image"&gt;text-to-image&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="andy-baio"/><category term="ethics"/><category term="search"/><category term="ai"/><category term="datasette"/><category term="parquet"/><category term="weeknotes"/><category term="fly"/><category term="sqlite-utils"/><category term="stable-diffusion"/><category term="generative-ai"/><category term="laion"/><category term="training-data"/><category term="text-to-image"/><category term="ai-ethics"/></entry><entry><title>Digitizing 55,000 pages of civic meetings</title><link href="https://simonwillison.net/2022/Aug/22/digitizing-55000-pages-of-civic-meetings/#atom-tag" rel="alternate"/><published>2022-08-22T16:26:04+00:00</published><updated>2022-08-22T16:26:04+00:00</updated><id>https://simonwillison.net/2022/Aug/22/digitizing-55000-pages-of-civic-meetings/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://phildini.dev/digitizing-55-000-pages-of-civic-meetings"&gt;Digitizing 55,000 pages of civic meetings&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Philip James has been building public, searchable archives of city council meetings for various cities—Oakland and Alamedia so far—using my s3-ocr script to run Textract OCR against the PDFs of the minutes, and deploying them to Fly using Datasette. This is a really cool project, and very much the kind of thing I’ve been hoping to support with the tools I’ve been building.

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://twitter.com/phildini/status/1561745329971613696"&gt;@phildini&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/archiving"&gt;archiving&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ocr"&gt;ocr&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/political-hacking"&gt;political-hacking&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;&lt;/p&gt;



</summary><category term="archiving"/><category term="ocr"/><category term="political-hacking"/><category term="datasette"/><category term="fly"/></entry><entry><title>Litestream backups for Datasette Cloud (and weeknotes)</title><link href="https://simonwillison.net/2022/Aug/11/litestream-backups/#atom-tag" rel="alternate"/><published>2022-08-11T17:19:52+00:00</published><updated>2022-08-11T17:19:52+00:00</updated><id>https://simonwillison.net/2022/Aug/11/litestream-backups/#atom-tag</id><summary type="html">
    &lt;p&gt;My main focus this week has been adding robust backups to the forthcoming Datasette Cloud.&lt;/p&gt;
&lt;p&gt;Datasette Cloud is a SaaS service for &lt;a href="https://datasette.io"&gt;Datasette&lt;/a&gt;. It allows people to create a private Datasette instance where they can upload data, visualize and transform it and share it with other members of their team. You can join the waiting list to try it out using &lt;a href="https://docs.google.com/forms/d/e/1FAIpQLSeMRYHBHXlWGDkjCqhAOinDrKEVwqNR5GfPs3iEXn8LzDT2Qg/viewform?embedded=true"&gt;this form&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I'm building Datastte Cloud on &lt;a href="https://fly.io/"&gt;Fly&lt;/a&gt;, specifically on &lt;a href="https://fly.io/blog/fly-machines/"&gt;Fly Machines&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Security is a big concern for Datasette Cloud. Teams should only be able to access their own data - bugs where users accidentally (or maliciously) access data for another team should be protected against as much as possible.&lt;/p&gt;
&lt;p&gt;To help guarantee that, I've designed Datasette Cloud so that each team gets their own, dedicated instance, running in a &lt;a href="https://firecracker-microvm.github.io/"&gt;Firecracker VM&lt;/a&gt; managed by Fly. Their data lives in a dedicated volume.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://fly.io/docs/reference/volumes/"&gt;Fly volumes&lt;/a&gt; already implement snapshot backups, but I'm interested in defence in depth. This is where &lt;a href="https://litestream.io/"&gt;Litestream&lt;/a&gt; comes in (coincidentally now &lt;a href="https://fly.io/blog/all-in-on-sqlite-litestream/"&gt;part of Fly&lt;/a&gt;, although it wasn't when I first selected it as my backup strategy).&lt;/p&gt;
&lt;p&gt;I'm using Litestream to constantly backup the data for each Datasette Cloud team to an S3 bucket. In the case of a complete failure of a volume, I can restore data from a backup that should be at most a few seconds out of date. Litestream also gives me point-in-time backups, such that I can recover a previous version of the data within a configurable retention window.&lt;/p&gt;
&lt;h4&gt;Keeping backups isolated&lt;/h4&gt;
&lt;p&gt;Litestream &lt;a href="https://litestream.io/how-it-works/"&gt;works&lt;/a&gt; by writing a constant stream of pages from SQLite's WAL (Write-Ahead Log) up to an S3 bucket. It needs the ability to both read and write from S3.&lt;/p&gt;
&lt;p&gt;This requires making S3 credentials available within the containers that run Datasette and Litestream for each team account.&lt;/p&gt;
&lt;p&gt;Credentials in those containers are not visible to the users of the software, but I still wanted to be confident that if the credentials leaked in some way the isolation between teams would be maintained.&lt;/p&gt;
&lt;p&gt;Initially I thought about having a separate S3 bucket for each team, but it turns out AWS has a default limit of 100 buckets per account, and a hard limit of 1,000. I aspire to have more than 1,000 customers, so this limit makes a bucket-per-team seem like the wrong solution.&lt;/p&gt;
&lt;p&gt;I've learned an absolute ton about S3 and AWS permissions building my &lt;a href="https://github.com/simonw/s3-credentials"&gt;s3-credentials&lt;/a&gt; tool for creating credentials for accessing S3.&lt;/p&gt;
&lt;p&gt;One of the tricks I've learned is that it's possible to create temporary, time-limited credentials that &lt;a href="https://simonwillison.net/2022/Jan/18/weeknotes/"&gt;only work for a prefix&lt;/a&gt; (effectively a folder) within an S3 bucket.&lt;/p&gt;
&lt;p&gt;This means I can run Litestream with credentials that are specific to the team - that can read and write only from the &lt;code&gt;team-ID/&lt;/code&gt; prefix in the S3 bucket I am using to store the backups.&lt;/p&gt;
&lt;h4&gt;Obtaining temporary credentials&lt;/h4&gt;
&lt;p&gt;My &lt;code&gt;s3-credentials&lt;/code&gt; tool can create credentials for a prefix within an S3 bucket like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;s3-credentials create my-bucket-for-backus \
  --duration 12h \
  --prefix team-56/
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This command uses the &lt;a href="https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sts.html#STS.Client.assume_role"&gt;sts.assume_role()&lt;/a&gt; AWS method to create credentials that allow access to that bucket, attaching &lt;a href="https://github.com/simonw/s3-credentials/blob/0.12.1/README.md#--prefix-my-prefix"&gt;this generated JSON policy&lt;/a&gt; to it in order to restrict access to the provided prefix.&lt;/p&gt;
&lt;p&gt;I extracted the relevant Python code from  &lt;code&gt;s3-credentials&lt;/code&gt; and used it to create a private API endpoint in my Datasette Cloud management server which could return the temporary credentials needed by the team container.&lt;/p&gt;
&lt;p&gt;With the endpoint in place, my code for launching a team container can do this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Create the volume and machine for that team (if they do not yet exist)&lt;/li&gt;
&lt;li&gt;Generate a signed secret token that the machine container can exchange for its S3 credentials&lt;/li&gt;
&lt;li&gt;Launch the machine container, passing it the secret token&lt;/li&gt;
&lt;li&gt;On launch, the container runs a script which exchanges that secret token for its 12 hour S3 credentials, using the private API endpoint I created&lt;/li&gt;
&lt;li&gt;Those credentials are used to populate the &lt;code&gt;AWS_ACCESS_KEY_ID&lt;/code&gt;, &lt;code&gt;AWS_SECRET_ACCESS_KEY&lt;/code&gt; and &lt;code&gt;AWS_SESSION_TOKEN&lt;/code&gt; environment variables used by Litestream&lt;/li&gt;
&lt;li&gt;Start Litestream, which then starts Datasette&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Restarting every 12 hours&lt;/h4&gt;
&lt;p&gt;You may be wondering why I bothered with that initial secret token - why not just pass the temporary AWS credentials to the container when I launch it?&lt;/p&gt;
&lt;p&gt;The reason for this is that I need to be able to obtain fresh credentials every 12 hours.&lt;/p&gt;
&lt;p&gt;A really neat feature of Fly Machines is that they support scale-to-zero. You can stop them, and Fly will automatically restart them the next time they recieve traffic.&lt;/p&gt;
&lt;p&gt;All you need to do is call &lt;code&gt;sys.exit(0)&lt;/code&gt; in your Python code (or the equivalent in any other language) and Fly will stop your container... and then restart it again with a couple of seconds of cold start time the next time an HTTP request for your container hits the Fly router.&lt;/p&gt;
&lt;p&gt;So far I'm mainly using this to avoid the cost of running containers when they aren't actually in- use. But there's a neat benefit when it comes to Litestream too.&lt;/p&gt;
&lt;p&gt;I'm using S3 credentials which expire after 12 hours. This means I need to periodically refresh the credentials and restart Litestream or it will stop being able to write to the S3 bucket.&lt;/p&gt;
&lt;p&gt;After considering a few ways of doing this, I selected the simplest to implement: have Datasette call &lt;code&gt;sys.exit(0)&lt;/code&gt; after ten hours, and let Fly restart the container causing my startup script to fetch freshly generated 12 hour credentials and pass them to Litestream.&lt;/p&gt;
&lt;p&gt;I implemented this by adding it as a new setting to my existing &lt;a href="https://datasette.io/plugins/datasette-scale-to-zero"&gt;datasette-scale-to-zero&lt;/a&gt; plugin. You can now configure that with &lt;code&gt; "max-age": "10h"&lt;/code&gt; and it will shut down Datasette once the server has been running for that long.&lt;/p&gt;
&lt;p&gt;Why does this require my own secret token system? Because when the container is restarted, it needs to make an authenticated call to my endpoint to retrieve those fresh S3 credentials. Fly persists environment variable secrets between restarts to the container, so that secret can be long-lived even while it is exchanged for short-term S3 credentials.&lt;/p&gt;
&lt;p&gt;I only just put the new backup system in place, so I'm exercising it a bit before I open things up to trial users - but so far it's looking like a very robust solution to the problem.&lt;/p&gt;
&lt;h4&gt;s3-ocr improvements&lt;/h4&gt;
&lt;p&gt;I released a &lt;a href="https://datasette.io/tools/s3-ocr"&gt;few new versions&lt;/a&gt; of &lt;a href="https://simonwillison.net/2022/Jun/30/s3-ocr/"&gt;s3-ocr&lt;/a&gt; this week, as part of my ongoing project working with the San Francisco Microscopical Society team to release a searchable version of their scanned document archives.&lt;/p&gt;
&lt;p&gt;The two main improvements are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A new &lt;code&gt;--dry-run&lt;/code&gt; option to &lt;code&gt;s3-ocr start&lt;/code&gt; which shows you what the tool will do without making any changes to your S3 bucket, or triggering any OCR jobs. &lt;a href="https://github.com/simonw/s3-ocr/issues/22"&gt;#22&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;s3-ocr start&lt;/code&gt; used to fail with an error if running it would create more than 100 (or 600 depending on your region) concurrent OCR jobs. The tool now knows how to identify that error and pause and retry starting the jobs instead. &lt;a href="https://github.com/simonw/s3-ocr/issues/21"&gt;#21&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The fix that took the most time is this: installations of the tool no longer arbitrarily fail to work depending on the environment you install them into!&lt;/p&gt;
&lt;p&gt;Solving this took me the best part of a day. The short version is this: Click 8.1.0 introduced a new feature that lets you use &lt;code&gt;@cli.command&lt;/code&gt; as a decorator instead of &lt;code&gt;@cli.command()&lt;/code&gt;. This meant that installing &lt;code&gt;s3-ocr&lt;/code&gt; in an environment that already had a previous version of Click would result in silent errors.&lt;/p&gt;
&lt;p&gt;The solution is simple: pin to &lt;code&gt;click&amp;gt;=8.1.0&lt;/code&gt; in the project dependencies if you plan to use this new syntax.&lt;/p&gt;
&lt;p&gt;If I'd read the Click &lt;a href="https://click.palletsprojects.com/en/8.1.x/changes/#version-8-1-0"&gt;changelog more closely&lt;/a&gt; I would have saved myself a whole lot of time.&lt;/p&gt;
&lt;p&gt;Issues &lt;a href="https://github.com/simonw/s3-ocr/issues/25"&gt;#25&lt;/a&gt; and &lt;a href="https://github.com/simonw/s3-ocr/issues/26"&gt;#26&lt;/a&gt; detail the many false turns I took trying to figure this out.&lt;/p&gt;
&lt;h4&gt;More fun with GPT-3 and DALL-E&lt;/h4&gt;
&lt;p&gt;This tweet scored over a million impressions on Twitter:&lt;/p&gt;

&lt;blockquote class="twitter-tweet"&gt;&lt;p lang="en" dir="ltr"&gt;New hobby: prototyping video games in 60 seconds using a combination of GPT-3 and DALL-E&lt;br /&gt;&lt;br /&gt;Here&amp;#39;s &amp;quot;Raccoon Heist&amp;quot; &lt;a href="https://t.co/xQ3Vm8p2XW"&gt;pic.twitter.com/xQ3Vm8p2XW&lt;/a&gt;&lt;/p&gt;- Simon Willison (@simonw) &lt;a href="https://twitter.com/simonw/status/1555626060384911360?ref_src=twsrc%5Etfw"&gt;August 5, 2022&lt;/a&gt;&lt;/blockquote&gt;

&lt;p&gt;As this got retweeted outside of my usual circles it started confusing people who thought the "prototype" was a working game, as opposed to a fake screenshot and a paragraph of descriptive text! I wasn't kidding when I said I spent 60 seconds on this.&lt;/p&gt;
&lt;p&gt;I also figured out how to use GPT-3 to write &lt;code&gt;jq&lt;/code&gt; one-liners. I love &lt;code&gt;jq&lt;/code&gt; but I have to look up how to use it every time, so having GPT-3 do the work for me is a pretty neat time saver. More on that in this TIL: &lt;a href="https://til.simonwillison.net/gpt3/jq"&gt;Using GPT-3 to figure out jq recipes&lt;/a&gt;&lt;/p&gt;
&lt;h4&gt;Releases this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/s3-ocr"&gt;s3-ocr&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/s3-ocr/releases/tag/0.6.3"&gt;0.6.3&lt;/a&gt; - (&lt;a href="https://github.com/simonw/s3-ocr/releases"&gt;9 releases total&lt;/a&gt;) - 2022-08-10
&lt;br /&gt;Tools for running OCR against files stored in S3&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-scale-to-zero"&gt;datasette-scale-to-zero&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette-scale-to-zero/releases/tag/0.2"&gt;0.2&lt;/a&gt; - (&lt;a href="https://github.com/simonw/datasette-scale-to-zero/releases"&gt;4 releases total&lt;/a&gt;) - 2022-08-05
&lt;br /&gt;Quit Datasette if it has not received traffic for a specified time period&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/shot-scraper"&gt;shot-scraper&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/shot-scraper/releases/tag/0.14.3"&gt;0.14.3&lt;/a&gt; - (&lt;a href="https://github.com/simonw/shot-scraper/releases"&gt;18 releases total&lt;/a&gt;) - 2022-08-02
&lt;br /&gt;A command-line utility for taking automated screenshots of websites&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/s3-credentials"&gt;s3-credentials&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/s3-credentials/releases/tag/0.12.1"&gt;0.12.1&lt;/a&gt; - (&lt;a href="https://github.com/simonw/s3-credentials/releases"&gt;13 releases total&lt;/a&gt;) - 2022-08-01
&lt;br /&gt;A tool for creating credentials for accessing S3 buckets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-sqlite-fts4"&gt;datasette-sqlite-fts4&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette-sqlite-fts4/releases/tag/0.3.2"&gt;0.3.2&lt;/a&gt; - (&lt;a href="https://github.com/simonw/datasette-sqlite-fts4/releases"&gt;2 releases total&lt;/a&gt;) - 2022-07-31&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;TIL this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/sqlite/related-content"&gt;Related content with SQLite FTS and a Datasette template function&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/aws/boto-command-line"&gt;Using boto3 from the command line&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/sqlite/trying-macos-extensions"&gt;Trying out SQLite extensions on macOS&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/pytest/mocking-boto"&gt;Mocking a Textract LimitExceededException with boto&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/gpt3/jq"&gt;Using GPT-3 to figure out jq recipes&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ocr"&gt;ocr&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/s3"&gt;s3&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/weeknotes"&gt;weeknotes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette-cloud"&gt;datasette-cloud&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/litestream"&gt;litestream&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-3"&gt;gpt-3&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/dalle"&gt;dalle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt"&gt;gpt&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="ocr"/><category term="s3"/><category term="datasette"/><category term="weeknotes"/><category term="datasette-cloud"/><category term="fly"/><category term="litestream"/><category term="gpt-3"/><category term="dalle"/><category term="gpt"/></entry><entry><title>How SQLite Helps You Do ACID</title><link href="https://simonwillison.net/2022/Aug/10/sqlite-rollback-journal/#atom-tag" rel="alternate"/><published>2022-08-10T15:39:09+00:00</published><updated>2022-08-10T15:39:09+00:00</updated><id>https://simonwillison.net/2022/Aug/10/sqlite-rollback-journal/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://fly.io/blog/sqlite-internals-rollback-journal/"&gt;How SQLite Helps You Do ACID&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Ben Johnson’s series of posts explaining the internals of SQLite continues with a deep look at how the rollback journal works. I’m learning SO much from this series.

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://twitter.com/benbjohnson/status/1557377929583026181"&gt;@benbjohnson&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ben-johnson"&gt;ben-johnson&lt;/a&gt;&lt;/p&gt;



</summary><category term="sqlite"/><category term="fly"/><category term="ben-johnson"/></entry><entry><title>Quoting Thomas Ptacek</title><link href="https://simonwillison.net/2022/Jul/7/thomas-ptacek/#atom-tag" rel="alternate"/><published>2022-07-07T20:31:32+00:00</published><updated>2022-07-07T20:31:32+00:00</updated><id>https://simonwillison.net/2022/Jul/7/thomas-ptacek/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://fly.io/blog/soc2-the-screenshots-will-continue-until-security-improves/"&gt;&lt;p&gt;SOC2 is about the security of the company, not the company’s products. A SOC2 audit would tell you something about whether the customer support team could pop a shell on production machines; it wouldn’t tell you anything about whether an attacker could pop a shell with a SQL Injection vulnerability.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://fly.io/blog/soc2-the-screenshots-will-continue-until-security-improves/"&gt;Thomas Ptacek&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql-injection"&gt;sql-injection&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/thomas-ptacek"&gt;thomas-ptacek&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;&lt;/p&gt;



</summary><category term="security"/><category term="sql-injection"/><category term="thomas-ptacek"/><category term="fly"/></entry><entry><title>Weeknotes: Datasette Cloud ready to preview</title><link href="https://simonwillison.net/2022/Jun/7/datasette-cloud-preview/#atom-tag" rel="alternate"/><published>2022-06-07T01:00:00+00:00</published><updated>2022-06-07T01:00:00+00:00</updated><id>https://simonwillison.net/2022/Jun/7/datasette-cloud-preview/#atom-tag</id><summary type="html">
    &lt;p&gt;I made an absolute ton of progress building Datasette Cloud on Fly this week, and also had a bunch of fun playing with GPT-3.&lt;/p&gt;
&lt;h4&gt;Datasette Cloud&lt;/h4&gt;
&lt;p&gt;Datasette Cloud is my upcoming hosted SaaS version of Datasette. I've been re-building my initial alpha on top of &lt;a href="https://fly.io/"&gt;Fly&lt;/a&gt; because I want to be able to provide each team account with their own Datasette instance running in a dedicated Firecracker container, and the recently announced &lt;a href="https://fly.io/blog/fly-machines/"&gt;Fly Machines&lt;/a&gt; lets me do exactly that.&lt;/p&gt;
&lt;p&gt;As-of this weekend I have all of the different pieces in place, and I'm starting to preview it to potential customers.&lt;/p&gt;
&lt;p&gt;Interested in trying it out? You can &lt;a href="https://docs.google.com/forms/d/e/1FAIpQLSeMRYHBHXlWGDkjCqhAOinDrKEVwqNR5GfPs3iEXn8LzDT2Qg/viewform"&gt;request access to the preview here&lt;/a&gt;.&lt;/p&gt;
&lt;h4&gt;GPT-3 explorations&lt;/h4&gt;
&lt;p&gt;Most of my GPT-3 explorations over the past week are covered by these two blog posts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2022/May/31/a-datasette-tutorial-written-by-gpt-3/"&gt;A Datasette tutorial written by GPT-3&lt;/a&gt; is the point at which I really started taking GPT-3 seriously, after convincing myself that I could use it to help with real work, not just as a form of entertainment.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2022/Jun/5/play-with-gpt3/"&gt;How to play with the GPT-3 language model&lt;/a&gt; is a very quick getting started tutorial, because I polled people on Twitter and found that more than half didn't know you could try GPT-3 out now for free.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Searching &lt;a href="https://twitter.com/search?q=gpt%20from%3Asimonw&amp;amp;src=typed_query&amp;amp;f=live"&gt;my tweets for GPT&lt;/a&gt; captures a bunch of other, smaller experiments. A few highlights:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://twitter.com/simonw/status/1532958805289410560"&gt;How to safely run two processes in one Docker container&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://twitter.com/simonw/status/1532391884277960705"&gt;"Write a book outline about Datasette in Spanish"&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://twitter.com/simonw/status/1533838306625982466"&gt;Getting GPT-3 to write yes-and improv scripts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://twitter.com/simonw/status/1532061347218763776"&gt;Generating and then explaining complex ffmpeg recipes&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Releases this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-upload-csvs"&gt;datasette-upload-csvs&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette-upload-csvs/releases/tag/0.7"&gt;0.7&lt;/a&gt; - (&lt;a href="https://github.com/simonw/datasette-upload-csvs/releases"&gt;8 releases total&lt;/a&gt;) - 2022-06-03
&lt;br /&gt;Datasette plugin for uploading CSV files and converting them to database tables&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/mbox-to-sqlite"&gt;mbox-to-sqlite&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/mbox-to-sqlite/releases/tag/0.1a0"&gt;0.1a0&lt;/a&gt; - 2022-05-31
&lt;br /&gt;Load email from .mbox files into SQLite&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-graphql"&gt;datasette-graphql&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette-graphql/releases/tag/2.0.2"&gt;2.0.2&lt;/a&gt; - (&lt;a href="https://github.com/simonw/datasette-graphql/releases"&gt;34 releases total&lt;/a&gt;) - 2022-05-30
&lt;br /&gt;Datasette plugin providing an automatic GraphQL API for your SQLite databases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-auth-existing-cookies"&gt;datasette-auth-existing-cookies&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette-auth-existing-cookies/releases/tag/1.0a1"&gt;1.0a1&lt;/a&gt; - (&lt;a href="https://github.com/simonw/datasette-auth-existing-cookies/releases"&gt;12 releases total&lt;/a&gt;) - 2022-05-29
&lt;br /&gt;Datasette plugin that authenticates users based on existing domain cookies&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;TIL this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/django/just-with-django"&gt;Using just with Django&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/weeknotes"&gt;weeknotes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette-cloud"&gt;datasette-cloud&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-3"&gt;gpt-3&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt"&gt;gpt&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="projects"/><category term="datasette"/><category term="weeknotes"/><category term="datasette-cloud"/><category term="fly"/><category term="gpt-3"/><category term="llms"/><category term="gpt"/></entry><entry><title>Weeknotes: Building Datasette Cloud on Fly Machines, Furo for documentation</title><link href="https://simonwillison.net/2022/May/26/weeknotes-building-datasette-cloud/#atom-tag" rel="alternate"/><published>2022-05-26T04:35:11+00:00</published><updated>2022-05-26T04:35:11+00:00</updated><id>https://simonwillison.net/2022/May/26/weeknotes-building-datasette-cloud/#atom-tag</id><summary type="html">
    &lt;p&gt;Hosting provider Fly released &lt;a href="https://fly.io/blog/fly-machines/"&gt;Fly Machines&lt;/a&gt; this week. I got an early preview and I've been working with it for a few days - it's a &lt;em&gt;fascinating&lt;/em&gt; new piece of technology. I'm using it to get my hosting service for Datasette ready for wider release.&lt;/p&gt;
&lt;h4&gt;Datasette Cloud&lt;/h4&gt;
&lt;p&gt;Datasette Cloud is the name I've given my forthcoming hosted SaaS version of Datasette. I'm building it for two reasons:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;This is an obvious step towards building a sustainable business model for my open source project. It's a reasonably well-trodden path at this point: plenty of projects have demonstrated that offering paid hosting for an open source project can build a valuable business. &lt;a href="https://gitlab.com/"&gt;GitLab&lt;/a&gt; are an especially good example of this model.&lt;/li&gt;
&lt;li&gt;There are plenty of people who could benefit from Datasette, but the friction involved in hosting it prevents them from taking advantage of the software. I've tried to make it &lt;a href="https://docs.datasette.io/en/stable/deploying.html"&gt;as easy to host&lt;/a&gt; as possible, but without a SaaS hosted version I'm failing to deliver value to the people that I most want the software to help.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;My previous alpha was built directly on Docker, running everything on a single large VPS. Obviously it needed to scale beyond one machine, and I started experimenting with Kubernetes to make this happen.&lt;/p&gt;
&lt;p&gt;I also want to allow users to run their own plugins, without risk of malicious code causing problems for other accounts. Docker and Kubernetes containers don't offer the isolation that I need to feel comfortable doing this, so I started researching &lt;a href="https://firecracker-microvm.github.io/"&gt;Firecracker&lt;/a&gt; - constructed by AWS to power Lambda and Fargate, so very much designed with potentially malicious code in mind.&lt;/p&gt;
&lt;p&gt;Spinning up Firecracker on a Kubernetes cluster is no small lift!&lt;/p&gt;
&lt;p&gt;And then I heard about &lt;a href="https://fly.io/blog/fly-machines/"&gt;Fly Machines&lt;/a&gt;. And it looks like it's exactly what I need to get this project to the next milestone.&lt;/p&gt;
&lt;h4&gt;Fly Machines&lt;/h4&gt;
&lt;p&gt;Fly's core offering allows you to run Docker containers in regions around the world, compiled (automatically by Fly) to Firecracker containers with geo-load-balancing so users automatically get routed to an instance running near them.&lt;/p&gt;
&lt;p&gt;Their new Fly Machines product gives you a new way to run containers there: you get full control over when containers are created, updated, started, stopped and destroyed. It's the exact level of control I need to build Datasette Cloud.&lt;/p&gt;
&lt;p&gt;It also implements scale-to-zero: you can stop a container, and Fly will automatically start it back up again for you (generally in less than a second) when fresh traffic comes in.&lt;/p&gt;
&lt;p&gt;(I had built my own version of this for my Datasette Cloud alpha, but the spin up time took more like 10s and involved showing the user a custom progress bar to help them see what was going on.)&lt;/p&gt;
&lt;p&gt;Being able to programatically start and stop Firecracker containers was exactly what I'd been trying to piece together using Kubernetes - and the ability to control which global region they go in (with the potential for &lt;a href="https://tip.litestream.io/guides/read-replica/"&gt;Litestream replication&lt;/a&gt; between regions in the future) is a feature I hadn't expected to be able to offer for years.&lt;/p&gt;
&lt;p&gt;So I spent most of this week on a proof of concept. I've successfully demonstrated that the Fly Machines product has almost exactly the features that I need to ship Datasette Cloud on Fly Machines - and I've confirmed that the gaps I need to fill are on Fly's near-term roadmap.&lt;/p&gt;
&lt;p&gt;I don't have anything to demonstrate publicly just yet, but I do have &lt;a href="https://til.simonwillison.net/fly"&gt;several new TILs&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If this sounds interesting to you or your organization and you'd like to try it out, drop me an email at &lt;code&gt;swillison&lt;/code&gt; @ Google's email service.&lt;/p&gt;
&lt;h4 id="furo-theme"&gt;The Furo theme for Sphinx&lt;/h4&gt;
&lt;p&gt;My &lt;a href="https://github.com/simonw/shot-scraper"&gt;shot-scraper&lt;/a&gt; automated screenshot tool's README had got a little too long, so I decided to upgrade it to &lt;a href="https://shot-scraper.datasette.io/"&gt;a full documentation website&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I chose to use &lt;a href="https://myst-parser.readthedocs.io/"&gt;MyST&lt;/a&gt; and &lt;a href="https://www.sphinx-doc.org/"&gt;Sphinx&lt;/a&gt; for this, hosted on &lt;a href="https://www.readthedocs.org/"&gt;Read The Docs&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;MyST adds Markdown syntax to Sphinx, which is easier to remember (and for people to contribute to) than reStructuredText.&lt;/p&gt;
&lt;p&gt;After putting the site live, Adam Johnson &lt;a href="https://twitter.com/AdamChainz/status/1527666472193081345"&gt;suggested&lt;/a&gt; I take a look at the &lt;a href="https://github.com/pradyunsg/furo"&gt;Furo theme&lt;/a&gt;. I'd previously found Sphinx themes hard to navigate because they had so much differing functionality, but a personal recommendation turned out to be exactly what I needed.&lt;/p&gt;
&lt;p&gt;Furo is really nice - it fixed a slight rendering complaint I had about nested lists in the theme I was using, and since it doesn't use web fonts it dropped the bytes transferred for a page of documentation by more than half!&lt;/p&gt;
&lt;p&gt;I switched &lt;code&gt;shot-scraper&lt;/code&gt; over to Furo, and liked it so much that I switched over &lt;a href="https://docs.datasette.io/en/latest/"&gt;Datasette&lt;/a&gt; and &lt;a href="https://sqlite-utils.datasette.io/en/latest/"&gt;sqlite-utils&lt;/a&gt; too.&lt;/p&gt;
&lt;p&gt;Here's what the &lt;code&gt;shot-scraper&lt;/code&gt; documentation looks like now:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/shot-scraper-docs.png" alt="A screenshot of the shot-scraper documentation, showing the table of contents" style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Screenshot taken using &lt;code&gt;shot-scraper&lt;/code&gt; itself, like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;shot-scraper \
  https://shot-scraper.datasette.io/en/latest/ \
  --retina --height 1200
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Full details of those theme migrations (including more comparative screenshots) can be found in these issues:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/simonw/shot-scraper/issues/77"&gt;shot-scraper: Switch to Furo theme #77&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/simonw/datasette/issues/1746"&gt;datasette: Switch documentation theme to Furo #1746&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/simonw/sqlite-utils/issues/435"&gt;sqlite-utils:  Switch to Furo documentation theme #435&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Releases this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-unsafe-expose-env"&gt;datasette-unsafe-expose-env&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette-unsafe-expose-env/releases/tag/0.1"&gt;0.1&lt;/a&gt; - 2022-05-25
&lt;br /&gt;Datasette plugin to expose some environment variables at /-/env for debugging&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/shot-scraper"&gt;shot-scraper&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/shot-scraper/releases/tag/0.14.1"&gt;0.14.1&lt;/a&gt; - (&lt;a href="https://github.com/simonw/shot-scraper/releases"&gt;16 releases total&lt;/a&gt;) - 2022-05-22
&lt;br /&gt;A comand-line utility for taking automated screenshots of websites&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/google-calendar-to-sqlite"&gt;google-calendar-to-sqlite&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/google-calendar-to-sqlite/releases/tag/0.1a0"&gt;0.1a0&lt;/a&gt; - 2022-05-21
&lt;br /&gt;Create a SQLite database containing your data from Google Calendar&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-upload-dbs"&gt;datasette-upload-dbs&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette-upload-dbs/releases/tag/0.1.1"&gt;0.1.1&lt;/a&gt; - (&lt;a href="https://github.com/simonw/datasette-upload-dbs/releases"&gt;2 releases total&lt;/a&gt;) - 2022-05-17
&lt;br /&gt;Upload SQLite database files to Datasette&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-insert"&gt;datasette-insert&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette-insert/releases/tag/0.7"&gt;0.7&lt;/a&gt; - (&lt;a href="https://github.com/simonw/datasette-insert/releases"&gt;7 releases total&lt;/a&gt;) - 2022-05-16
&lt;br /&gt;Datasette plugin for inserting and updating data&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;TIL this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/github-actions/job-summaries"&gt;GitHub Actions job summaries&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/github-actions/oxipng"&gt;Optimizing PNGs in GitHub Actions using Oxipng&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/fly/fly-docker-registry"&gt;Using the Fly Docker registry&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/fly/fly-logs-to-s3"&gt;Writing Fly logs to S3&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/fly/wildcard-dns-ssl"&gt;Wildcard DNS and SSL on Fly&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/documentation"&gt;documentation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/weeknotes"&gt;weeknotes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette-cloud"&gt;datasette-cloud&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/firecracker"&gt;firecracker&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="documentation"/><category term="projects"/><category term="datasette"/><category term="weeknotes"/><category term="datasette-cloud"/><category term="fly"/><category term="firecracker"/></entry><entry><title>Using SQLite and Datasette with Fly Volumes</title><link href="https://simonwillison.net/2022/Feb/15/fly-volumes/#atom-tag" rel="alternate"/><published>2022-02-15T02:17:28+00:00</published><updated>2022-02-15T02:17:28+00:00</updated><id>https://simonwillison.net/2022/Feb/15/fly-volumes/#atom-tag</id><summary type="html">
    &lt;p&gt;A few weeks ago, &lt;a href="https://fly.io/"&gt;Fly&lt;/a&gt; announced &lt;a href="https://fly.io/blog/free-postgres/"&gt;Free Postgres Databases&lt;/a&gt; as part of the free tier of their hosting product. Their announcement included this snippet:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The lede is "free Postgres" because that's what matters to full stack apps. You don't &lt;em&gt;have&lt;/em&gt; to use these for Postgres. If SQLite is more your jam, mount up to 3GB of volumes and use "free SQLite." Yeah, we're probably underselling that.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I don't know if I've ever been &lt;a href="https://xkcd.com/356/"&gt;nerd sniped&lt;/a&gt; so effectively (they &lt;a href="https://twitter.com/mrkurt/status/1484609372114272261"&gt;knew what they were doing&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;This has the potential to address one of the big challenges in deploying &lt;a href="https://datasette.io/"&gt;Datasette&lt;/a&gt;, my open source web framework for building dynamic applications on top of SQLite.&lt;/p&gt;
&lt;p&gt;Datasette currently has plenty of good answers for publishing read-only databases online - &lt;a href="https://docs.datasette.io/en/stable/publish.html#publishing-to-google-cloud-run"&gt;Cloud Run&lt;/a&gt;, &lt;a href="https://docs.datasette.io/en/stable/publish.html#publishing-to-heroku"&gt;Heroku&lt;/a&gt; and &lt;a href="https://docs.datasette.io/en/stable/publish.html#publishing-to-vercel"&gt;Vercel&lt;/a&gt; are three great options. But the moment you want to &lt;a href="https://simonwillison.net/2020/Feb/26/weeknotes-datasette-writes/"&gt;write back&lt;/a&gt; to that SQLite database you're in for a harder time.&lt;/p&gt;
&lt;p&gt;Accepting writes requires a working read-write filesystem, and that generally means spinning up a virtual machine - and having to take responsibility for keeping a full Linux instance fed and updated. That's a much bigger commitment than deploying a container to something like Cloud Run.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://fly.io/docs/reference/volumes/"&gt;Fly Volumes&lt;/a&gt; are exactly what I've been waiting for here. They provide persistent storage for Fly apps - a volume you can mount as a directory inside a containerized application that will persist data through subsequent deploys.&lt;/p&gt;
&lt;p&gt;Add a free tier, and I finally have something I can point people towards when they want to try building something persistent on top of Datasette without taking on the burden of maintaining their own virtual server somewhere, or committing to spend money before they've evaluated if this is going to work for them or not.&lt;/p&gt;
&lt;h4&gt;datasette-publish-fly&lt;/h4&gt;
&lt;p&gt;I built the first version of the &lt;a href="https://datasette.io/plugins/datasette-publish-fly"&gt;datasette-publish-fly&lt;/a&gt; plugin nearly two years ago, as a tool for publishing read-only databases to Fly in a similar way to Cloud Run or Vercel.&lt;/p&gt;
&lt;p&gt;(That plugin actually broke a few months ago without me noticing due to some changes made to the &lt;code&gt;flyctl&lt;/code&gt; utility that it shells out to - figuring out what had happened and wanting to avoid surprises like this in the future lead me to develop the &lt;a href="https://simonwillison.net/2022/Feb/2/help-scraping/"&gt;Help scraping&lt;/a&gt; technique I described last week.)&lt;/p&gt;
&lt;p&gt;This week I released &lt;a href="https://github.com/simonw/datasette-publish-fly/releases/tag/1.1"&gt;datasette-publish-fly 1.1&lt;/a&gt; with the option to attach volumes to the published instances and start serving mutable, persistent SQLite databases!&lt;/p&gt;
&lt;p&gt;Basic usage looks like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;datasette publish fly \
--app my-new-fly-application \
--create-volume 1 \
--create-db my-new-db&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This will create a new Fly application on your account called &lt;code&gt;my-new-fly-application&lt;/code&gt;, create a 1GB volume for that application, mount the volume and then create an empty SQLite database file in that volume called &lt;code&gt;my-new-db.db&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;A blank database file isn't very interesting! Datasette's &lt;a href="https://docs.datasette.io/en/stable/sql_queries.html#writable-canned-queries"&gt;Writable canned queries&lt;/a&gt; feature can be used to configure SQL queries that write to the database. Let's try that now, by creating a &lt;code&gt;metadata.yml&lt;/code&gt; file containing the following:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;&lt;span class="pl-ent"&gt;plugins&lt;/span&gt;:
  &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; Create messages table in messages.db on startup&lt;/span&gt;
  &lt;span class="pl-ent"&gt;datasette-init&lt;/span&gt;:
    &lt;span class="pl-ent"&gt;messages&lt;/span&gt;:
      &lt;span class="pl-ent"&gt;tables&lt;/span&gt;:
        &lt;span class="pl-ent"&gt;messages&lt;/span&gt;:
          &lt;span class="pl-ent"&gt;columns&lt;/span&gt;:
            &lt;span class="pl-ent"&gt;id&lt;/span&gt;: &lt;span class="pl-s"&gt;integer&lt;/span&gt;
            &lt;span class="pl-ent"&gt;message&lt;/span&gt;: &lt;span class="pl-s"&gt;text&lt;/span&gt;
            &lt;span class="pl-ent"&gt;datetime&lt;/span&gt;: &lt;span class="pl-s"&gt;text&lt;/span&gt;
          &lt;span class="pl-ent"&gt;pk&lt;/span&gt;: &lt;span class="pl-s"&gt;id&lt;/span&gt;
&lt;span class="pl-ent"&gt;databases&lt;/span&gt;:
  &lt;span class="pl-ent"&gt;messages&lt;/span&gt;:
    &lt;span class="pl-ent"&gt;queries&lt;/span&gt;:
      &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; /messages/add_message query&lt;/span&gt;
      &lt;span class="pl-ent"&gt;add_message&lt;/span&gt;:
        &lt;span class="pl-ent"&gt;sql&lt;/span&gt;: &lt;span class="pl-s"&gt;|-&lt;/span&gt;
&lt;span class="pl-s"&gt;          INSERT INTO messages (&lt;/span&gt;
&lt;span class="pl-s"&gt;            message, datetime&lt;/span&gt;
&lt;span class="pl-s"&gt;          ) VALUES (&lt;/span&gt;
&lt;span class="pl-s"&gt;            :message, :_now_datetime_utc&lt;/span&gt;
&lt;span class="pl-s"&gt;          )&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;        &lt;span class="pl-ent"&gt;write&lt;/span&gt;: &lt;span class="pl-c1"&gt;true&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now we can run the deploy like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;datasette publish fly \
--app messages-demo \
--create-volume 1 \
--create-db messages \
--install datasette-init \
--metadata metadata.yml&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This example uses the &lt;a href="https://datasette.io/plugins/datasette-init"&gt;datasette-init&lt;/a&gt; plugin to create the &lt;code&gt;messages&lt;/code&gt; table if it doesn't exist, and configures a canned query that can insert rows into that table (using the &lt;code&gt;_now_datetime_utc&lt;/code&gt; &lt;a href="https://docs.datasette.io/en/stable/sql_queries.html#magic-parameters"&gt;magic parameter&lt;/a&gt; to populate the current datetime.)&lt;/p&gt;
&lt;p&gt;Once deployed, the following URL provides a form that can be used to add messages:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;https://messages-demo.fly.dev/messages/add_message&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/messages.gif" alt="Animated demo showing the add_messages form - new messages added there show up with their date in the messages table" style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;As a general rule though, writable databases become more interesting when you combine them with &lt;a href="https://datasette.io/plugins"&gt;Datasette plugins&lt;/a&gt;.&lt;/p&gt;
&lt;h4&gt;Using Fly volumes with datasette-tiddlywiki&lt;/h4&gt;
&lt;p&gt;&lt;a href="https://datasette.io/plugins/datasette-tiddlywiki"&gt;datasette-tiddlywiki&lt;/a&gt; is a plugin that bundles a full install of &lt;a href="https://tiddlywiki.com/"&gt;TiddlyWiki&lt;/a&gt;, configured to use Datasette as a storage backend. I introduced this new plugin &lt;a href="https://simonwillison.net/2021/Dec/24/datasette-tiddlywiki/"&gt;back in December&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/simonw/datasette-tiddlywiki/releases/tag/0.2"&gt;Version 0.2&lt;/a&gt; of the plugin, which I released yesterday, adds authentication. TiddlyWiki instances can now be viewed by anyone but can only be edited by users with the &lt;code&gt;edit-tiddlywiki&lt;/code&gt; permisson, which is available to the &lt;code&gt;root&lt;/code&gt; user by default and can be granted to more users using the &lt;a href="https://docs.datasette.io/en/stable/authentication.html#permissions"&gt;Datasette permissions system&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here's how to deploy &lt;code&gt;datasette-tiddlywiki&lt;/code&gt; to Fly, with an authenticated account that can then be used to edit the wiki.&lt;/p&gt;
&lt;p&gt;We're going to use the &lt;a href="https://datasette.io/plugins/datasette-auth-passwords"&gt;datasette-auth-passwords&lt;/a&gt; plugin to authenticate users. We need to install that locally in order to generate a password:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;datasette install datasette-auth-passwords&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Installing this plugin adds a new command to Datasette called &lt;code&gt;hash-password&lt;/code&gt; - we can use that like so:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;% datasette hash-password
Password: 
Repeat &lt;span class="pl-k"&gt;for&lt;/span&gt; confirmation: 
pbkdf2_sha256&lt;span class="pl-smi"&gt;$2&lt;/span&gt;60000&lt;span class="pl-smi"&gt;$b1ec52979ecf0c4810e3e22ea63c119e$AjaNnyuXzDXDRK&lt;/span&gt;/ZQjyn881J5GnVouKxI8B3DFu/C+M=&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;(The example hash shown here is for the password "password", so don't use it for anything!)&lt;/p&gt;
&lt;p&gt;Having created that password hash, we can use it to deploy our TiddlyWiki instance.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# First, install datasette-publish-fly
datasette install datasette-publish-fly

# Now deploy an instance with datasette-tiddlywiki
datasette publish fly \
--app simon-tiddlywiki \
--create-volume 1 \
--create-db tiddlywiki \
--install datasette-auth-passwords \
--install datasette-tiddlywiki \
--plugin-secret datasette-auth-passwords root_password_hash 'pbkdf2_sha256$2600...'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;--app&lt;/code&gt; value here is the name of the app on Fly, which needs to be globally unique - so pick your own value here.&lt;/p&gt;
&lt;p&gt;The full password hash needs to be pasted in with single quotes on that last line.&lt;/p&gt;
&lt;p&gt;Running this will churn away for a little bit deploying the application:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/datasette-publish-fly.png" alt="Screenshot of the terminal output from running datasette publish fly" style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;It can take a few minutes after the script finishes running for the application to become available - Fly are working at speeding this up. If you visit it too early you may see a TLS connection error.&lt;/p&gt;
&lt;p&gt;Once the application is live, the following pages will be available:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simon-tiddlywiki.fly.dev/-/tiddlywiki"&gt;https://simon-tiddlywiki.fly.dev/-/tiddlywiki&lt;/a&gt; - TiddlyWiki, initially in read-only mode&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simon-tiddlywiki.fly.dev/-/login"&gt;https://simon-tiddlywiki.fly.dev/-/login&lt;/a&gt; - the login form provided by &lt;code&gt;datasette-auth-passwords&lt;/code&gt; - sign in here with username &lt;code&gt;root&lt;/code&gt; and the password you used to create the hash&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simon-tiddlywiki.fly.dev/"&gt;https://simon-tiddlywiki.fly.dev/&lt;/a&gt; - the Datasette homepage&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Sign in and visit the wiki and you'll be able to add and edit records there!&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/tiddlywiki-demo.gif" alt="Animated GIF showing TiddlyWiki in action" style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Since the plugin saves data to an underlying SQLite table, you can visit &lt;a href="https://simon-tiddlywiki.fly.dev/tiddlywiki/tiddlers"&gt;/tiddlywiki/tiddlers&lt;/a&gt; to see the data that has been stored.&lt;/p&gt;
&lt;h4&gt;Other plugins to try&lt;/h4&gt;
&lt;p&gt;Here are some other interesting plugins that make use of Datasette's &lt;a href="https://docs.datasette.io/en/stable/internals.html#await-db-execute-write-sql-params-none-block-true"&gt;internal methods for writing to a database&lt;/a&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://datasette.io/plugins/datasette-write"&gt;datasette-write&lt;/a&gt; provides a UI for executing write queries against the database&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://datasette.io/plugins/datasette-upload-csvs"&gt;datasette-upload-csvs&lt;/a&gt; adds an interface for uploading a CSV file to create a table&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://datasette.io/plugins/datasette-insert"&gt;datasette-insert&lt;/a&gt; adds a full JSON API for inserting and updating data&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://datasette.io/plugins/datasette-configure-fts"&gt;datasette-configure-fts&lt;/a&gt; can be used to configure full-text search against an existing table&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://datasette.io/plugins/datasette-edit-schema"&gt;datasette-edit-schema&lt;/a&gt; provides tools for adding, renaming and re-ordering columns&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://datasette.io/plugins/datasette-saved-queries"&gt;datasette-saved-queries&lt;/a&gt; lets users save queries to a database table&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="https://datasette.io/plugins/datasette-auth-github"&gt;datasette-auth-github&lt;/a&gt; and &lt;a href="https://datasette.io/plugins/datasette-auth-tokens"&gt;datasette-auth-tokens&lt;/a&gt; can be used to provide additional ways of authenticating with an instance.&lt;/p&gt;
&lt;h4&gt;Releases this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-publish-fly"&gt;datasette-publish-fly&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette-publish-fly/releases/tag/1.1.1"&gt;1.1.1&lt;/a&gt; - (&lt;a href="https://github.com/simonw/datasette-publish-fly/releases"&gt;7 releases total&lt;/a&gt;) - 2022-02-14
&lt;br /&gt;Datasette plugin for publishing data using Fly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-tiddlywiki"&gt;datasette-tiddlywiki&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette-tiddlywiki/releases/tag/0.2"&gt;0.2&lt;/a&gt; - (&lt;a href="https://github.com/simonw/datasette-tiddlywiki/releases"&gt;2 releases total&lt;/a&gt;) - 2022-02-14
&lt;br /&gt;Run TiddlyWiki in Datasette and save Tiddlers to a SQLite database&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/strip-hidden-form-values"&gt;strip-hidden-form-values&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/strip-hidden-form-values/releases/tag/0.2.1"&gt;0.2.1&lt;/a&gt; - (&lt;a href="https://github.com/simonw/strip-hidden-form-values/releases"&gt;3 releases total&lt;/a&gt;) - 2022-02-11
&lt;br /&gt;CLI tool for stripping hidden form values from an HTML document&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-auth-passwords"&gt;datasette-auth-passwords&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette-auth-passwords/releases/tag/1.0"&gt;1.0&lt;/a&gt; - (&lt;a href="https://github.com/simonw/datasette-auth-passwords/releases"&gt;11 releases total&lt;/a&gt;) - 2022-02-09
&lt;br /&gt;Datasette plugin for authentication using passwords&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette"&gt;datasette&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette/releases/tag/0.60.2"&gt;0.60.2&lt;/a&gt; - (&lt;a href="https://github.com/simonw/datasette/releases"&gt;107 releases total&lt;/a&gt;) - 2022-02-07
&lt;br /&gt;An open source multi-tool for exploring and publishing data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/sqlite-utils"&gt;sqlite-utils&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/sqlite-utils/releases/tag/3.23"&gt;3.23&lt;/a&gt; - (&lt;a href="https://github.com/simonw/sqlite-utils/releases"&gt;95 releases total&lt;/a&gt;) - 2022-02-04
&lt;br /&gt;Python CLI utility and library for manipulating SQLite databases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-leaflet-freedraw"&gt;datasette-leaflet-freedraw&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette-leaflet-freedraw/releases/tag/0.3.1"&gt;0.3.1&lt;/a&gt; - (&lt;a href="https://github.com/simonw/datasette-leaflet-freedraw/releases"&gt;9 releases total&lt;/a&gt;) - 2022-02-03
&lt;br /&gt;Draw polygons on maps in Datasette&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;TIL this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/sqlite/ordered-group-concat"&gt;Ordered group_concat() in SQLite&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/npm/annotated-package-json"&gt;Annotated package.json for idb-keyval&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/linux/allow-sudo-without-password-specific-command"&gt;Enabling a user to execute a specific command as root without a password&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/bash/nullglob-in-bash"&gt;nullglob in bash&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/weeknotes"&gt;weeknotes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="sqlite"/><category term="datasette"/><category term="weeknotes"/><category term="fly"/></entry><entry><title>Help scraping: track changes to CLI tools by recording their --help using Git</title><link href="https://simonwillison.net/2022/Feb/2/help-scraping/#atom-tag" rel="alternate"/><published>2022-02-02T23:46:35+00:00</published><updated>2022-02-02T23:46:35+00:00</updated><id>https://simonwillison.net/2022/Feb/2/help-scraping/#atom-tag</id><summary type="html">
    &lt;p&gt;I've been experimenting with a new variant of &lt;a href="https://simonwillison.net/2020/Oct/9/git-scraping/"&gt;Git scraping&lt;/a&gt; this week which I'm calling &lt;strong&gt;Help scraping&lt;/strong&gt;. The key idea is to track changes made to CLI tools over time by recording the output of their &lt;code&gt;--help&lt;/code&gt; commands in a Git repository.&lt;/p&gt;
&lt;p&gt;My new &lt;a href="https://github.com/simonw/help-scraper"&gt;help-scraper GitHub repository&lt;/a&gt; is my first implementation of this pattern.&lt;/p&gt;
&lt;p&gt;It uses &lt;a href="https://github.com/simonw/help-scraper/blob/cd18c5d7c1ac7c3851823dcabaa21ee920d73720/.github/workflows/scrape.yml"&gt;this GitHub Actions workflow&lt;/a&gt; to record the &lt;code&gt;--help&lt;/code&gt; output for the Amazon Web Services &lt;code&gt;aws&lt;/code&gt; CLI tool, and also for the &lt;code&gt;flyctl&lt;/code&gt; tool maintained by the &lt;a href="https://fly.io/"&gt;Fly.io&lt;/a&gt; hosting platform.&lt;/p&gt;
&lt;p&gt;The workflow runs once a day. It loops through every available AWS command (using &lt;a href="https://github.com/simonw/help-scraper/blob/cd18c5d7c1ac7c3851823dcabaa21ee920d73720/aws_commands.py"&gt;this script&lt;/a&gt;) and records the output of that command's CLI help option to a &lt;code&gt;.txt&lt;/code&gt; file in the repository - then commits the result at the end.&lt;/p&gt;
&lt;p&gt;The result is a version history of changes made to those help files. It's essentially a much more detailed version of a changelog - capturing all sorts of details that might not be reflected in the official release notes for the tool.&lt;/p&gt;
&lt;p&gt;Here's an example. This morning, AWS released version 1.22.47 of their CLI helper tool. They release new versions on an almost daily basis.&lt;/p&gt;
&lt;p&gt;Here are &lt;a href="https://github.com/aws/aws-cli/blob/develop/CHANGELOG.rst#12247"&gt;the official release notes&lt;/a&gt; - 12 bullet points, spanning 12 different AWS services.&lt;/p&gt;
&lt;p&gt;My help scraper caught the details of the release in &lt;a href="https://github.com/simonw/help-scraper/commit/cd18c5d7c1ac7c3851823dcabaa21ee920d73720#diff-c2559859df8912eb13a6017d81019bf5452cead3e6495744e2d0c82202bf33ac"&gt;this commit&lt;/a&gt; - 89 changed files with 3,543 additions and 1,324 deletions. It tells the story of what's changed in a whole lot more detail.&lt;/p&gt;
&lt;p&gt;The AWS CLI tool is &lt;em&gt;enormous&lt;/em&gt;. Running &lt;code&gt;find aws -name '*.txt' | wc -l&lt;/code&gt; in that repository counts help pages for 11,401 individual commands - or 11,390 if you checkout the previous version, showing that there were 11 commands added just in this morning's new release.&lt;/p&gt;
&lt;p&gt;There are plenty of other ways of tracking changes made to AWS. I've previously kept an eye on &lt;a href="https://github.com/boto/botocore/commits/develop"&gt;the botocore GitHub history&lt;/a&gt;, which exposes changes to the underlying JSON - and there are projects like &lt;a href="https://awsapichanges.info/"&gt;awschanges.info&lt;/a&gt; which try to turn those sources of data into something more readable.&lt;/p&gt;
&lt;p&gt;But I think there's something pretty neat about being able to track changes in detail for any CLI tool that offers help output, independent of the official release notes for that tool. Not everyone writes release notes &lt;a href="https://simonwillison.net/2022/Jan/31/release-notes/"&gt;with the detail I like from them&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;I implemented this for &lt;code&gt;flyctl&lt;/code&gt; first, because I wanted to see what changes were being made that might impact my &lt;a href="https://datasette.io/plugins/datasette-publish-fly"&gt;datasette-publish-fly&lt;/a&gt; plugin which shells out to that tool. Then I realized it could be applied to AWS as well.&lt;/p&gt;
&lt;h4&gt;Help scraping my own projects&lt;/h4&gt;
&lt;p&gt;I got the initial idea for this technique from a change I made to my &lt;a href="https://datasette.io/"&gt;Datasette&lt;/a&gt; and &lt;a href="https://sqlite-utils.datasette.io"&gt;sqlite-utils&lt;/a&gt; projects a few weeks ago.&lt;/p&gt;
&lt;p&gt;Both tools offer CLI commands with &lt;code&gt;--help&lt;/code&gt; output - but I kept on forgetting to update the help, partly because there was no easy way to see its output online without running the tools themselves.&lt;/p&gt;
&lt;p&gt;So, I added documentation pages that list the output of &lt;code&gt;--help&lt;/code&gt; for each of the CLI commands, generated using the &lt;a href="https://nedbatchelder.com/code/cog"&gt;Cog&lt;/a&gt; file generation tool:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://sqlite-utils.datasette.io/en/stable/cli-reference.html"&gt;sqlite-utils CLI reference&lt;/a&gt; (39 commands!)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.datasette.io/en/stable/cli-reference.html"&gt;datasette CLI reference&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Having added these pages, I realized that the Git commit history of those generated documentation pages could double up as a history of changes I made to the &lt;code&gt;--help&lt;/code&gt; output - here's &lt;a href="https://github.com/simonw/sqlite-utils/commits/main/docs/cli-reference.rst"&gt;that history for sqlite-utils&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It was a short jump from that to the idea of combining it with &lt;a href="https://simonwillison.net/2020/Oct/9/git-scraping/"&gt;Git scraping&lt;/a&gt; to generate history for other tools.&lt;/p&gt;
&lt;h4&gt;Bonus trick: GraphQL schema scraping&lt;/h4&gt;
&lt;p&gt;I've started making selective use of the &lt;a href="https://fly.io/"&gt;Fly.io&lt;/a&gt; GraphQL API as part of &lt;a href="https://github.com/simonw/datasette-publish-fly"&gt;my plugin&lt;/a&gt; for publishing Datasette instances to that platform.&lt;/p&gt;
&lt;p&gt;Their GraphQL API is openly available, but it's not extensively documented - presumably because they reserve the right to make breaking changes to it at any time. I collected some notes on it in this TIL: &lt;a href="https://til.simonwillison.net/fly/undocumented-graphql-api"&gt;Using the undocumented Fly GraphQL API&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This gave me an idea: could I track changes made to their GraphQL schema using the same scraping trick?&lt;/p&gt;
&lt;p&gt;It turns out I can! There's an NPM package called &lt;a href="https://www.npmjs.com/package/get-graphql-schema"&gt;get-graphql-schema&lt;/a&gt; which can extract the GraphQL schema from any GraphQL server and write it out to disk:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;npx get-graphql-schema https://api.fly.io/graphql &amp;gt; /tmp/fly.graphql
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I've added that to my &lt;code&gt;help-scraper&lt;/code&gt; repository too - so now I have a &lt;a href="https://github.com/simonw/help-scraper/commits/main/flyctl/fly.graphql"&gt;commit history of changes&lt;/a&gt; of changes they are making there too. Here's &lt;a href="https://github.com/simonw/help-scraper/commit/f11072ff23f0d654395be7c2b1e98e84dbbc26a3#diff-c9cd49cf2aa3b983457e2812ba9313cc254aba74aaba9a36d56c867e32221589"&gt;an example&lt;/a&gt; from this morning.&lt;/p&gt;
&lt;h3&gt;Other weeknotes&lt;/h3&gt;
&lt;p&gt;I've decided to start setting goals on a monthly basis. My goal for February is to finally ship Datasette 1.0! I'm trying to make at least one commit every day that takes me closer to &lt;a href="https://github.com/simonw/datasette/milestone/7"&gt;that milestone&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This week I did &lt;a href="https://github.com/simonw/datasette/issues/1533"&gt;a bunch of work&lt;/a&gt; adding a &lt;code&gt;Link: https://...; rel="alternate"; type="application/datasette+json"&lt;/code&gt; HTTP header to a bunch of different pages in the Datasette interface, to support discovery of the JSON version of a page based on a URL to the human-readable version.&lt;/p&gt;
&lt;p&gt;(I had originally planned &lt;a href="https://github.com/simonw/datasette/issues/1534"&gt;to also support&lt;/a&gt; &lt;code&gt;Accept: application/json&lt;/code&gt; request headers for this, but I've been put off that idea by the discovery that Cloudflare &lt;a href="https://twitter.com/simonw/status/1478470282931163137"&gt;deliberately ignores&lt;/a&gt; the &lt;code&gt;Vary: Accept&lt;/code&gt; header.)&lt;/p&gt;
&lt;p&gt;Unrelated to Datasette: I also started a new Twitter thread, gathering &lt;a href="https://twitter.com/simonw/status/1487673496977113088"&gt;behind the scenes material from the movie the Mitchells vs the Machines&lt;/a&gt;. There's been a flurry of great material shared recently by the creative team, presumably as part of the run-up to awards season - and I've been enjoying trying to tie it all together in a thread.&lt;/p&gt;
&lt;p&gt;The last time I did this &lt;a href="https://twitter.com/simonw/status/1077737871602110466"&gt;was for Into the Spider-Verse&lt;/a&gt; (from the same studio) and that thread ended up running for more than a year!&lt;/p&gt;
&lt;h4&gt;TIL this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/pytest/only-run-integration"&gt;Opt-in integration tests with pytest --integration&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/graphql/get-graphql-schema"&gt;get-graphql-schema&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/github-actions/python-3-11"&gt;Testing against Python 3.11 preview using GitHub Actions&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/git"&gt;git&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github"&gt;github&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/scraping"&gt;scraping&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/graphql"&gt;graphql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/weeknotes"&gt;weeknotes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-actions"&gt;github-actions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/git-scraping"&gt;git-scraping&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="cli"/><category term="git"/><category term="github"/><category term="projects"/><category term="scraping"/><category term="graphql"/><category term="weeknotes"/><category term="github-actions"/><category term="git-scraping"/><category term="fly"/></entry><entry><title>Weeknotes: python_requires, documentation SEO</title><link href="https://simonwillison.net/2022/Jan/25/weeknotes/#atom-tag" rel="alternate"/><published>2022-01-25T23:54:52+00:00</published><updated>2022-01-25T23:54:52+00:00</updated><id>https://simonwillison.net/2022/Jan/25/weeknotes/#atom-tag</id><summary type="html">
    &lt;p&gt;Fixed Datasette on Python 3.6 for the last time. Worked on documentation infrastructure improvements. Spent some time with Fly Volumes.&lt;/p&gt;
&lt;h4&gt;Datasette 0.60.1 for Python 3.6&lt;/h4&gt;
&lt;p&gt;I got &lt;a href="https://github.com/simonw/datasette/issues/1609"&gt;a report&lt;/a&gt; that users of Python 3.6 were seeing errors when they tried to install Datasette.&lt;/p&gt;
&lt;p&gt;I actually &lt;a href="https://github.com/simonw/datasette/issues/1577"&gt;dropped support&lt;/a&gt; for 3.6 a few weeks ago, but that shouldn't have affected the already released Datasette 0.60 - so something was clearly wrong.&lt;/p&gt;
&lt;p&gt;This lead me to finally get my head around how &lt;code&gt;pip install&lt;/code&gt; handles Python version support. It's actually a very neat system which I hadn't previously taken the time to understand.&lt;/p&gt;
&lt;p&gt;Python packages can (and should!) provide a &lt;code&gt;python_requires=&lt;/code&gt; line in their &lt;code&gt;setup.py&lt;/code&gt;. That line for Datasette currently looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;python_requires="&amp;gt;=3.7"
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;But in the 0.60 release it was still this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;python_requires="&amp;gt;=3.6"
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When you run &lt;code&gt;pip install package&lt;/code&gt; this becomes part of the &lt;code&gt;pip&lt;/code&gt; resolution mechanism - it will default to attempting to install the highest available version of the package that supports your version of Python.&lt;/p&gt;
&lt;p&gt;So why did &lt;code&gt;pip install datasette&lt;/code&gt; break? It turned out that one of Datasette's dependencies, &lt;a href="https://www.uvicorn.org/"&gt;Uvicorn&lt;/a&gt;, had dropped support for Python 3.6 but did not have a &lt;code&gt;python_requires&lt;/code&gt; indicator that pip could use to resolve the correct version.&lt;/p&gt;
&lt;p&gt;Coincidentally, Uvicorn actually added &lt;code&gt;python_requires&lt;/code&gt; just &lt;a href="https://github.com/encode/uvicorn/pull/1328"&gt;a few weeks ago&lt;/a&gt; - but it wasn't out in a release yet, so &lt;code&gt;pip install&lt;/code&gt; couldn't take it into account.&lt;/p&gt;
&lt;p&gt;I raised this issue with the Uvicorn development team and  they turned around a fix really promptly - &lt;a href="https://github.com/encode/uvicorn/releases/tag/0.17.0.post1"&gt;0.17.0.post1&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;But before I had seen how fast the Uvicorn team could move I figured out how to fix the issue myself, thanks to &lt;a href="https://twitter.com/samuel_hames/status/1484327636860293121"&gt;a tip from Sam Hames&lt;/a&gt; on Twitter.&lt;/p&gt;
&lt;p&gt;The key to fixing it was &lt;a href="https://www.python.org/dev/peps/pep-0508/#environment-markers"&gt;environment markers&lt;/a&gt;, a feature of Python's dependency resolution system that allows you to provide extra rules for when a dependency should be used.&lt;/p&gt;
&lt;p&gt;Here's an &lt;code&gt;install_requires=&lt;/code&gt; example showing these in action:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-s1"&gt;install_requires&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;[
    &lt;span class="pl-s"&gt;"uvicorn~=0.11"&lt;/span&gt;,
    &lt;span class="pl-s"&gt;'uvicorn&amp;lt;=0.16.0;python_version&amp;lt;="3.6"'&lt;/span&gt;
]&lt;/pre&gt;
&lt;p&gt;This will install a Uvicorn version that loosely matches 0.11, but over-rides that rule to specify that it must be &lt;code&gt;&amp;lt;=0.16.0&lt;/code&gt; if the user's Python version is 3.6 or less.&lt;/p&gt;
&lt;p&gt;Since Datasette 0.60.1 will be the last version of Datasette to support Python 3.6, I decided to play it safe and pin the dependencies of every library to the most recent version that I have tested in Python 3.6 myself. Here's &lt;a href="https://github.com/simonw/datasette/blob/0.60.1/setup.py#L44-L78"&gt;the setup.py file&lt;/a&gt; I constructed for that.&lt;/p&gt;
&lt;p&gt;This ties into a larger open question for me about Datasette's approach to pinning dependencies.&lt;/p&gt;
&lt;p&gt;The rule of thumb I've heard is that you should pin dependencies for standalone released tools but leave dependencies loose for libraries that people will use as dependencies in their own projects - ensuring those users can run with different dependency versions if their projects require them.&lt;/p&gt;
&lt;p&gt;Datasette is &lt;em&gt;mostly&lt;/em&gt; a standalone tool - but it can also be used as a library. I'm actually planning to make its use as a library more obvious through &lt;a href="https://github.com/simonw/datasette/issues/1398"&gt;improvements to the documentation&lt;/a&gt; in the future.&lt;/p&gt;
&lt;p&gt;As such, pinning exact versions doesn't feel quite right to me.&lt;/p&gt;
&lt;p&gt;Maybe the solution here is to split the reusable library parts of Datasette out into a separate package - maybe &lt;code&gt;datasette-core&lt;/code&gt; - and have the &lt;code&gt;datasette&lt;/code&gt; CLI package depend on exact pinned dependencies while the &lt;code&gt;datasette-core&lt;/code&gt; library uses loose dependencies instead.&lt;/p&gt;
&lt;p&gt;Still thinking about this.&lt;/p&gt;
&lt;h4&gt;Datasette documentation tweaks&lt;/h4&gt;
&lt;p&gt;Datasette uses &lt;a href="https://readthedocs.org/"&gt;Read The Docs&lt;/a&gt; to host the documentation. Among other benefits, this makes it easy to host multiple documentation versions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://docs.datasette.io/en/latest/"&gt;docs.datasette.io/en/latest/&lt;/a&gt; is the latest version of the documentation, continuously deployed from the &lt;code&gt;main&lt;/code&gt; branch on GitHub&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.datasette.io/en/stable/"&gt;docs.datasette.io/en/stable/&lt;/a&gt; is the documentation for the most recent stable (non alpha or beta) release - currently 0.60.1. This is the version you get when you run &lt;code&gt;pip install datasette&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.datasette.io/en/0.59/"&gt;docs.datasette.io/en/0.59/&lt;/a&gt; is the documentation for version 0.59 - and every version back to 0.22.1 is hosted under similar URLs, currently covering 73 different releases.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Those previous versions all automatically show a note at the top of the page warning that this is out-dated documentation and linking back to stable - a feature which Read The Docs provides automatically.&lt;/p&gt;
&lt;p&gt;But... I noticed that &lt;code&gt;/en/latest/&lt;/code&gt; didn't do this. I wanted a warning banner to let people know that they were looking at the in-development version of the documentation.&lt;/p&gt;
&lt;p&gt;After some digging around, I fixed it using &lt;a href="https://til.simonwillison.net/readthedocs/link-from-latest-to-stable"&gt;a little bit of extra JavaScript&lt;/a&gt; added to my documentation template. Here's the key implementation detail:&lt;/p&gt;
&lt;div class="highlight highlight-source-js"&gt;&lt;pre&gt;&lt;span class="pl-s1"&gt;jQuery&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-k"&gt;function&lt;/span&gt; &lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;$&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;
  &lt;span class="pl-c"&gt;// Show banner linking to /stable/ if this is a /latest/ page&lt;/span&gt;
  &lt;span class="pl-k"&gt;if&lt;/span&gt; &lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-c1"&gt;!&lt;/span&gt;&lt;span class="pl-pds"&gt;&lt;span class="pl-c1"&gt;/&lt;/span&gt;&lt;span class="pl-cce"&gt;\/&lt;/span&gt;latest&lt;span class="pl-cce"&gt;\/&lt;/span&gt;&lt;span class="pl-c1"&gt;/&lt;/span&gt;&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;test&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;location&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;pathname&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;
    &lt;span class="pl-k"&gt;return&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
  &lt;span class="pl-kos"&gt;}&lt;/span&gt;
  &lt;span class="pl-k"&gt;var&lt;/span&gt; &lt;span class="pl-s1"&gt;stableUrl&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;location&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;pathname&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;replace&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;"/latest/"&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-s"&gt;"/stable/"&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
  &lt;span class="pl-c"&gt;// Check it's not a 404&lt;/span&gt;
  &lt;span class="pl-en"&gt;fetch&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;stableUrl&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt; &lt;span class="pl-c1"&gt;method&lt;/span&gt;: &lt;span class="pl-s"&gt;"HEAD"&lt;/span&gt; &lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;then&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;response&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-c1"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;
    &lt;span class="pl-k"&gt;if&lt;/span&gt; &lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;response&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;status&lt;/span&gt; &lt;span class="pl-c1"&gt;==&lt;/span&gt; &lt;span class="pl-c1"&gt;200&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;
      &lt;span class="pl-c"&gt;// Page exists, insert a warning banner linking to it&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This uses &lt;code&gt;fetch()&lt;/code&gt; to make an HTTP HEAD request for the stable documentation page, and inserts a warning banner only if that page isn't a 404. This avoids linking to a non-existant documentation page if it has been created in development but not yet released as part of a stable release. &lt;a href="https://docs.datasette.io/en/latest/csv_export.html"&gt;Example here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/latest-docs-warning.png" alt="Screenshot of the documentation page with a banner that says: This documentation covers the development version of Datasette. See this page for the current stable release." style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Thinking about this problem got me thinking about SEO.&lt;/p&gt;
&lt;p&gt;A problem I've had with other projects that host multiple versions of their documentation is that sometimes I'll search on Google and end up landing on a page covering a much older version of their project. I think I've had this happen for both PostgreSQL and Python in the past.&lt;/p&gt;
&lt;p&gt;What's best practice for avoiding this? I &lt;a href="https://twitter.com/simonw/status/1484287724773203971"&gt;asked on Twitter&lt;/a&gt; and also started digging around for answers. "If in doubt, imitate Django" is a good general rule of thumb, so I had a look at how Django did this and spotted the following in the HTML of one of their &lt;a href="https://docs.djangoproject.com/en/2.2/topics/db/"&gt;prior version pages&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight highlight-text-html-basic"&gt;&lt;pre&gt;&lt;span class="pl-kos"&gt;&amp;lt;&lt;/span&gt;&lt;span class="pl-ent"&gt;link&lt;/span&gt; &lt;span class="pl-c1"&gt;rel&lt;/span&gt;="&lt;span class="pl-s"&gt;canonical&lt;/span&gt;" &lt;span class="pl-c1"&gt;href&lt;/span&gt;="&lt;span class="pl-s"&gt;https://docs.djangoproject.com/en/4.0/topics/db/&lt;/span&gt;"&lt;span class="pl-kos"&gt;&amp;gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;So Django are using the &lt;a href="https://developers.google.com/search/docs/advanced/crawling/consolidate-duplicate-urls"&gt;rel=canonical&lt;/a&gt; tag to point crawlers towards their most recent release.&lt;/p&gt;
&lt;p&gt;I decided to implement that myself... and then discovered that the Datasette documentation was doing it already! Read The Docs &lt;a href="https://docs.readthedocs.io/en/latest/custom_domains.html#canonical-urls"&gt;implement this piece&lt;/a&gt; of SEO best practice out of the box.&lt;/p&gt;
&lt;h4&gt;Datasette on Fly volumes&lt;/h4&gt;
&lt;p&gt;This one isn't released yet, but I made some good progress on it this week.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://fly.io/"&gt;Fly.io&lt;/a&gt; announced this week that they would be providing 3GB of volume storage to accounts on their free tier. They called this announcement &lt;a href="https://fly.io/blog/free-postgres/"&gt;Free Postgres Databases&lt;/a&gt;, but tucked away in the blog post was this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The lede is "free Postgres" because that's what matters to full stack apps. You don't have to use these for Postgres. If SQLite is more your jam, mount up to 3GB of volumes and use "free SQLite." Yeah, we're probably underselling that.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;(There is &lt;a href="https://twitter.com/mrkurt/status/1484609372114272261"&gt;evidence&lt;/a&gt; that they may have been &lt;a href="https://xkcd.com/356/"&gt;nerd sniping&lt;/a&gt; me with that paragraph.)&lt;/p&gt;
&lt;p&gt;I have a plugin called &lt;a href="https://datasette.io/plugins/datasette-publish-fly"&gt;datasette-publish-fly&lt;/a&gt; which publishes Datasette instances to Fly. Obviously that needs to grow support for configuring volumes!&lt;/p&gt;
&lt;p&gt;I've so far &lt;a href="https://github.com/simonw/datasette-publish-fly/issues/11"&gt;completed the research&lt;/a&gt; on how that feature should work. The next step is to finish &lt;a href="https://github.com/simonw/datasette-publish-fly/issues/10"&gt;implementing the feature&lt;/a&gt;.&lt;/p&gt;
&lt;h4&gt;sqlite-utils --help&lt;/h4&gt;
&lt;p&gt;I pushed out minor release &lt;a href="https://sqlite-utils.datasette.io/en/stable/changelog.html#v3-22-1"&gt;sqlite-utils 3.22.1&lt;/a&gt; today with one notable improvement: every single one of the 39 commands in the CLI tool now includes an example of usage as part of the &lt;code&gt;--help&lt;/code&gt; text.&lt;/p&gt;

&lt;p&gt;This feature was inspired by the new &lt;a href="https://sqlite-utils.datasette.io/en/stable/cli-reference.html#cli-reference"&gt;CLI reference page&lt;/a&gt; in the documentation, which shows the help output for every command on one page - making it much easier to spot potential improvements.&lt;/p&gt;
&lt;h4&gt;Releases this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/sqlite-utils"&gt;sqlite-utils&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/sqlite-utils/releases/tag/3.22.1"&gt;3.22.1&lt;/a&gt; - (&lt;a href="https://github.com/simonw/sqlite-utils/releases"&gt;94 releases total&lt;/a&gt;) - 2022-01-26
&lt;br /&gt;Python CLI utility and library for manipulating SQLite databases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/s3-credentials"&gt;s3-credentials&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/s3-credentials/releases/tag/0.10"&gt;0.10&lt;/a&gt; - (&lt;a href="https://github.com/simonw/s3-credentials/releases"&gt;10 releases total&lt;/a&gt;) - 2022-01-25
&lt;br /&gt;A tool for creating credentials for accessing S3 buckets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette"&gt;datasette&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette/releases/tag/0.60.1"&gt;0.60.1&lt;/a&gt; - (&lt;a href="https://github.com/simonw/datasette/releases"&gt;106 releases total&lt;/a&gt;) - 2022-01-21
&lt;br /&gt;An open source multi-tool for exploring and publishing data&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;TIL this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/sqlite/json-extract-path"&gt;json_extract() path syntax in SQLite&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/aws/helper-for-boto-aws-pagination"&gt;Helper function for pagination using AWS boto3&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/pixelmator/pixel-editing-favicon"&gt;Pixel editing a favicon with Pixelmator&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/readthedocs/documentation-seo-canonical"&gt;Promoting the stable version of the documentation using rel=canonical&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/readthedocs/link-from-latest-to-stable"&gt;Linking from /latest/ to /stable/ on Read The Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/fly/undocumented-graphql-api"&gt;Using the undocumented Fly GraphQL API&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/seo"&gt;seo&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/weeknotes"&gt;weeknotes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite-utils"&gt;sqlite-utils&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/read-the-docs"&gt;read-the-docs&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="python"/><category term="seo"/><category term="datasette"/><category term="weeknotes"/><category term="fly"/><category term="sqlite-utils"/><category term="read-the-docs"/></entry></feed>