Simon Willison's Weblog: disclosures

Hacking the WiFi-enabled color screen GitHub Universe conference badge

2025-10-28T17:17:44+00:00

I'm at GitHub Universe this week (thanks to a free ticket from Microsoft). Yesterday I picked up my conference badge... which incorporates a Raspberry Pi Pico microcontroller (not a full Raspberry Pi, as I first wrote) with a battery, color screen, WiFi and Bluetooth.

GitHub Universe has a tradition of hackable conference badges - the badge last year had an eInk display. This year's is a huge upgrade though - a color screen and a WiFi connection make this thing a genuinely useful little computer!

The only thing it's missing is a keyboard - the device instead provides five buttons total: Up, Down, A, B and C. It might be possible to get a Bluetooth keyboard to work, though I'll believe that when I see it - there's not a lot of space on this device for a keyboard driver.

Everything is written using MicroPython, and the device is designed to be hackable: connect it to a laptop with a USB-C cable and you can start modifying the code directly on the device.

Getting set up with the badge

Out of the box the badge will play an opening animation (implemented as a sequence of PNG image frames) and then show a home screen with six app icons.

The default apps are mostly neat Octocat-themed demos: a Flappy Bird clone, a Tamagotchi-style pet, a drawing app that works like an Etch A Sketch, an IR scavenger hunt for the conference venue itself (this thing has an IR sensor too!), and a gallery app showing some images.

The sixth app is a badge app. This will show your GitHub profile image and some basic stats, but will only work if you dig out a USB-C cable and make some edits to the files on the badge directly.

I did this on a Mac. I plugged a USB-C cable into the badge, which caused macOS to treat it as an attached drive volume. That drive contains several files, including secrets.py. Open that up, confirm the WiFi details are correct and add your GitHub username. The file should look like this:

WIFI_SSID = "..."
WIFI_PASSWORD = "..."
GITHUB_USERNAME = "simonw"

The badge comes with the SSID and password for the GitHub Universe WiFi network pre-populated.
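Before unmounting, it can be worth sanity-checking the file from the laptop. Here's a minimal sketch that parses secrets.py without executing it and complains if any of the three settings is empty or still the "..." placeholder. The check_secrets() helper is hypothetical, not part of the badge's code.

```python
import ast

# The three settings shown in secrets.py above
REQUIRED = ("WIFI_SSID", "WIFI_PASSWORD", "GITHUB_USERNAME")


def check_secrets(source):
    """Hypothetical helper: parse secrets.py source without running it."""
    values = {}
    for node in ast.parse(source).body:
        # Only look at simple `NAME = <literal>` assignments
        if (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
        ):
            try:
                values[node.targets[0].id] = ast.literal_eval(node.value)
            except (ValueError, TypeError):
                pass  # not a plain literal, ignore it
    missing = [
        name
        for name in REQUIRED
        if not isinstance(values.get(name), str) or values[name] in ("", "...")
    ]
    if missing:
        raise ValueError("secrets.py needs values for: " + ", ".join(missing))
    return {name: values[name] for name in REQUIRED}
```

To use it, read the file from wherever the badge is mounted (on iPhone the disk shows up as BADGER; the exact path on a Mac is an assumption) and pass its contents to check_secrets().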

That's it! Unmount the disk, hit the reboot button on the back of the badge and when it comes back up again the badge app should look something like this:

Building your own apps

Here's the official documentation for building software for the badge.

When I got mine yesterday the official repo had not yet been updated, so I had to figure this out myself.

I copied all of the code across to my laptop, added it to a Git repo and then fired up Claude Code and told it:

Investigate this code and add a detailed README

Here's the result, which was really useful for getting a start on understanding how it all worked.

Each of the six default apps lives in its own folder inside apps/, for example apps/sketch/ for the sketching app.

There's also a menu app which powers the home screen. That lives in apps/menu/. You can edit code in here to add new apps that you create to that screen.
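This is not the badge's actual menu code, just a hypothetical sketch of one way a menu could discover apps: list the folders under apps/ that contain an __init__.py, skipping the menu itself. It sticks to os.listdir() and os.stat(), which MicroPython's os module also provides.

```python
import os


def discover_apps(apps_dir="apps", exclude=("menu",)):
    """Hypothetical: return sorted app folder names that contain __init__.py."""
    found = []
    for name in os.listdir(apps_dir):
        if name in exclude:
            continue
        try:
            # os.stat raises OSError if there is no __init__.py in that folder
            os.stat(apps_dir + "/" + name + "/__init__.py")
        except OSError:
            continue
        found.append(name)
    return sorted(found)
```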

I told Claude:

Add a new app to it available from the menu which shows network status and other useful debug info about the machine it is running on

This was a bit of a long-shot, but it totally worked!

The first version had an error:

I OCR'd that photo (with the Apple Photos app), pasted the message into Claude Code, and it fixed the problem.

This almost worked... but the addition of a seventh icon to the 2x3 grid meant that you could select the icon but it didn't scroll into view. I had Claude fix that for me too.
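The underlying fix is a standard scrolling-grid calculation. Here's a minimal sketch, assuming the home screen shows 3 columns by 2 rows (the real layout constants and function names in apps/menu/ may differ): track the first visible row and shift it just enough to bring the selected icon's row on screen.

```python
COLUMNS = 3       # assumption: icons per row on the home screen
VISIBLE_ROWS = 2  # assumption: rows that fit on screen at once


def scroll_row(selected, first_visible_row):
    """Return the new first visible row so icon `selected` (0-based) is on screen."""
    row = selected // COLUMNS
    if row < first_visible_row:
        # Selection moved above the visible window: scroll up to it
        return row
    if row >= first_visible_row + VISIBLE_ROWS:
        # Selection moved below the visible window: scroll down just enough
        return row - VISIBLE_ROWS + 1
    return first_visible_row
```

With seven icons, selecting the seventh (index 6) puts it on row 2, so the window scrolls from row 0 to row 1.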

Here's the code for apps/debug/__init__.py, and the full Claude Code transcript created using my terminal-to-HTML app described here.
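For a flavour of what such a screen can gather, here's a separate minimal sketch (not the code Claude wrote) that builds debug lines from the standard sys and gc modules plus an optional WLAN object. On the badge you would pass network.WLAN(network.STA_IF); gc.mem_free() only exists on MicroPython, so the sketch checks for it first. The debug_lines() name is hypothetical.

```python
import gc
import sys


def debug_lines(wlan=None):
    """Collect human-readable debug info; `wlan` is an optional WLAN-like object."""
    impl = sys.implementation
    version = ".".join(str(part) for part in impl.version[:3])
    lines = ["Runtime: %s %s" % (impl.name, version), "Platform: " + sys.platform]
    mem_free = getattr(gc, "mem_free", None)  # MicroPython-only API
    if mem_free is not None:
        gc.collect()
        lines.append("Free RAM: %d bytes" % mem_free())
    if wlan is not None:
        if wlan.isconnected():
            # MicroPython's WLAN.ifconfig() returns (ip, netmask, gateway, dns)
            ip, netmask, gateway, dns = wlan.ifconfig()
            lines.append("WiFi: connected, IP " + ip)
            lines.append("Gateway: %s  DNS: %s" % (gateway, dns))
        else:
            lines.append("WiFi: not connected")
    return lines
```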

Here are the four screens of the debug app:

An icon editor

The app icons are 24x24 pixels. I decided it would be neat to have a web app that helps build those icons, including the ability to start by creating an icon from an emoji.
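A 24x24 icon is small enough to encode by hand. This minimal sketch assumes the icons are ordinary PNG files, like the animation frames; it writes an 8-bit RGBA PNG using only struct and zlib from the Python standard library. make_png() and the solid-color example are purely illustrative and not how the icon editor works.

```python
import struct
import zlib


def _chunk(kind, data):
    """One PNG chunk: length, type, data, then CRC-32 of type+data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def make_png(pixels):
    """Encode rows of (r, g, b, a) tuples as an 8-bit RGBA PNG."""
    height = len(pixels)
    width = len(pixels[0])
    # Each scanline starts with filter type 0 (no filtering)
    raw = b"".join(
        b"\x00" + bytes(channel for pixel in row for channel in pixel)
        for row in pixels
    )
    # width, height, bit depth 8, color type 6 (RGBA), compression, filter, interlace
    header = struct.pack(">IIBBBBB", width, height, 8, 6, 0, 0, 0)
    return (
        b"\x89PNG\r\n\x1a\n"
        + _chunk(b"IHDR", header)
        + _chunk(b"IDAT", zlib.compress(raw))
        + _chunk(b"IEND", b"")
    )


# Example: a solid purple 24x24 icon
icon = make_png([[(110, 84, 148, 255)] * 24 for _ in range(24)])
```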

I built this one using Claude Artifacts. Here's the result, now available at tools.simonwillison.net/icon-editor:

And a REPL

I noticed that last year's badge configuration app (which I can't find in github.com/badger/badger.github.io any more, I think they reset the history on that repo?) worked by talking to MicroPython over the Web Serial API from Chrome. Here's my archived copy of that code.

Wouldn't it be useful to have a REPL in a web UI that you could use to interact with the badge directly over USB?

I pointed Claude Code at a copy of that repo and told it:

Based on this build a new HTML with inline JavaScript page that uses WebUSB to simply test that the connection to the badge works and then list files on that device using the same mechanism

It took a bit of poking (here's the transcript) but the result is now live at tools.simonwillison.net/badge-repl. It only works in Chrome - you'll need to plug the badge in with a USB-C cable and then click "Connect to Badge".
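As a separate illustration, not taken from that tool's source, here's how the MicroPython "raw REPL" protocol used by tools such as mpremote frames a request. Ctrl-C interrupts whatever is running and Ctrl-A switches to raw mode. The code then follows with a Ctrl-D to execute it. The board replies with OK, the printed output, Ctrl-D, any error text, Ctrl-D and a > prompt. This minimal sketch only builds and parses those bytes. It assumes the transport (WebUSB, Web Serial or a serial library) is handled elsewhere, and the helper names are hypothetical.

```python
CTRL_A = b"\x01"  # enter raw REPL
CTRL_B = b"\x02"  # leave raw REPL (back to the normal REPL)
CTRL_C = b"\x03"  # interrupt the running program
CTRL_D = b"\x04"  # in raw REPL: execute the code sent so far

# Code to run on the badge: list the files in the current directory
LIST_FILES = "import os\nprint(os.listdir())"


def raw_repl_steps(code):
    """Byte strings to write in order, waiting for the device between them."""
    return [
        b"\r" + CTRL_C + CTRL_C,  # stop whatever is running
        b"\r" + CTRL_A,  # device answers "raw REPL; CTRL-B to exit\r\n>"
        code.encode("utf-8") + CTRL_D,  # send the code, then execute it
    ]


def parse_raw_repl_reply(reply):
    """Split b'OK<stdout>\\x04<stderr>\\x04>' into (stdout, stderr) strings."""
    if not reply.startswith(b"OK"):
        raise ValueError("code was not accepted: %r" % reply[:20])
    parts = reply[2:].split(CTRL_D, 2)
    if len(parts) < 3:
        raise ValueError("incomplete reply: keep reading until two Ctrl-D bytes")
    return parts[0].decode("utf-8"), parts[1].decode("utf-8")
```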

Get hacking

If you're a GitHub Universe attendee I hope this is useful. The official badger.github.io site has plenty more details to help you get started.

There isn't yet a way to get hold of this hardware outside of GitHub Universe - I know they had some supply chain challenges just getting enough badges for the conference attendees!

It's a very neat device, built for GitHub by Pimoroni in Sheffield, UK. A version of this should become generally available in the future under the name "Pimoroni Tufty 2350".

Update: Setup with iPhone only

If you don't have a laptop with you it's still possible to start hacking on the device using just a USB-C cable.

Plug the badge into the phone, hit the reset button on the back twice to switch it into disk mode and open the iPhone Files app - the badge should appear as a mounted disk called BADGER.

I used Textastic to edit that secrets.py and configure a new badge, then hit reset again to restart it.

Tags: github, hardware-hacking, microsoft, ai, generative-ai, raspberry-pi, llms, claude-code, disclosures, micropython

Claude Code for web - a new asynchronous coding agent from Anthropic

2025-10-20T19:43:15+00:00

Anthropic launched Claude Code for web this morning. It's an asynchronous coding agent - their answer to OpenAI's Codex Cloud and Google's Jules, and has a very similar shape. I had preview access over the weekend and I've already seen some very promising results from it.

It's available online at claude.ai/code and shows up as a tab in the Claude iPhone app as well:

As far as I can tell it's their latest Claude Code CLI app wrapped in a container (Anthropic are getting really good at containers these days) and configured to --dangerously-skip-permissions. It appears to behave exactly the same as the CLI tool, and includes a neat "teleport" feature which can copy both the chat transcript and the edited files down to your local Claude Code CLI tool if you want to take over locally.

It's very straightforward to use. You point Claude Code for web at a GitHub repository, select an environment (fully locked down, restricted to an allow-list of domains, or configured to access domains of your choosing, including "*" for everything) and kick it off with a prompt.

While it's running you can send it additional prompts which are queued up and executed after it completes its current step.

Once it's done it opens a branch on your repo with its work and can optionally open a pull request.

Putting Claude Code for web to work

Claude Code for web's PRs are indistinguishable from Claude Code CLI's, so Anthropic told me it was OK to submit those against public repos even during the private preview. Here are some examples from this weekend:

Add query-string-stripper.html tool against my simonw/tools repo - a very simple task that created (and deployed via GitHub Pages) this query-string-stripper tool.
minijinja vs jinja2 Performance Benchmark - I ran this against a private repo and then copied the results here, so no PR. Here's the prompt I used.
Update deepseek-ocr README to reflect successful project completion - I noticed that the README produced by Claude Code CLI for this project was misleadingly out of date, so I had Claude Code for web fix the problem.

That second example is the most interesting. I saw a tweet from Armin about his MiniJinja Rust template language adding support for Python 3.14 free threading. I hadn't realized that project had Python bindings, so I decided it would be interesting to see a quick performance comparison between MiniJinja and Jinja2.

I ran Claude Code for web against a private repository with a completely open environment (* in the allow-list) and prompted:

I’m interested in benchmarking the Python bindings for https://github.com/mitsuhiko/minijinja against the equivalente template using Python jinja2

Design and implement a benchmark for this. It should use the latest main checkout of minijinja and the latest stable release of jinja2. The benchmark should use the uv version of Python 3.14 and should test both the regular 3.14 and the 3.14t free threaded version - so four scenarios total

The benchmark should run against a reasonably complicated example of a template, using template inheritance and loops and such like In the PR include a shell script to run the entire benchmark, plus benchmark implantation, plus markdown file describing the benchmark and the results in detail, plus some illustrative charts created using matplotlib

I entered this into the Claude iPhone app on my mobile keyboard, hence the typos.
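The four scenarios are the cross product of two template engines and two interpreter builds. Here's a minimal sketch of how a harness could label which build it is running under and time a render callable. It is not the code Claude produced; the render function would be a MiniJinja or Jinja2 render call, and the helper names are hypothetical. Free-threaded builds such as 3.14t set the Py_GIL_DISABLED config variable to 1.

```python
import itertools
import sys
import sysconfig
import timeit

ENGINES = ("minijinja", "jinja2")
INTERPRETERS = ("3.14", "3.14t")
# Two engines x two interpreter builds = four scenarios
SCENARIOS = list(itertools.product(ENGINES, INTERPRETERS))


def interpreter_label():
    """e.g. "3.14" for a regular build or "3.14t" for a free-threaded one."""
    free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    suffix = "t" if free_threaded else ""
    return "%d.%d%s" % (sys.version_info.major, sys.version_info.minor, suffix)


def seconds_per_render(render, number=1000, repeat=5):
    """Best-of-`repeat` average time for one call to `render()`."""
    # Taking the minimum reduces noise from other processes on the machine
    return min(timeit.repeat(render, number=number, repeat=repeat)) / number
```

The shell script would then run the same harness once under each interpreter, so each result can be tagged with interpreter_label().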

It churned away for a few minutes and gave me exactly what I asked for. Here's one of the four charts it created:

(I was surprised to see MiniJinja out-performed by Jinja2, but I guess Jinja2 has had a decade of clever performance optimizations and doesn't need to deal with any extra overhead of calling out to Rust.)

Note that I would likely have got the exact same result running this prompt against the Claude Code CLI on my laptop. The benefit of Claude Code for web is entirely in its convenience as a way of running these tasks in a hosted container managed by Anthropic, with a pleasant web and mobile UI layered over the top.

Anthropic are framing this as part of their sandboxing strategy

It's interesting how Anthropic chose to announce this new feature: the product launch is buried halfway down their new engineering blog post Beyond permission prompts: making Claude Code more secure and autonomous, which starts like this:

Claude Code's new sandboxing features, a bash tool and Claude Code on the web, reduce permission prompts and increase user safety by enabling two boundaries: filesystem and network isolation.

I'm very excited to hear that Claude Code CLI is taking sandboxing more seriously. I've not yet dug into the details of that - it looks like it's using seatbelt on macOS and Bubblewrap on Linux.

Anthropic released a new open source (Apache 2) library, anthropic-experimental/sandbox-runtime, with their implementation of this so far.

Filesystem sandboxing is relatively easy. The harder problem is network isolation, which they describe like this:

Network isolation, by only allowing internet access through a unix domain socket connected to a proxy server running outside the sandbox. This proxy server enforces restrictions on the domains that a process can connect to, and handles user confirmation for newly requested domains. And if you’d like further-increased security, we also support customizing this proxy to enforce arbitrary rules on outgoing traffic.

This is crucial to protecting against both prompt injection and lethal trifecta attacks. The best way to prevent lethal trifecta attacks is to cut off one of the three legs, and network isolation is how you remove the data exfiltration leg that allows successful attackers to steal your data.

If you run Claude Code for web in "No network access" mode you have nothing to worry about.

I'm a little bit nervous about their "Trusted network access" environment. It's intended to only allow access to domains relating to dependency installation, but the default domain list has dozens of entries, which leaves room for unintended exfiltration vectors to sneak through.

You can also configure a custom environment with your own allow-list. I have one called "Everything" which allow-lists "*", because for projects like my MiniJinja/Jinja2 comparison above there are no secrets or source code involved that need protecting.
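To make the allow-list idea concrete, here's a minimal sketch of the kind of check a filtering proxy could apply to each outgoing hostname. The pattern syntax (exact hostnames, "*.example.com" for subdomains, "*" for everything) and the example domain list are assumptions for illustration, not the documented format of sandbox-runtime or Claude Code for web.

```python
def is_allowed(host, allow_list):
    """Return True if `host` matches an entry in `allow_list` (hypothetical syntax)."""
    host = host.lower().rstrip(".")  # normalise case and any trailing root dot
    for pattern in allow_list:
        pattern = pattern.lower()
        if pattern == "*":
            return True  # an "Everything" environment: allow any domain
        if pattern.startswith("*."):
            # "*.example.com" matches subdomains; keeping the leading dot
            # stops "evilexample.com" from matching
            if host.endswith(pattern[1:]):
                return True
        elif host == pattern:
            return True
    return False


# Illustrative dependency-installation hosts (not Anthropic's default list)
DEPENDENCY_DOMAINS = ["pypi.org", "*.pythonhosted.org", "registry.npmjs.org"]
```

Each extra entry is another place data could be sent, which is why a long default list deserves scrutiny.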

I see Anthropic's focus on sandboxes as an acknowledgment that coding agents run in YOLO mode (--dangerously-skip-permissions and the like) are enormously more valuable and productive than agents where you have to approve their every step.

The challenge is making it convenient and easy to run them safely. This kind of sandboxing is the only approach to safety that feels credible to me.

Update: A note on cost: I'm currently using a Claude "Max" plan that Anthropic gave me in order to test some of their features, so I don't have a good feeling for how much Claude Code would cost for these kinds of projects.

From running npx ccusage@latest (an unofficial cost estimate tool) it looks like I'm using between $1 and $5 worth of daily Claude CLI invocations at the moment.

Tags: armin-ronacher, jinja, sandboxing, security, ai, prompt-injection, generative-ai, llms, anthropic, claude, coding-agents, claude-code, lethal-trifecta, async-coding-agents, disclosures

NVIDIA DGX Spark: great hardware, early days for the ecosystem

2025-10-14T23:36:21+00:00

NVIDIA sent me a preview unit of their new DGX Spark desktop "AI supercomputer". I've never had hardware to review before! You can consider this my first ever sponsored post if you like, but they did not pay me any cash and aside from an embargo date they did not request (nor would I grant) any editorial input into what I write about the device.

The device retails for around $4,000. They officially go on sale tomorrow.

First impressions are that this is a snazzy little computer. It's similar in size to a Mac mini, but with an exciting textured surface that feels refreshingly different and a little bit science fiction.

There is a very powerful machine tucked into that little box. Here are the specs, which I had Claude Code figure out for me by poking around on the device itself:

Hardware Specifications

Architecture: aarch64 (ARM64)
CPU: 20 cores (10x Cortex-X925 performance cores, 10x Cortex-A725 efficiency cores)
RAM: 119 GB total (112 GB available) - I’m not sure why Claude reported it differently here, since the machine is listed as 128GB. It looks like a 128GB == 119GiB thing, because Claude used free -h.
Storage: 3.7 TB (6% used, 3.3 TB available)

GPU Specifications

Model: NVIDIA GB10 (Blackwell architecture)
Compute Capability: sm_121 (12.1)
Memory: 119.68 GB
Multi-processor Count: 48 streaming multiprocessors
Architecture: Blackwell

Short version: this is an ARM64 device with 128GB of memory that's available to both the GPU and the 20 CPU cores at the same time, strapped onto a 4TB NVMe SSD.
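The 119 vs 128 discrepancy in the specs is consistent with decimal gigabytes versus the binary gibibytes that free -h reports. Here's that arithmetic as a quick Python check; it only shows the unit conversion, not how much memory the system actually reserves.

```python
GB = 10**9   # decimal gigabyte, as used in spec sheets
GIB = 2**30  # binary gibibyte, as reported by `free -h`


def gb_to_gib(gigabytes):
    return gigabytes * GB / GIB


print(round(gb_to_gib(128), 1))  # 119.2
```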

The Spark is firmly targeted at “AI researchers”. It’s designed for both training and running models.

The tricky bit: CUDA on ARM64

Until now almost all of my own model running experiments have taken place on a Mac. This has gotten far less painful over the past year and a half thanks to the amazing work of the MLX team and community, but it's still left me deeply frustrated at my lack of access to the NVIDIA CUDA ecosystem. I've lost count of the number of libraries and tutorials which expect you to be able to use Hugging Face Transformers or PyTorch with CUDA, and leave you high and dry if you don't have an NVIDIA GPU to run things on.

Armed (ha) with my new NVIDIA GPU I was excited to dive into this world that had long eluded me... only to find that there was another assumption baked in to much of this software: x86 architecture for the rest of the machine.

This resulted in all kinds of unexpected new traps for me to navigate. I eventually managed to get a PyTorch 2.7 wheel for CUDA on ARM, but failed to do so for 2.8. I'm not confident that's because the wheel itself is unavailable - I'm finding the PyTorch ARM ecosystem pretty confusing to navigate.

NVIDIA are trying to make this easier, with mixed success. A lot of my initial challenges got easier when I found their official Docker container, so now I'm figuring out how best to use Docker with GPUs. Here's the current incantation that's been working for me:

docker run -it --gpus=all \
  -v /usr/local/cuda:/usr/local/cuda:ro \
  nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04 \
  bash

I have not yet got my head around the difference between CUDA 12 and 13. 13 appears to be very new, and a lot of the existing tutorials and libraries appear to expect 12.
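When juggling wheels, containers and CUDA versions, a small diagnostic script can confirm what you actually ended up with. This minimal sketch uses standard PyTorch calls (torch.version.cuda is None for CPU-only wheels) and degrades gracefully if PyTorch isn't installed; the cuda_report() name is hypothetical.

```python
import platform


def cuda_report():
    """Summarise the CPU architecture and what PyTorch can see of CUDA."""
    report = {"machine": platform.machine()}  # "aarch64" on an ARM64 machine
    try:
        import torch
    except ImportError:
        report["torch"] = None
        return report
    report["torch"] = torch.__version__
    report["torch_cuda"] = torch.version.cuda  # CUDA version the wheel targets
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["device"] = torch.cuda.get_device_name(0)
        report["capability"] = torch.cuda.get_device_capability(0)
    return report


print(cuda_report())
```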

The missing documentation isn't missing any more

When I first received this machine around a month ago there was very little in the way of documentation to help get me started. This meant climbing the steep NVIDIA+CUDA learning curve mostly on my own.

This has changed substantially in just the last week. NVIDIA now have extensive guides for getting things working on the Spark and they are a huge breath of fresh air - exactly the information I needed when I started exploring this hardware.

Here's the getting started guide, details on the DGX dashboard web app, and the essential collection of playbooks. There's still a lot I haven't tried yet just in this official set of guides.

Claude Code for everything

Claude Code was an absolute lifesaver for me while I was trying to figure out how best to use this device. My Ubuntu skills were a little rusty, and I also needed to figure out CUDA drivers and Docker incantations and how to install the right versions of PyTorch. Claude 4.5 Sonnet is much better than me at all of these things.

Since many of my experiments took place in disposable Docker containers I had no qualms at all about running it in YOLO mode:

IS_SANDBOX=1 claude --dangerously-skip-permissions

The IS_SANDBOX=1 environment variable stops Claude from complaining about running as root.

Before I found out about IS_SANDBOX

I was tipped off about IS_SANDBOX after I published this article. Here's my original workaround:

Claude understandably won't let you do this as root, even in a Docker container, so I found myself using the following incantation in a fresh nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04 instance pretty often:

apt-get update && apt-get install -y sudo
# pick the first free UID >=1000
U=$(for i in $(seq 1000 65000); do if ! getent passwd $i >/dev/null; then echo $i; break; fi; done)
echo "Chosen UID: $U"
# same for a GID
G=$(for i in $(seq 1000 65000); do if ! getent group $i >/dev/null; then echo $i; break; fi; done)
echo "Chosen GID: $G"
# create user+group
groupadd -g "$G" devgrp
useradd -m -u "$U" -g "$G" -s /bin/bash dev
# enable password-less sudo:
printf 'dev ALL=(ALL) NOPASSWD:ALL\n' > /etc/sudoers.d/90-dev-nopasswd
chmod 0440 /etc/sudoers.d/90-dev-nopasswd
# Install npm
DEBIAN_FRONTEND=noninteractive TZ=Etc/UTC apt-get install -y npm
# Install Claude
npm install -g @anthropic-ai/claude-code

Then switch to the dev user and run Claude for the first time:

su - dev
claude --dangerously-skip-permissions

This will provide a URL which you can visit to authenticate with your Anthropic account, confirming by copying back a token and pasting it into the terminal.

Docker tip: you can snapshot the current container (with Claude installed) as a new image by running docker ps to get the container ID and then:

docker commit --pause=false <container_id> cc:snapshot

Then later you can start a similar container using:

docker run -it \
  --gpus=all \
  -v /usr/local/cuda:/usr/local/cuda:ro \
  cc:snapshot bash

Here's an example of the kinds of prompts I've been running in Claude Code inside the container:

I want to run https://huggingface.co/unsloth/Qwen3-4B-GGUF using llama.cpp - figure out how to get llama cpp working on this machine such that it runs with the GPU, then install it in this directory and get that model to work to serve a prompt. Goal is to get this command to run: llama-cli -hf unsloth/Qwen3-4B-GGUF -p "I believe the meaning of life is" -n 128 -no-cnv

That one worked flawlessly - Claude checked out the llama.cpp repo, compiled it for me and iterated on it until it could run that model on the GPU. Here's a full transcript, converted from Claude's .jsonl log format to Markdown using a script I vibe coded just now.

I later told it:

Write out a markdown file with detailed notes on what you did. Start with the shortest form of notes on how to get a successful build, then add a full account of everything you tried, what went wrong and how you fixed it.

Which produced this handy set of notes.

Tailscale was made for this

Having a machine like this on my local network is neat, but what's even neater is being able to access it from anywhere else in the world, from both my phone and my laptop.

Tailscale is perfect for this. I installed it on the Spark (using the Ubuntu instructions here), signed in with my SSO account (via Google)... and the Spark showed up in the "Network Devices" panel on my laptop and phone instantly.

I can SSH in from my laptop or using the Termius iPhone app on my phone. I've also been running tools like Open WebUI which give me a mobile-friendly web interface for interacting with LLMs on the Spark.

Here comes the ecosystem

The embargo on these devices dropped yesterday afternoon, and it turns out a whole bunch of relevant projects had similar preview access to mine. This is fantastic news, as many of the things I've been trying to figure out myself suddenly got a whole lot easier.

Four particularly notable examples:

Ollama works out of the box. They actually had a build that worked a few weeks ago, and were the first success I had running an LLM on the machine.
llama.cpp creator Georgi Gerganov just published extensive benchmark results from running llama.cpp on a Spark. He's getting ~3,600 tokens/second to read the prompt and ~59 tokens/second to generate a response with the MXFP4 version of GPT-OSS 20B and ~817 tokens/second to read and ~18 tokens/second to generate for GLM-4.5-Air-GGUF.
LM Studio now have a build for the Spark. I haven't tried this one yet as I'm currently using my machine exclusively via SSH.
vLLM - one of the most popular engines for serving production LLMs - had early access and there's now an official NVIDIA vLLM NGC Container for running their stack.
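Those prompt-processing and generation rates translate directly into wait times: prompt tokens divided by the prompt rate, plus output tokens divided by the generation rate. Here's a minimal sketch using Georgi's figures; the 2,000-token prompt and 500-token reply are made-up example sizes, and real runs will vary with context length.

```python
def estimated_seconds(prompt_tokens, output_tokens, prompt_rate, generation_rate):
    """Rough latency: time to read the prompt plus time to generate the reply."""
    return prompt_tokens / prompt_rate + output_tokens / generation_rate


# ~3,600 tok/s prompt processing, ~59 tok/s generation (GPT-OSS 20B, MXFP4)
gpt_oss = estimated_seconds(2000, 500, prompt_rate=3600, generation_rate=59)
# ~817 tok/s prompt processing, ~18 tok/s generation (GLM-4.5-Air)
glm_air = estimated_seconds(2000, 500, prompt_rate=817, generation_rate=18)
print("GPT-OSS 20B: %.1fs, GLM-4.5-Air: %.1fs" % (gpt_oss, glm_air))
```

With these inputs the estimates come out at roughly 9 and 30 seconds, and in both cases generating the reply dominates.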

Here's a tutorial from Unsloth on fine-tuning gpt-oss-20b on the Spark.

Should you get one?

It's a bit too early for me to provide a confident recommendation concerning this machine. As indicated above, I've had a tough time figuring out how best to put it to use, largely through my own inexperience with CUDA, ARM64 and Ubuntu GPU machines in general.

The ecosystem improvements in just the past 24 hours have been very reassuring though. I expect it will be clear within a few weeks how well supported this machine is going to be.

Tags: hardware, ai, docker, tailscale, generative-ai, local-llms, llms, nvidia, ollama, llama-cpp, coding-agents, claude-code, lm-studio, disclosures, nvidia-spark

OpenAI DevDay 2025 live blog

2025-10-06T17:03:15+00:00

I'm at OpenAI DevDay in Fort Mason, San Francisco today. As I did last year, I'm going to be live blogging the announcements from the keynote. Unlike last year, this year there's a livestream.

Disclosure: OpenAI provided me with a free ticket and reserved me a seat in the press/influencer section for the keynote.

Tags: ai, openai, generative-ai, llms, disclosures, live-blog

GitHub Copilot CLI is now in public preview

2025-09-25T23:58:34+00:00

GitHub Copilot CLI is now in public preview

GitHub now have their own entry in the coding terminal CLI agent space: Copilot CLI.

It's the same basic shape as Claude Code, Codex CLI, Gemini CLI and a growing number of other tools in this space. It's a terminal UI which accepts instructions and can modify files, run commands and integrate with GitHub's MCP server and other MCP servers that you configure.

Two notable features compared to many of the others:

It works against the GitHub Models backend. It defaults to Claude Sonnet 4 but you can set COPILOT_MODEL=gpt-5 to switch to GPT-5. Presumably other models will become available soon.
It's billed against your existing GitHub Copilot account. Pricing details are here - they're split into "Agent mode" requests and "Premium" requests. Different plans get different allowances, which are shared with other products in the GitHub Copilot family.

The best available documentation right now is the copilot --help screen - here's a copy of that in a Gist.

It's a competent entry into the market, though it's missing features like the ability to paste in images which have been introduced to Claude Code and Codex CLI over the past few months.

Disclosure: I got a preview of this at an event at Microsoft's offices in Seattle last week. They did not pay me for my time but they did cover my flight, hotel and some dinners.

Tags: github, microsoft, ai, generative-ai, github-copilot, llms, ai-assisted-programming, ai-agents, coding-agents, claude-code, codex, disclosures

Previewing GPT-5 at OpenAI's office

2025-08-07T19:11:19+00:00

A couple of weeks ago I was invited to OpenAI's headquarters for a "preview event", for which I had to sign both an NDA and a video release waiver. I suspected it might relate to either GPT-5 or the OpenAI open weight models... and GPT-5 it was!

OpenAI had invited five developers: Claire Vo, Theo Browne, Ben Hylak, Shawn @swyx Wang, and myself. We were all given early access to the new models and asked to spend a couple of hours (of paid time, see my disclosures) experimenting with them, while being filmed by a professional camera crew.

The resulting video is now up on YouTube. Unsurprisingly most of my edits related to SVGs of pelicans.

Tags: youtube, ai, openai, generative-ai, llms, pelican-riding-a-bicycle, gpt-5, disclosures, theo-browne, gpt