<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: openai</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/openai.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2026-06-05T23:56:40+00:00</updated><author><name>Simon Willison</name></author><entry><title>OpenAI Help: Lockdown Mode</title><link href="https://simonwillison.net/2026/Jun/5/openai-help-lockdown-mode/#atom-tag" rel="alternate"/><published>2026-06-05T23:56:40+00:00</published><updated>2026-06-05T23:56:40+00:00</updated><id>https://simonwillison.net/2026/Jun/5/openai-help-lockdown-mode/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://help.openai.com/en/articles/20001061-lockdown-mode"&gt;OpenAI Help: Lockdown Mode&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;OpenAI first teased this &lt;a href="https://openai.com/index/introducing-lockdown-mode-and-elevated-risk-labels-in-chatgpt/"&gt;in February&lt;/a&gt;, but now it's live and "rolling out to eligible personal accounts, including Free, Go, Plus, and Pro, and self-serve ChatGPT Business accounts":&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Lockdown Mode is designed to help prevent the final stage of data exfiltration from a prompt injection attack by limiting outbound network requests that could transfer sensitive data to an attacker. Lockdown Mode does not prevent prompt injections from appearing in the content ChatGPT processes. For example, a prompt injection could appear in cached web content or in an uploaded file, and could still affect the behavior or accuracy of a response.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This looks really good to me.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/"&gt;Lethal Trifecta&lt;/a&gt; occurs when an LLM system combines all three of: access to private data, exposure to untrusted content and a way to transmit stolen data back to the attacker.&lt;/p&gt;
&lt;p&gt;The only way to solve the trifecta is to cut off one of the three legs, and by far the easiest leg to restrict without making your LLM systems far less useful is exfiltration - the channels an attacker can use to get stolen data out.&lt;/p&gt;
&lt;p&gt;It looks to me like lockdown mode directly attacks that leg, using mechanisms that are deterministic and, crucially, are not evaluated by AI systems that themselves can be subverted by sufficiently devious attacks.&lt;/p&gt;
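&lt;p&gt;To illustrate what "deterministic" means here, this is a minimal Python sketch of an outbound-request check that consults a fixed allowlist rather than a model's judgment. It is not how OpenAI implements Lockdown Mode, which the help article does not detail: the &lt;code&gt;ALLOWED_HOSTS&lt;/code&gt; set, its example hostnames and the &lt;code&gt;is_request_allowed()&lt;/code&gt; helper are all hypothetical names invented for this example.&lt;/p&gt;

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of hosts the assistant may contact.
# These names are placeholders, not anything OpenAI has published.
ALLOWED_HOSTS = frozenset({"api.example.com", "docs.example.com"})


def is_request_allowed(url):
    """Return True only for https URLs whose host is on the fixed allowlist."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # Malformed URLs are rejected rather than guessed at.
        return False
    host = (parts.hostname or "").lower()
    return parts.scheme == "https" and host in ALLOWED_HOSTS


if __name__ == "__main__":
    for candidate in (
        "https://api.example.com/v1/items",
        "https://attacker.example.net/?d=secret",
        "https://api.example.com@attacker.example.net/",
    ):
        print(candidate, is_request_allowed(candidate))
```

&lt;p&gt;The same URL always gets the same answer, so there is no model in the decision path for a prompt injection to talk its way past.&lt;/p&gt;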
&lt;p&gt;The existence of lockdown mode does however imply that ChatGPT, in its default settings, does &lt;em&gt;not&lt;/em&gt; provide robust protection against sufficiently determined data exfiltration attacks!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: &lt;a href="https://twitter.com/cryps1s/status/2062923575049531422"&gt;This tweet&lt;/a&gt; from OpenAI CISO Dane Stuckey:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Lockdown mode is not meant for everyone. However, for folks who have an elevated risk profile - due to who they are, what they work on, or the types of data they work with - it's an excellent tool for further securing themselves. This has some tradeoffs on functionality and utility, but for these users, the tradeoff is worthwhile.&lt;/p&gt;
&lt;/blockquote&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-injection"&gt;prompt-injection&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/lethal-trifecta"&gt;lethal-trifecta&lt;/a&gt;&lt;/p&gt;



</summary><category term="security"/><category term="ai"/><category term="openai"/><category term="prompt-injection"/><category term="llms"/><category term="lethal-trifecta"/></entry><entry><title>I think Anthropic and OpenAI have found product-market fit</title><link href="https://simonwillison.net/2026/May/27/product-market-fit/#atom-tag" rel="alternate"/><published>2026-05-27T16:38:35+00:00</published><updated>2026-05-27T16:38:35+00:00</updated><id>https://simonwillison.net/2026/May/27/product-market-fit/#atom-tag</id><summary type="html">
    &lt;p&gt;Anthropic are &lt;a href="https://techcrunch.com/2026/05/20/anthropic-says-its-about-to-have-its-first-profitable-quarter/"&gt;strongly rumored&lt;/a&gt; to be about to have their first profitable quarter. Stories &lt;a href="https://www.theinformation.com/newsletters/applied-ai/uber-cto-shows-claude-code-can-blow-ai-budgets"&gt;are circulating&lt;/a&gt; of companies surprised at how expensive their LLM bills are becoming from usage by their staff. I think this is because OpenAI and Anthropic have both found product-market fit.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2026/May/27/product-market-fit/#enterprise-customers-are-now-paying-api-prices"&gt;Enterprise customers are now paying API prices&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2026/May/27/product-market-fit/#i-think-they-ve-found-product-market-fit"&gt;I think they've found product-market fit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2026/May/27/product-market-fit/#and-they-re-ramping-up"&gt;And they're ramping up&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2026/May/27/product-market-fit/#the-ai-failure-stories-around-this-are-pretty-thin"&gt;The AI-failure stories around this are pretty thin&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2026/May/27/product-market-fit/#we-also-know-the-labs-are-spending-a-lot"&gt;We also know the labs are spending a lot&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2026/May/27/product-market-fit/#api-revenue-is-becoming-less-important"&gt;API revenue is becoming less important&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2026/May/27/product-market-fit/#april-is-a-new-inflection-point"&gt;April is a new inflection point&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id="enterprise-customers-are-now-paying-api-prices"&gt;Enterprise customers are now paying API prices&lt;/h4&gt;
&lt;p&gt;I currently subscribe to the $100/month Max plan from Anthropic and the $100/month Pro plan from OpenAI. If you are a heavy user of coding agents these plans are a fantastic deal. I just ran the &lt;a href="https://github.com/ryoppippi/ccusage"&gt;ccusage&lt;/a&gt; tool on my laptop to get an estimate of how much I would have spent if I were to pay for API tokens in the past 30 days and got:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;$1,199.79 for Anthropic Claude Code&lt;/li&gt;
&lt;li&gt;$980.37 for OpenAI Codex&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That's $2,180.16 worth of tokens for $200 - not bad at all! I'm a moderately heavy user of these tools, but I'm certainly not running agents every hour of the day and night.&lt;/p&gt;
&lt;p&gt;I had assumed that companies making extensive use of agents were getting similar discounts. It turns out I &lt;em&gt;could not have been more wrong&lt;/em&gt; about that.&lt;/p&gt;
&lt;p&gt;I haven't been able to track down the exact date, but at some point in the last six months Anthropic switched their Enterprise plan (originally &lt;a href="https://www.anthropic.com/news/claude-code-on-team-and-enterprise"&gt;"Claude seats include enough usage for a typical workday" back in August 2025&lt;/a&gt;) to $20/seat/month plus API pricing for usage. This story about the change &lt;a href="https://www.theinformation.com/articles/anthropic-changes-pricing-bill-firms-based-ai-use-amid-compute-crunch"&gt;from The Information&lt;/a&gt; is dated Apr 14, 2026, but cites an Anthropic spokesperson claiming that the pricing change occurred in November 2025. Existing customers are finding out about the change as they renew their contracts.&lt;/p&gt;
&lt;p&gt;OpenAI made a similar pricing change in April. The &lt;a href="https://help.openai.com/en/articles/20001106-codex-rate-card"&gt;Codex rate card&lt;/a&gt; (&lt;a href="https://web.archive.org/web/20260519062438/https://help.openai.com/en/articles/20001106-codex-rate-card"&gt;Internet Archive copy&lt;/a&gt;) currently says:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: On April 2, 2026, we updated Codex pricing to align with API token usage, instead of per-message pricing. This change was applicable to new and existing Plus, Pro, ChatGPT Business and new ChatGPT Enterprise plans.&lt;/p&gt;
&lt;p&gt;On April 23, 2026, we made this update for all existing ChatGPT Enterprise plans as well, inclusive of Edu, Health, Gov, and ChatGPT for Teachers.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It's a little harder to decode as they quote prices in "credits", but as far as I can tell those credit costs are an exact match for the API token costs listed for those models.&lt;/p&gt;
&lt;p&gt;All of which is to say that as of April 2026 the "Enterprise" cost for both OpenAI Codex and Anthropic Claude Code/Cowork is the same as the listed API price.&lt;/p&gt;
&lt;p&gt;GPT-5.5 (released April 23rd) is 2x the API price of GPT-5.4. Opus 4.7 (April 16th) is &lt;a href="https://simonwillison.net/2026/Apr/20/claude-token-counts/"&gt;around 1.4x&lt;/a&gt; the price of Opus 4.6 when you take their new tokenizer into account.&lt;/p&gt;
&lt;p&gt;So April saw both leading model companies release new frontier models with a higher API price, &lt;em&gt;and&lt;/em&gt; both companies now have measures to lock their enterprise customers (who tend to sign year-long deals) at those API prices, not the previous extreme discounts.&lt;/p&gt;
&lt;h4 id="i-think-they-ve-found-product-market-fit"&gt;I think they've found product-market fit&lt;/h4&gt;
&lt;p&gt;Why these sudden aggressive moves on pricing? Both Anthropic and OpenAI are planning to IPO, but I suspect there's a more important factor here: I think they've finally found product-market fit, with the coding/general-purpose agent products embodied by Claude Code/Cowork and Codex.&lt;/p&gt;
&lt;p&gt;Tools like ChatGPT are wildly popular, but that wild popularity has been difficult to turn into revenue. In February &lt;a href="https://finance.yahoo.com/news/chatgpt-almost-1-billion-weekly-212157499.html"&gt;OpenAI boasted&lt;/a&gt; more than 900 million weekly active users for ChatGPT, but only 50 million - 5.6% of that - were paying consumer subscribers.&lt;/p&gt;
&lt;p&gt;Charging $10-$20/month per user is an OK business, but you'd need 1-2 billion subscribers sticking around for four years to cover &lt;a href="https://openai.com/global-affairs/seizing-the-ai-opportunity/"&gt;$1 trillion in infrastructure&lt;/a&gt;.&lt;/p&gt;
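&lt;p&gt;A quick back-of-envelope check of that range, as a Python sketch. It assumes every subscriber pays the flat monthly price for all 48 months and ignores churn, running costs and discounting; the function name is just for illustration.&lt;/p&gt;

```python
def subscribers_needed(target_dollars, monthly_price, months):
    """Subscribers required if each one pays monthly_price for every month."""
    return target_dollars / (monthly_price * months)


TARGET = 1_000_000_000_000  # $1 trillion in infrastructure
MONTHS = 4 * 12  # four years of subscription payments

if __name__ == "__main__":
    for price in (10, 20):
        billions = subscribers_needed(TARGET, price, MONTHS) / 1e9
        print(f"${price}/month: {billions:.2f} billion subscribers")
```

&lt;p&gt;That works out to roughly 2.08 billion subscribers at $10/month and 1.04 billion at $20/month.&lt;/p&gt;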
&lt;p&gt;Companies spending $200+/month/user will get you there a whole lot faster - and as noted above, as a power-user I'm at ~$1,000/month in API costs per vendor already.&lt;/p&gt;
&lt;p&gt;Coding agents really did change everything. These are tools which burn &lt;em&gt;vastly&lt;/em&gt; more tokens, but are also quickly becoming daily drivers for the work carried out by extremely well-compensated professionals. Right now that's still mostly software engineers, but a coding agent is a tool that can automate anything you can do by typing commands into a computer... so they are clearly applicable to a much wider set of skilled knowledge workers.&lt;/p&gt;
&lt;p&gt;As I've &lt;a href="https://simonwillison.net/tags/november-2025-inflection/"&gt;discussed on this site at length&lt;/a&gt;, the models released in November 2025 elevated agents to being genuinely useful. We've had six months to get used to that idea now - it's no wonder companies are beginning to spend real money on this technology.&lt;/p&gt;
&lt;p&gt;You could argue that ChatGPT achieved product-market fit when it became the &lt;a href="https://www.reuters.com/technology/chatgpt-sets-record-fastest-growing-user-base-analyst-note-2023-02-01/"&gt;fastest-growing consumer app in history&lt;/a&gt; back in February 2023... but it certainly wasn't making any actual money back then. Coding agents plus enterprise pricing marks the point when these companies start making &lt;em&gt;very&lt;/em&gt; real revenue. Maybe even enough to start covering their costs!&lt;/p&gt;
&lt;h4 id="and-they-re-ramping-up"&gt;And they're ramping up&lt;/h4&gt;
&lt;p&gt;As further evidence that enterprise agents represent product-market fit for these companies, consider their open job listings.&lt;/p&gt;
&lt;p&gt;OpenAI have &lt;a href="https://openai.com/careers/search/"&gt;703 open jobs&lt;/a&gt; right now, of which I'd categorize 229 (32.6%) as relating to enterprise sales and support - account executives, "Go To Market", "Forward Deployed Engineers" and the like.&lt;/p&gt;
&lt;p&gt;Anthropic have &lt;a href="https://www.anthropic.com/careers/jobs"&gt;390 open jobs&lt;/a&gt;, 105 (26.9%) of which look enterprisey to me.&lt;/p&gt;
&lt;p&gt;It's pleasingly ironic that these AI labs have picked a business model with such a heavy demand on human labor - enterprise sales contracts don't close themselves without a whole lot of humans in the mix!&lt;/p&gt;
&lt;p&gt;&lt;small&gt;(I ran this analysis by scraping their job sites with Claude Code, then having it use Datasette's &lt;a href="https://docs.datasette.io/en/latest/json_api.html"&gt;JSON API&lt;/a&gt; to pipe that data into Datasette Cloud where I used &lt;a href="https://agent.datasette.io/"&gt;Datasette Agent&lt;/a&gt; for the analysis, &lt;a href="https://gist.github.com/simonw/5632d208d76b3c8b34f1fdbaf69eb1b8#agent-4"&gt;exported here&lt;/a&gt;. Dogfood!)&lt;/small&gt;&lt;/p&gt;
&lt;h4 id="the-ai-failure-stories-around-this-are-pretty-thin"&gt;The AI-failure stories around this are pretty thin&lt;/h4&gt;
&lt;p&gt;I started digging into this in response to &lt;a href="https://news.ycombinator.com/item?id=48287025#48287219"&gt;a growing volume&lt;/a&gt; of stories claiming that large companies were sounding the alarm because their AI usage costs had grown so large.&lt;/p&gt;
&lt;p&gt;The most widely cited of these stories appear quite overblown to me.&lt;/p&gt;
&lt;p&gt;The most discussed has been Uber, based on &lt;a href="https://www.theinformation.com/newsletters/applied-ai/uber-cto-shows-claude-code-can-blow-ai-budgets"&gt;this report&lt;/a&gt; where CTO Praveen Neppalli Naga indicated that Uber had "maxed out its full year AI budget just a few months into 2026", mostly thanks to Claude Code.&lt;/p&gt;
&lt;p&gt;Given that Claude Code only got &lt;em&gt;really&lt;/em&gt; good in November it's entirely unsurprising to me that a budget set in 2025 may have failed to predict demand for that tool in 2026!&lt;/p&gt;
&lt;p&gt;That Uber story was further fueled by comments made by Uber's COO, Andrew Macdonald, on the Rapid Response podcast. I tracked down &lt;a href="https://www.youtube.com/watch?v=y_mQ6xLcKyc&amp;amp;t=1616s"&gt;the segment&lt;/a&gt; and there really isn't much there. Here's what Andrew said:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;But then you sometimes go and talk to your senior engineering leaders and you're saying, OK, how many projects that were on the cutting room floor got moved above the line because of the productivity gains because 25% of our code commits were via Claude Code last quarter?&lt;/p&gt;
&lt;p&gt;That link is not there yet, right? I think maybe implicitly there's more that is getting shipped. But it's very hard to draw a line between one of those stats and, OK, now we're actually producing like 25% more useful consumer features, right? And that line is hard to draw.&lt;/p&gt;
&lt;p&gt;[...] And so if you're not actually able to draw a direct line to how much useful features and functionality you're shipping to your users, that trade becomes harder to justify.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Somehow this fragment turned into headlines like &lt;a href="https://www.businessinsider.com/uber-coo-andrew-macdonald-ai-token-spending-harder-justify-2026-5"&gt;Uber's COO says it's getting harder to justify the money spent on AI tokenmaxxing&lt;/a&gt;, because the market for stories about AI failures remains enormous.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Update 29th May 2026&lt;/strong&gt;: I edited the above quote to add that last paragraph ending in "becomes harder to justify" on &lt;a href="https://x.com/MadisonMills22/status/2060343512936186240"&gt;the suggestion of Madison Mills&lt;/a&gt; - previously my quoted section stopped at "hard to draw". Here's the &lt;a href="https://gist.github.com/simonw/59096a338c82f6f95e40e3d7c7b5bad9"&gt;full unedited transcript&lt;/a&gt; from MacWhisper.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The other popular story around this is &lt;a href="https://www.theverge.com/tech/930447/microsoft-claude-code-discontinued-notepad"&gt;Microsoft starts canceling Claude Code licenses&lt;/a&gt;, ostensibly to encourage their engineers to dogfood their own Copilot CLI agent instead - but The Verge reporter Tom Warren says "sources tell me the decision is also a financial one", triggered by the June 30th end of Microsoft's financial year.&lt;/p&gt;
&lt;p&gt;I think both of these stories support my "product-market fit" hypothesis. The best advice I ever heard on pricing a product was that your customer should &lt;em&gt;suck air through their teeth&lt;/em&gt; and then say yes. Uber's budget overrun and Microsoft's seat cancellations look like that effect playing out in practice.&lt;/p&gt;
&lt;h4 id="we-also-know-the-labs-are-spending-a-lot"&gt;We also know the labs are spending a lot&lt;/h4&gt;
&lt;p&gt;The big AI labs spend billions of dollars on both training and inference. Credible figures are hard to come by, but we did get one huge hint as to the figures involved from, oddly enough, the recent &lt;a href="https://www.sec.gov/Archives/edgar/data/1181412/000162828026036936/spaceexplorationtechnologi.htm"&gt;SpaceX S-1&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;[...] in May 2026, we entered into &lt;strong&gt;Cloud Services Agreements with Anthropic PBC&lt;/strong&gt; (“Anthropic”), an AI research and development public benefit corporation, with respect to access to &lt;strong&gt;compute capacity across COLOSSUS and COLOSSUS II&lt;/strong&gt;. Pursuant to these agreements, the customer &lt;strong&gt;has agreed to pay us $1.25 billion per month&lt;/strong&gt; through May 2029 [...]&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The &lt;a href="https://www.anthropic.com/news/higher-limits-spacex"&gt;Anthropic announcement&lt;/a&gt; said that this deal meant they could "increase our usage limits for Claude Code and the Claude API", heavily implying that Colossus is being used for inference, not model training.&lt;/p&gt;
&lt;p&gt;Anthropic already have vast amounts of compute from other providers. The fact that they're willing to spend $1.25 billion per month for extra capacity from just &lt;em&gt;one&lt;/em&gt; of their vendors hints at how big these inference budgets have become.&lt;/p&gt;
&lt;h4 id="api-revenue-is-becoming-less-important"&gt;API revenue is becoming less important&lt;/h4&gt;
&lt;p&gt;Over the past two years my impression has been that OpenAI made more of their income from subscription revenue while Anthropic made more from their API.&lt;/p&gt;
&lt;p&gt;Anthropic's API revenue was historically quite dependent on a small number of large API customers - &lt;a href="https://venturebeat.com/ai/anthropic-revenue-tied-to-two-customers-as-ai-pricing-war-threatens-margins"&gt;this VentureBeat story from August 2025&lt;/a&gt; quotes "sources familiar with the matter" suggesting that just Cursor and GitHub Copilot were responsible for $1.2 billion of the company's then-$4 billion revenue.&lt;/p&gt;
&lt;p&gt;Today Anthropic are rumored to hit &lt;a href="https://www.wsj.com/tech/ai/mind-blowing-growth-is-about-to-propel-anthropic-into-its-first-profitable-quarter-7edbf2f4"&gt;$10.9 billion in the second quarter&lt;/a&gt;, potentially even operating at a profit for the first time.&lt;/p&gt;
&lt;p&gt;This pivot-to-Enterprise suggests that the labs have realized that the real money lies in cutting out the middlemen. Anthropic's Claude Code directly competes with Cursor and Copilot. No wonder Cursor are &lt;a href="https://cursor.com/blog/composer-2"&gt;investing in their own models&lt;/a&gt;!&lt;/p&gt;
&lt;h4 id="april-is-a-new-inflection-point"&gt;April is a new inflection point&lt;/h4&gt;
&lt;p&gt;I've called November 2025 the &lt;a href="https://simonwillison.net/tags/november-2025-inflection/"&gt;November inflection point&lt;/a&gt; because that was when GPT-5.1 and Opus 4.5, combined with their respective coding agent harnesses, got &lt;em&gt;good&lt;/em&gt; - good enough that we've spent the last six months adapting to agent systems that can reliably get useful work done.&lt;/p&gt;
&lt;p&gt;I think April 2026 is a new inflection point where the revenue implications of this have started to land, to the benefit of the frontier AI labs and with material impacts on the budgets of large companies.&lt;/p&gt;
&lt;p&gt;We'll know for sure how real this moment is when the S-1 documents for the upcoming Anthropic and OpenAI IPOs give us some real, audited numbers to get our teeth into.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-pricing"&gt;llm-pricing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex"&gt;codex&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-cowork"&gt;claude-cowork&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/november-2025-inflection"&gt;november-2025-inflection&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette-agent"&gt;datasette-agent&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uber"&gt;uber&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="ai"/><category term="datasette"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="anthropic"/><category term="llm-pricing"/><category term="coding-agents"/><category term="claude-code"/><category term="codex"/><category term="claude-cowork"/><category term="november-2025-inflection"/><category term="datasette-agent"/><category term="uber"/></entry><entry><title>datasette-agent-openai-imagegen 0.1a1</title><link href="https://simonwillison.net/2026/May/12/datasette-agent-openai-imagegen/#atom-tag" rel="alternate"/><published>2026-05-12T22:03:22+00:00</published><updated>2026-05-12T22:03:22+00:00</updated><id>https://simonwillison.net/2026/May/12/datasette-agent-openai-imagegen/#atom-tag</id><summary type="html">
    
        &lt;p&gt;&lt;strong&gt;Release:&lt;/strong&gt; &lt;a href="https://github.com/datasette/datasette-agent-openai-imagegen/releases/tag/0.1a1"&gt;datasette-agent-openai-imagegen 0.1a1&lt;/a&gt;&lt;/p&gt;
        
    
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/text-to-image"&gt;text-to-image&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette-agent"&gt;datasette-agent&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="datasette"/><category term="openai"/><category term="text-to-image"/><category term="datasette-agent"/></entry><entry><title>llm 0.32a2</title><link href="https://simonwillison.net/2026/May/12/llm/#atom-tag" rel="alternate"/><published>2026-05-12T17:45:07+00:00</published><updated>2026-05-12T17:45:07+00:00</updated><id>https://simonwillison.net/2026/May/12/llm/#atom-tag</id><summary type="html">
    
        &lt;p&gt;&lt;strong&gt;Release:&lt;/strong&gt; &lt;a href="https://github.com/simonw/llm/releases/tag/0.32a2"&gt;llm 0.32a2&lt;/a&gt;&lt;/p&gt;
        &lt;p&gt;A bunch of useful stuff in this &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; alpha, but the most important detail is this one:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Most reasoning-capable OpenAI models now use the &lt;a href="https://developers.openai.com/api/reference/responses/overview"&gt;&lt;code&gt;/v1/responses&lt;/code&gt;&lt;/a&gt; endpoint instead of &lt;code&gt;/v1/chat/completions&lt;/code&gt;. This enables interleaved reasoning across tool calls for GPT-5 class models. &lt;a href="https://github.com/simonw/llm/pull/1435"&gt;#1435&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This means you can now see the summarized reasoning tokens when you run prompts against an OpenAI model, written to standard error in a different color. Use the &lt;code&gt;-R&lt;/code&gt; or &lt;code&gt;--hide-reasoning&lt;/code&gt; flags if you don't want to see that.&lt;/p&gt;
    
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/annotated-release-notes"&gt;annotated-release-notes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="projects"/><category term="ai"/><category term="annotated-release-notes"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="llm"/></entry><entry><title>Quoting Luke Curley</title><link href="https://simonwillison.net/2026/May/9/luke-curley/#atom-tag" rel="alternate"/><published>2026-05-09T01:03:58+00:00</published><updated>2026-05-09T01:03:58+00:00</updated><id>https://simonwillison.net/2026/May/9/luke-curley/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://moq.dev/blog/webrtc-is-the-problem/"&gt;&lt;p&gt;WebRTC is designed to &lt;strong&gt;degrade and drop my prompt&lt;/strong&gt; during poor network conditions.&lt;/p&gt;
&lt;p&gt;wtf my dude&lt;/p&gt;
&lt;p&gt;WebRTC aggressively drops audio packets to keep latency low. If you’ve ever heard distorted audio on a conference call, that’s WebRTC baybee. The idea is that conference calls depend on rapid back-and-forth, so pausing to wait for audio is unacceptable.&lt;/p&gt;
&lt;p&gt;…but as a user, I would much rather wait an extra 200ms for my slow/expensive prompt to be accurate. After all, I’m paying good money to boil the ocean, and a garbage prompt means a garbage response. It’s not like LLMs are particularly responsive anyway.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;But I’m not allowed to wait&lt;/strong&gt;. It’s &lt;em&gt;impossible&lt;/em&gt; to even retransmit a WebRTC audio packet within a browser; we tried at Discord. The &lt;em&gt;implementation&lt;/em&gt; is hard-coded for real-time latency &lt;strong&gt;or else&lt;/strong&gt;.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://moq.dev/blog/webrtc-is-the-problem/"&gt;Luke Curley&lt;/a&gt;, OpenAI’s WebRTC Problem, in response to &lt;a href="https://openai.com/index/delivering-low-latency-voice-ai-at-scale/"&gt;How OpenAI delivers low-latency voice AI at scale&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webrtc"&gt;webrtc&lt;/a&gt;&lt;/p&gt;



</summary><category term="openai"/><category term="webrtc"/></entry><entry><title>Quoting John Gruber</title><link href="https://simonwillison.net/2026/May/5/john-gruber/#atom-tag" rel="alternate"/><published>2026-05-05T00:46:29+00:00</published><updated>2026-05-05T00:46:29+00:00</updated><id>https://simonwillison.net/2026/May/5/john-gruber/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://daringfireball.net/2026/05/y_combinators_stake_in_openai"&gt;&lt;p&gt;So it’s well known that Y Combinator owns &lt;em&gt;some&lt;/em&gt; stake in OpenAI. But how big is that stake? This seems like devilishly difficult information to obtain. I asked around and a little birdie who knows several OpenAI investors came back with an answer: Y Combinator owns about 0.6 percent of OpenAI. At OpenAI’s current &lt;a href="https://openai.com/index/accelerating-the-next-phase-ai/"&gt;$852 billion valuation&lt;/a&gt;, that’s worth over $5 billion.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://daringfireball.net/2026/05/y_combinators_stake_in_openai"&gt;John Gruber&lt;/a&gt;, Y Combinator’s Stake in OpenAI&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/john-gruber"&gt;john-gruber&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/y-combinator"&gt;y-combinator&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;&lt;/p&gt;



</summary><category term="john-gruber"/><category term="y-combinator"/><category term="ai"/><category term="openai"/></entry><entry><title>Codex CLI 0.128.0 adds /goal</title><link href="https://simonwillison.net/2026/Apr/30/codex-goals/#atom-tag" rel="alternate"/><published>2026-04-30T23:23:17+00:00</published><updated>2026-04-30T23:23:17+00:00</updated><id>https://simonwillison.net/2026/Apr/30/codex-goals/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/openai/codex/releases/tag/rust-v0.128.0"&gt;Codex CLI 0.128.0 adds /goal&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The latest version of OpenAI's Codex CLI coding agent adds their own version of the &lt;a href="https://ghuntley.com/ralph/"&gt;Ralph loop&lt;/a&gt;: you can now set a &lt;code&gt;/goal&lt;/code&gt; and Codex will keep on looping until it evaluates that the goal has been completed... or the configured token budget has been exhausted.&lt;/p&gt;
&lt;p&gt;It looks like the feature is mainly implemented through the &lt;a href="https://github.com/openai/codex/blob/6014b6679ffbd92eeddffa3ad7b4402be6a7fefe/codex-rs/core/templates/goals/continuation.md"&gt;goals/continuation.md&lt;/a&gt; and &lt;a href="https://github.com/openai/codex/blob/6014b6679ffbd92eeddffa3ad7b4402be6a7fefe/codex-rs/core/templates/goals/budget_limit.md"&gt;goals/budget_limit.md&lt;/a&gt; prompts, which are automatically injected at the end of a turn.&lt;/p&gt;
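&lt;p&gt;The general shape of a loop like this can be sketched in Python. This illustrates the idea only and is not Codex's implementation: &lt;code&gt;run_turn&lt;/code&gt;, &lt;code&gt;goal_is_complete&lt;/code&gt; and the token accounting are hypothetical stand-ins supplied by the caller.&lt;/p&gt;

```python
def run_goal_loop(run_turn, goal_is_complete, token_budget):
    """Run agent turns until the goal is judged complete or the budget is spent.

    run_turn() performs one turn and returns the number of tokens it used.
    goal_is_complete() returns True once the goal is judged to be done.
    Both callables are hypothetical placeholders for this sketch.
    Returns a (status, turns, tokens_used) tuple.
    """
    tokens_used = 0
    turns = 0
    while True:
        if goal_is_complete():
            return "complete", turns, tokens_used
        if tokens_used >= token_budget:
            return "budget_exhausted", turns, tokens_used
        tokens_used += run_turn()
        turns += 1


if __name__ == "__main__":
    # A fake agent that uses 100 tokens per turn and never finishes.
    print(run_goal_loop(lambda: 100, lambda: False, 250))
```

&lt;p&gt;The budget is checked before each turn, so a run can overshoot the limit by at most one turn's worth of tokens.&lt;/p&gt;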

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/fcoury/status/2049917871799636201"&gt;@fcoury&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-engineering"&gt;prompt-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/system-prompts"&gt;system-prompts&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex"&gt;codex&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/agentic-engineering"&gt;agentic-engineering&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="openai"/><category term="prompt-engineering"/><category term="generative-ai"/><category term="llms"/><category term="coding-agents"/><category term="system-prompts"/><category term="codex"/><category term="agentic-engineering"/></entry><entry><title>Our evaluation of OpenAI's GPT-5.5 cyber capabilities</title><link href="https://simonwillison.net/2026/Apr/30/gpt-55-cyber-capabilities/#atom-tag" rel="alternate"/><published>2026-04-30T23:03:24+00:00</published><updated>2026-04-30T23:03:24+00:00</updated><id>https://simonwillison.net/2026/Apr/30/gpt-55-cyber-capabilities/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.aisi.gov.uk/blog/our-evaluation-of-openais-gpt-5-5-cyber-capabilities"&gt;Our evaluation of OpenAI&amp;#x27;s GPT-5.5 cyber capabilities&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The UK's AI Security Institute &lt;a href="https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos-previews-cyber-capabilities"&gt;previously evaluated Claude Mythos&lt;/a&gt;: now they've evaluated GPT-5.5 for finding security vulnerabilities and found it to be comparable to Mythos, but unlike Mythos it's generally available right now.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-security-research"&gt;ai-security-research&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt"&gt;gpt&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="anthropic"/><category term="claude"/><category term="ai-security-research"/><category term="gpt"/></entry><entry><title>Quoting OpenAI Codex base_instructions</title><link href="https://simonwillison.net/2026/Apr/28/openai-codex/#atom-tag" rel="alternate"/><published>2026-04-28T22:02:53+00:00</published><updated>2026-04-28T22:02:53+00:00</updated><id>https://simonwillison.net/2026/Apr/28/openai-codex/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://github.com/openai/codex/blob/66b0781502be5de3b1909525c987643b9e5e407d/codex-rs/models-manager/models.json#L55"&gt;&lt;p&gt;&lt;code&gt;Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query.&lt;/code&gt;&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://github.com/openai/codex/blob/66b0781502be5de3b1909525c987643b9e5e407d/codex-rs/models-manager/models.json#L55"&gt;OpenAI Codex base_instructions&lt;/a&gt;, for GPT-5.5&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-engineering"&gt;prompt-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/system-prompts"&gt;system-prompts&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex"&gt;codex&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt"&gt;gpt&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="openai"/><category term="prompt-engineering"/><category term="generative-ai"/><category term="llms"/><category term="system-prompts"/><category term="codex"/><category term="gpt"/></entry><entry><title>Tracking the history of the now-deceased OpenAI Microsoft AGI clause</title><link href="https://simonwillison.net/2026/Apr/27/now-deceased-agi-clause/#atom-tag" rel="alternate"/><published>2026-04-27T18:38:17+00:00</published><updated>2026-04-27T18:38:17+00:00</updated><id>https://simonwillison.net/2026/Apr/27/now-deceased-agi-clause/#atom-tag</id><summary type="html">
    &lt;p&gt;For many years, Microsoft and OpenAI's relationship has included a weird clause saying that, should AGI be achieved, Microsoft's commercial IP rights to OpenAI's technology would be null and void. That clause appeared to end today. I decided to try and track its expression over time on &lt;a href="https://openai.com/"&gt;openai.com&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;OpenAI, July 22nd 2019 in &lt;a href="https://openai.com/index/microsoft-invests-in-and-partners-with-openai/"&gt;Microsoft invests in and partners with OpenAI to support us building beneficial AGI&lt;/a&gt; (emphasis mine):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;OpenAI is producing a sequence of increasingly powerful AI technologies, which requires a lot of capital for computational power. The most obvious way to cover costs is to build a product, but that would mean changing our focus. Instead, we intend to license &lt;strong&gt;some of our pre-AGI technologies&lt;/strong&gt;, with Microsoft becoming our preferred partner for commercializing them.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;But what &lt;em&gt;is&lt;/em&gt; AGI? The &lt;a href="https://openai.com/charter/"&gt;OpenAI Charter&lt;/a&gt; was first published in April 2018 and has remained unchanged at least since this &lt;a href="https://web.archive.org/web/20190311213352/https://openai.com/charter/"&gt;March 11th 2019 archive.org capture&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;OpenAI’s mission is to ensure that artificial general intelligence (AGI)—by which we mean highly autonomous systems that outperform humans at most economically valuable work—benefits all of humanity.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here's the problem: if you're going to sign an agreement with Microsoft that is dependent on knowing when "AGI" has been achieved, you need something a little more concrete.&lt;/p&gt;
&lt;p&gt;In December 2024 &lt;a href="https://www.theinformation.com/articles/microsoft-and-openai-wrangle-over-terms-of-their-blockbuster-partnership"&gt;The Information reported the details&lt;/a&gt; (summarized here outside of their paywall &lt;a href="https://techcrunch.com/2024/12/26/microsoft-and-openai-have-a-financial-definition-of-agi-report/"&gt;by TechCrunch&lt;/a&gt;):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Last year’s agreement between Microsoft and OpenAI, which hasn’t been disclosed, said AGI would be achieved only when OpenAI has developed systems that have the ability to generate the maximum total profits to which its earliest investors, including Microsoft, are entitled, according to documents OpenAI distributed to investors. Those profits total about $100 billion, the documents showed.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;So AGI is now whenever OpenAI's systems are capable of generating $100 billion in profit?&lt;/p&gt;
&lt;p&gt;In October 2025 the process changed to being judged by an "independent expert panel". In &lt;a href="https://openai.com/index/next-chapter-of-microsoft-openai-partnership/"&gt;The next chapter of the Microsoft–OpenAI partnership&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The agreement preserves key elements that have fueled this successful partnership—meaning OpenAI remains Microsoft’s frontier model partner and Microsoft continues to have exclusive IP rights and Azure API exclusivity until Artificial General Intelligence (AGI). [...]&lt;/p&gt;
&lt;p&gt;Once AGI is declared by OpenAI, that declaration will now be verified by an independent expert panel. [...]&lt;/p&gt;
&lt;p&gt;Microsoft’s IP rights to research, defined as the confidential methods used in the development of models and systems, will remain until either the expert panel verifies AGI or through 2030, whichever is first.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;OpenAI on February 27th, 2026 in &lt;a href="https://openai.com/index/continuing-microsoft-partnership/"&gt;Joint Statement from OpenAI and Microsoft&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;AGI definition and processes are unchanged&lt;/strong&gt;. The contractual definition of AGI and the process for determining if it has been achieved remains the same.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;OpenAI today, April 27th 2026 in &lt;a href="https://openai.com/index/next-phase-of-microsoft-partnership/"&gt;The next phase of the Microsoft OpenAI partnership&lt;/a&gt; (emphasis mine):&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Microsoft will continue to have a license to OpenAI IP for models and products through 2032.  Microsoft’s license will now be non-exclusive.&lt;/li&gt;
&lt;li&gt;Microsoft will no longer pay a revenue share to OpenAI.&lt;/li&gt;
&lt;li&gt;Revenue share payments from OpenAI to Microsoft continue through 2030, &lt;strong&gt;independent of OpenAI’s technology progress&lt;/strong&gt;, at the same percentage but subject to a total cap.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;As far as I can tell "independent of OpenAI’s technology progress" is a declaration that the AGI clause is now dead. Here's The Verge coming to the same conclusion: &lt;a href="https://www.theverge.com/ai-artificial-intelligence/918981/openai-microsoft-renegotiate-contract"&gt;The AGI clause is dead&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;My all-time favorite commentary on OpenAI's approach to AGI remains this 2023 hypothetical &lt;a href="https://www.bloomberg.com/opinion/articles/2023-11-20/who-controls-openai"&gt;by Matt Levine&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;And the investors wailed and gnashed their teeth but it’s true, that is what they agreed to, and they had no legal recourse. And OpenAI’s new CEO, and its nonprofit board, cut them a check for their capped return and said “bye” and went back to running OpenAI for the benefit of humanity. It turned out that a benign, carefully governed artificial superintelligence is really good for humanity, and OpenAI quickly solved all of humanity’s problems and ushered in an age of peace and abundance in which nobody wanted for anything or needed any Microsoft products. And capitalism came to an end.&lt;/p&gt;
&lt;/blockquote&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/computer-history"&gt;computer-history&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/microsoft"&gt;microsoft&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="computer-history"/><category term="microsoft"/><category term="ai"/><category term="openai"/></entry><entry><title>Quoting Romain Huet</title><link href="https://simonwillison.net/2026/Apr/25/romain-huet/#atom-tag" rel="alternate"/><published>2026-04-25T12:06:55+00:00</published><updated>2026-04-25T12:06:55+00:00</updated><id>https://simonwillison.net/2026/Apr/25/romain-huet/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://twitter.com/romainhuet/status/2047955381578838357"&gt;&lt;p&gt;Since GPT-5.4, we’ve unified Codex and the main model into a single system, so there’s no separate coding line anymore.&lt;/p&gt;
&lt;p&gt;GPT-5.5 takes this further, with strong gains in agentic coding, computer use, and any task on a computer.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://twitter.com/romainhuet/status/2047955381578838357"&gt;Romain Huet&lt;/a&gt;, confirming OpenAI won't release a GPT-5.5-Codex model&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt"&gt;gpt&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="gpt"/></entry><entry><title>GPT-5.5 prompting guide</title><link href="https://simonwillison.net/2026/Apr/25/gpt-5-5-prompting-guide/#atom-tag" rel="alternate"/><published>2026-04-25T04:13:36+00:00</published><updated>2026-04-25T04:13:36+00:00</updated><id>https://simonwillison.net/2026/Apr/25/gpt-5-5-prompting-guide/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://developers.openai.com/api/docs/guides/prompt-guidance?model=gpt-5.5"&gt;GPT-5.5 prompting guide&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Now that GPT-5.5 is &lt;a href="https://developers.openai.com/api/docs/models/gpt-5.5"&gt;available in the API&lt;/a&gt;, OpenAI have released a wealth of useful tips on how best to prompt the new model.&lt;/p&gt;
&lt;p&gt;Here's a neat trick they recommend for applications that might spend considerable time thinking before returning a user-visible response:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Before any tool calls for a multi-step task, send a short user-visible update that acknowledges the request and states the first step. Keep it to one or two sentences.&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I've already noticed their Codex app doing this, and it does make longer running tasks feel less like the model has crashed.&lt;/p&gt;
&lt;p&gt;OpenAI suggest running the following in Codex to upgrade your existing code using advice embedded in their &lt;code&gt;openai-docs&lt;/code&gt; skill:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;$openai-docs migrate this project to gpt-5.5&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The upgrade guide the coding agent will follow &lt;a href="https://github.com/openai/skills/blob/724cd511c96593f642bddf13187217aa155d2554/skills/.curated/openai-docs/references/upgrade-guide.md#model-string--light-prompt-rewrite"&gt;is this one&lt;/a&gt;, which even includes light instructions on how to rewrite prompts to better fit the model.&lt;/p&gt;
&lt;p&gt;Also relevant is the &lt;a href="https://developers.openai.com/api/docs/guides/latest-model"&gt;Using GPT-5.5 guide&lt;/a&gt;, which opens with this warning:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;To get the most out of GPT-5.5, treat it as a new model family to tune for, not a drop-in replacement for &lt;code&gt;gpt-5.2&lt;/code&gt; or &lt;code&gt;gpt-5.4&lt;/code&gt;. Begin migration with a fresh baseline instead of carrying over every instruction from an older prompt stack. Start with the smallest prompt that preserves the product contract, then tune reasoning effort, verbosity, tool descriptions, and output format against representative examples.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Interesting to see OpenAI recommend starting from scratch rather than trusting that existing prompts optimized for previous models will continue to work effectively with GPT-5.5.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-engineering"&gt;prompt-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex"&gt;codex&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt"&gt;gpt&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="openai"/><category term="prompt-engineering"/><category term="generative-ai"/><category term="llms"/><category term="codex"/><category term="gpt"/></entry><entry><title>llm 0.31</title><link href="https://simonwillison.net/2026/Apr/24/llm/#atom-tag" rel="alternate"/><published>2026-04-24T23:35:07+00:00</published><updated>2026-04-24T23:35:07+00:00</updated><id>https://simonwillison.net/2026/Apr/24/llm/#atom-tag</id><summary type="html">
    
        &lt;p&gt;&lt;strong&gt;Release:&lt;/strong&gt; &lt;a href="https://github.com/simonw/llm/releases/tag/0.31"&gt;llm 0.31&lt;/a&gt;&lt;/p&gt;
        &lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;New GPT-5.5 OpenAI model: &lt;code&gt;llm -m gpt-5.5&lt;/code&gt;. &lt;a href="https://github.com/simonw/llm/issues/1418"&gt;#1418&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;New option to set the &lt;a href="https://developers.openai.com/cookbook/examples/gpt-5/gpt-5_new_params_and_tools#1-verbosity-parameter"&gt;text verbosity level&lt;/a&gt; for GPT-5+ OpenAI models: &lt;code&gt;-o verbosity low&lt;/code&gt;. Values are &lt;code&gt;low&lt;/code&gt;, &lt;code&gt;medium&lt;/code&gt;, &lt;code&gt;high&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;New option for setting the &lt;a href="https://developers.openai.com/api/docs/guides/images-vision#choose-an-image-detail-level"&gt;image detail level&lt;/a&gt; used for image attachments to OpenAI models: &lt;code&gt;-o image_detail low&lt;/code&gt; - values are &lt;code&gt;low&lt;/code&gt;, &lt;code&gt;high&lt;/code&gt; and &lt;code&gt;auto&lt;/code&gt;, and GPT-5.4 and 5.5 also accept &lt;code&gt;original&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Models listed in &lt;code&gt;extra-openai-models.yaml&lt;/code&gt; are now also registered as asynchronous. &lt;a href="https://github.com/simonw/llm/issues/1395"&gt;#1395&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
    
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt"&gt;gpt&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="openai"/><category term="llm"/><category term="gpt"/></entry><entry><title>A pelican for GPT-5.5 via the semi-official Codex backdoor API</title><link href="https://simonwillison.net/2026/Apr/23/gpt-5-5/#atom-tag" rel="alternate"/><published>2026-04-23T19:59:47+00:00</published><updated>2026-04-23T19:59:47+00:00</updated><id>https://simonwillison.net/2026/Apr/23/gpt-5-5/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;a href="https://openai.com/index/introducing-gpt-5-5/"&gt;GPT-5.5 is out&lt;/a&gt;. It's available in OpenAI Codex and is rolling out to paid ChatGPT subscribers. I've had some preview access and found it to be a fast, effective and highly capable model. As is usually the case these days, it's hard to put into words what's good about it - I ask it to build things and it builds exactly what I ask for!&lt;/p&gt;
&lt;p&gt;There's one notable omission from today's release - the API:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;API deployments require different safeguards and we are working closely with partners and customers on the safety and security requirements for serving it at scale. We'll bring GPT‑5.5 and GPT‑5.5 Pro to the API very soon.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;When I run my &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle/"&gt;pelican benchmark&lt;/a&gt; I always prefer to use an API, to keep hidden system prompts in ChatGPT or other agent harnesses from impacting the results.&lt;/p&gt;
&lt;h4 id="the-openclaw-backdoor"&gt;The OpenClaw backdoor&lt;/h4&gt;
&lt;p&gt;One of the ongoing tension points in the AI world over the past few months has concerned how agent harnesses like OpenClaw and Pi interact with the APIs provided by the big providers.&lt;/p&gt;
&lt;p&gt;Both OpenAI and Anthropic offer popular monthly subscriptions which provide access to their models at a significant discount to their raw API.&lt;/p&gt;
&lt;p&gt;OpenClaw integrated directly with this mechanism, and was then &lt;a href="https://www.theverge.com/ai-artificial-intelligence/907074/anthropic-openclaw-claude-subscription-ban"&gt;blocked from doing so&lt;/a&gt; by Anthropic. This kicked off a whole thing. OpenAI - who recently hired OpenClaw creator Peter Steinberger - saw an opportunity for an easy karma win and announced that OpenClaw was welcome to continue integrating with OpenAI's subscriptions via the same mechanism used by their (open source) Codex CLI tool.&lt;/p&gt;
&lt;p&gt;Does this mean &lt;em&gt;anyone&lt;/em&gt; can write code that integrates with OpenAI's Codex-specific APIs to hook into those existing subscriptions?&lt;/p&gt;
&lt;p&gt;The other day &lt;a href="https://twitter.com/jeremyphoward/status/2046537816834965714"&gt;Jeremy Howard asked&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Anyone know whether OpenAI officially supports the use of the &lt;code&gt;/backend-api/codex/responses&lt;/code&gt; endpoint that Pi and Opencode (IIUC) uses?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It turned out that on March 30th OpenAI's Romain Huet &lt;a href="https://twitter.com/romainhuet/status/2038699202834841962"&gt;had tweeted&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We want people to be able to use Codex, and their ChatGPT subscription, wherever they like! That means in the app, in the terminal, but also in JetBrains, Xcode, OpenCode, Pi, and now Claude Code.&lt;/p&gt;
&lt;p&gt;That’s why Codex CLI and Codex app server are open source too! 🙂&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And Peter Steinberger &lt;a href="https://twitter.com/steipete/status/2046775849769148838"&gt;replied to Jeremy&lt;/a&gt; that:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;OpenAI sub is officially supported.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h4 id="llm-openai-via-codex"&gt;llm-openai-via-codex&lt;/h4&gt;
&lt;p&gt;So... I had Claude Code reverse-engineer the &lt;a href="https://github.com/openai/codex"&gt;openai/codex&lt;/a&gt; repo, figure out how authentication tokens were stored and build me &lt;a href="https://github.com/simonw/llm-openai-via-codex"&gt;llm-openai-via-codex&lt;/a&gt;, a new plugin for &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; which picks up your existing Codex subscription and uses it to run prompts!&lt;/p&gt;
&lt;p&gt;(With hindsight I wish I'd used GPT-5.4 or the GPT-5.5 preview, it would have been funnier. I genuinely considered rewriting the project from scratch using Codex and GPT-5.5 for the sake of the joke, but decided not to spend any more time on this!)&lt;/p&gt;
&lt;p&gt;Here's how to use it:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Install Codex CLI, buy an OpenAI plan, log in to Codex&lt;/li&gt;
&lt;li&gt;Install LLM: &lt;code&gt;uv tool install llm&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Install the new plugin: &lt;code&gt;llm install llm-openai-via-codex&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Start prompting: &lt;code&gt;llm -m openai-codex/gpt-5.5 'Your prompt goes here'&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;All existing LLM features should also work - use &lt;code&gt;-a filepath.jpg/URL&lt;/code&gt; to attach an image, &lt;code&gt;llm chat -m openai-codex/gpt-5.5&lt;/code&gt; to start an ongoing chat, &lt;code&gt;llm logs&lt;/code&gt; to view logged conversations and &lt;code&gt;llm --tool ...&lt;/code&gt; to &lt;a href="https://llm.datasette.io/en/stable/tools.html"&gt;try it out with tool support&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="and-some-pelicans"&gt;And some pelicans&lt;/h4&gt;
&lt;p&gt;Let's generate a pelican!&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm install llm-openai-via-codex
llm -m openai-codex/gpt-5.5 &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;Generate an SVG of a pelican riding a bicycle&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here's &lt;a href="https://gist.github.com/simonw/edda1d98f7ba07fd95eeff473cb16634"&gt;what I got back&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2026/gpt-5.5-pelican.png" alt="It is a bit mangled to be honest - good beak, pelican body shapes are slightly weird, legs do at least extend to the pedals, bicycle frame is not quite right." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;I've seen better &lt;a href="https://simonwillison.net/2026/Mar/17/mini-and-nano/#pelicans"&gt;from GPT-5.4&lt;/a&gt;, so I tagged on &lt;code&gt;-o reasoning_effort xhigh&lt;/code&gt; and &lt;a href="https://gist.github.com/simonw/a6168e4165a258e4d664aeae8e602cc5"&gt;tried again&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;That one took almost four minutes to generate, but I think it's a much better effort.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2026/gpt-5.5-pelican-xhigh.png" alt="Pelican has gradients now, body is much better put together, bicycle is nearly the right shape albeit with one extra bar between pedals and front wheel, clearly a better image overall." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;If you compare the SVG code (&lt;a href="https://gist.github.com/simonw/edda1d98f7ba07fd95eeff473cb16634#response"&gt;default&lt;/a&gt;, &lt;a href="https://gist.github.com/simonw/a6168e4165a258e4d664aeae8e602cc5#response"&gt;xhigh&lt;/a&gt;) the &lt;code&gt;xhigh&lt;/code&gt; one took a very different approach, which is much more CSS-heavy - as demonstrated by those gradients. &lt;code&gt;xhigh&lt;/code&gt; used 9,322 reasoning tokens where the default used just 39.&lt;/p&gt;
&lt;h4 id="a-few-more-notes-on-gpt-5-5"&gt;A few more notes on GPT-5.5&lt;/h4&gt;
&lt;p&gt;One of the most notable things about GPT-5.5 is the pricing. Once it goes live in the API it's &lt;a href="https://openai.com/index/introducing-gpt-5-5/#availability-and-pricing"&gt;going to be priced&lt;/a&gt; at &lt;em&gt;twice&lt;/em&gt; the cost of GPT-5.4 - $5 per 1M input tokens and $30 per 1M output tokens, where 5.4 is $2.50 and $15.&lt;/p&gt;
&lt;p&gt;GPT-5.5 Pro will be even more: $30 per 1M input tokens and $180 per 1M output tokens.&lt;/p&gt;
&lt;p&gt;GPT-5.4 will remain available. At half the price of 5.5 this feels like 5.4 is to 5.5 as Claude Sonnet is to Claude Opus.&lt;/p&gt;
&lt;p&gt;Ethan Mollick has a &lt;a href="https://www.oneusefulthing.org/p/sign-of-the-future-gpt-55"&gt;detailed review of GPT-5.5&lt;/a&gt; where he put it (and GPT-5.5 Pro) through an array of interesting challenges. His verdict: the jagged frontier continues to hold, with GPT-5.5 excellent at some things and challenged by others in a way that remains difficult to predict.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/chatgpt"&gt;chatgpt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-pricing"&gt;llm-pricing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-reasoning"&gt;llm-reasoning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex"&gt;codex&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt"&gt;gpt&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="chatgpt"/><category term="llms"/><category term="llm"/><category term="llm-pricing"/><category term="pelican-riding-a-bicycle"/><category term="llm-reasoning"/><category term="llm-release"/><category term="codex"/><category term="gpt"/></entry><entry><title>llm-openai-via-codex 0.1a0</title><link href="https://simonwillison.net/2026/Apr/23/llm-openai-via-codex/#atom-tag" rel="alternate"/><published>2026-04-23T19:22:29+00:00</published><updated>2026-04-23T19:22:29+00:00</updated><id>https://simonwillison.net/2026/Apr/23/llm-openai-via-codex/#atom-tag</id><summary type="html">
    
        &lt;p&gt;&lt;strong&gt;Release:&lt;/strong&gt; &lt;a href="https://github.com/simonw/llm-openai-via-codex/releases/tag/0.1a0"&gt;llm-openai-via-codex 0.1a0&lt;/a&gt;&lt;/p&gt;
        &lt;p&gt;Hijacks your Codex CLI credentials to make API calls with LLM, as described &lt;a href="https://simonwillison.net/2026/Apr/23/gpt-5-5/#llm-openai-via-codex"&gt;in my post about GPT-5.5&lt;/a&gt;.&lt;/p&gt;
    
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex"&gt;codex&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="openai"/><category term="llm"/><category term="codex"/></entry><entry><title>Where's the raccoon with the ham radio? (ChatGPT Images 2.0)</title><link href="https://simonwillison.net/2026/Apr/21/gpt-image-2/#atom-tag" rel="alternate"/><published>2026-04-21T20:32:24+00:00</published><updated>2026-04-21T20:32:24+00:00</updated><id>https://simonwillison.net/2026/Apr/21/gpt-image-2/#atom-tag</id><summary type="html">
    &lt;p&gt;OpenAI &lt;a href="https://openai.com/index/introducing-chatgpt-images-2-0/"&gt;released ChatGPT Images 2.0 today&lt;/a&gt;, their latest image generation model. On &lt;a href="https://www.youtube.com/watch?v=sWkGomJ3TLI"&gt;the livestream&lt;/a&gt; Sam Altman said that the leap from gpt-image-1 to gpt-image-2 was equivalent to jumping from GPT-3 to GPT-5. Here's how I put it to the test.&lt;/p&gt;
&lt;p&gt;My prompt:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Do a where's Waldo style image but it's where is the raccoon holding a ham radio&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h4 id="gpt-image-1"&gt;gpt-image-1&lt;/h4&gt;
&lt;p&gt;First as a baseline here's what I got from the older gpt-image-1 using ChatGPT directly:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://static.simonwillison.net/static/2026/chatgpt-image-1-ham-radio.png"&gt;&lt;img loading="lazy" src="https://static.simonwillison.net/static/2026/image_crop_1402x1122_w1402_q0.3.jpg" alt="There's a lot going on, but I couldn't find a raccoon." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I wasn't able to spot the raccoon - I quickly realized that testing image generation models on Where's Waldo style images (Where's Wally in the UK) can be pretty frustrating!&lt;/p&gt;
&lt;p&gt;I tried &lt;a href="https://claude.ai/share/bd6e9b88-29a9-420b-8ac1-3ac5cebac215"&gt;getting Claude Opus 4.7&lt;/a&gt; with its new higher resolution inputs to solve it but it was convinced there was a raccoon it couldn't find thanks to the instruction card at the top left of the image:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Yes — there's at least one raccoon in the picture, but it's very well hidden&lt;/strong&gt;. In my careful sweep through zoomed-in sections, honestly, I couldn't definitively spot a raccoon holding a ham radio. [...]&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h4 id="nano-banana-2-and-pro"&gt;Nano Banana 2 and Pro&lt;/h4&gt;
&lt;p&gt;Next I tried Google's Nano Banana 2, &lt;a href="https://gemini.google.com/share/3775db96c576"&gt;via Gemini&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://static.simonwillison.net/static/2026/nano-banana-2-ham-radio.jpg"&gt;&lt;img loading="lazy" src="https://static.simonwillison.net/static/2026/gemini-ham-radio-small.jpg" alt="Busy Where's Waldo-style illustration of a park festival with crowds of people, tents labeled &amp;quot;FOOD &amp;amp; DRINK&amp;quot;, &amp;quot;CRAFT FAIR&amp;quot;, &amp;quot;BOOK NOOK&amp;quot;, &amp;quot;MUSIC FEST&amp;quot;, and &amp;quot;AMATEUR RADIO CLUB - W6HAM&amp;quot; (featuring a raccoon in a red hat at the radio table), plus a Ferris wheel, carousel, gazebo with band, pond with boats, fountain, food trucks, and striped circus tents" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;That one was pretty obvious: the raccoon is in the "Amateur Radio Club" booth in the center of the image!&lt;/p&gt;
&lt;p&gt;Claude said:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Honestly, this one wasn't really hiding — he's the star of the booth. Feels like the illustrator took pity on us after that last impossible scene. The little "W6HAM" callsign pun on the booth sign is a nice touch too.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I also tried Nano Banana Pro &lt;a href="https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%5B%221sGU5A7mrngkfLfSEU84xaV1DhtOTnS--%22%5D,%22action%22:%22open%22,%22userId%22:%22106366615678321494423%22,%22resourceKeys%22:%7B%7D%7D&amp;amp;usp=sharing"&gt;in AI Studio&lt;/a&gt; and got this, by far the worst result from any model. Not sure what went wrong here!&lt;/p&gt;
&lt;p&gt;&lt;a href="https://static.simonwillison.net/static/2026/nano-banana-pro-ham-radio.jpg"&gt;&lt;img loading="lazy" src="https://static.simonwillison.net/static/2026/nano-banana-pro-ham-radio-small.jpg" alt="The raccoon is larger than everyone else, right in the middle of the image with an ugly white border around it." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h4 id="gpt-image-2"&gt;gpt-image-2&lt;/h4&gt;
&lt;p&gt;With the baseline established, let's try out the new model.&lt;/p&gt;
&lt;p&gt;I used an updated version of my &lt;a href="https://github.com/simonw/tools/blob/main/python/openai_image.py"&gt;openai_image.py&lt;/a&gt; script, which is a thin wrapper around the &lt;a href="https://github.com/openai/openai-python"&gt;OpenAI Python&lt;/a&gt; client library. Their client library hasn't yet been updated to include &lt;code&gt;gpt-image-2&lt;/code&gt; but thankfully it doesn't validate the model ID so you can use it anyway.&lt;/p&gt;
&lt;p&gt;Here's how I ran that:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;OPENAI_API_KEY=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;$(&lt;/span&gt;llm keys get openai&lt;span class="pl-pds"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; \
  uv run https://tools.simonwillison.net/python/openai_image.py \
  -m gpt-image-2 \
  &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Do a where's Waldo style image but it's where is the raccoon holding a ham radio&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here's what I got back. I don't &lt;em&gt;think&lt;/em&gt; there's a raccoon in there - I couldn't spot one, and neither could Claude.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://static.simonwillison.net/static/2026/gpt-image-2-default.png"&gt;&lt;img loading="lazy" src="https://static.simonwillison.net/static/2026/gpt-image-2-default.jpg" alt="Lots of stuff, a ham radio booth, many many people, a lake, but maybe no raccoon?" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://github.com/openai/openai-cookbook/blob/main/examples/multimodal/image-gen-models-prompting-guide.ipynb"&gt;OpenAI image generation cookbook&lt;/a&gt; has been updated with notes on &lt;code&gt;gpt-image-2&lt;/code&gt;, including the &lt;code&gt;outputQuality&lt;/code&gt; setting and available sizes.&lt;/p&gt;
&lt;p&gt;I tried setting &lt;code&gt;outputQuality&lt;/code&gt; to &lt;code&gt;high&lt;/code&gt; and the dimensions to &lt;code&gt;3840x2160&lt;/code&gt; - I believe that's the maximum - and got this - a 17MB PNG which I converted to a 5MB WEBP:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;OPENAI_API_KEY=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;$(&lt;/span&gt;llm keys get openai&lt;span class="pl-pds"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; \
  uv run &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;https://raw.githubusercontent.com/simonw/tools/refs/heads/main/python/openai_image.py&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; \
  -m gpt-image-2 &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Do a where's Waldo style image but it's where is the raccoon holding a ham radio&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; \
  --quality high --size 3840x2160&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href="https://static.simonwillison.net/static/2026/image-fc93bd-q100.webp"&gt;&lt;img loading="lazy" src="https://static.simonwillison.net/static/2026/image-fc93bd-q100.jpg" alt="Big complex image, lots of detail, good wording, there is indeed a raccoon with a ham radio." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;That's pretty great! There's a raccoon with a ham radio in there (bottom left, quite easy to spot).&lt;/p&gt;
&lt;p&gt;The image used 13,342 output tokens, which are charged at $30/million so a total cost of around &lt;a href="https://www.llm-prices.com/#ot=13342&amp;amp;ic=5&amp;amp;cic=1.25&amp;amp;oc=10&amp;amp;sel=gpt-image-2-image"&gt;40 cents&lt;/a&gt;.&lt;/p&gt;
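&lt;p&gt;The arithmetic behind that figure, as a quick sketch using only the token count and price quoted above (the function name is just for illustration):&lt;/p&gt;

```python
OUTPUT_TOKENS = 13_342            # output tokens reported for the image
PRICE_PER_MILLION_USD = 30.0      # output price per million tokens quoted above


def image_cost_usd(tokens, price_per_million):
    """Return the cost in US dollars for a given number of output tokens."""
    return tokens * price_per_million / 1_000_000


cost = image_cost_usd(OUTPUT_TOKENS, PRICE_PER_MILLION_USD)
print(f"${cost:.4f}")  # roughly 40 cents
```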
&lt;h4 id="takeaways"&gt;Takeaways&lt;/h4&gt;
&lt;p&gt;I think this new ChatGPT image generation model takes the crown from Gemini, at least for the moment.&lt;/p&gt;
&lt;p&gt;Where's Waldo style images are an infuriating and somewhat foolish way to test these models, but they do help illustrate how good they are getting at complex illustrations combining both text and details.&lt;/p&gt;
&lt;h4 id="update-asking-models-to-solve-this-is-risky"&gt;Update: asking models to solve this is risky&lt;/h4&gt;
&lt;p&gt;rizaco &lt;a href="https://news.ycombinator.com/item?id=47852835#47853561"&gt;on Hacker News&lt;/a&gt; asked ChatGPT to draw a red circle around the raccoon in one of the images in which I had failed to find one. Here's an animated mix of their result and the original image:&lt;/p&gt;
&lt;p&gt;&lt;img loading="lazy" src="https://static.simonwillison.net/static/2026/ham-radio-cheat.gif" alt="The circle appears around a raccoon with a ham radio who is definitely not there in the original image!" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Looks like we definitely can't trust these models to usefully solve their own puzzles!&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/chatgpt"&gt;chatgpt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/text-to-image"&gt;text-to-image&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nano-banana"&gt;nano-banana&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="chatgpt"/><category term="llms"/><category term="text-to-image"/><category term="llm-release"/><category term="nano-banana"/></entry><entry><title>Trusted access for the next era of cyber defense</title><link href="https://simonwillison.net/2026/Apr/14/trusted-access-openai/#atom-tag" rel="alternate"/><published>2026-04-14T21:23:59+00:00</published><updated>2026-04-14T21:23:59+00:00</updated><id>https://simonwillison.net/2026/Apr/14/trusted-access-openai/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://openai.com/index/scaling-trusted-access-for-cyber-defense/"&gt;Trusted access for the next era of cyber defense&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;OpenAI's answer to &lt;a href="https://simonwillison.net/2026/Apr/7/project-glasswing/"&gt;Claude Mythos&lt;/a&gt; appears to be a new model called GPT-5.4-Cyber:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In preparation for increasingly more capable models from OpenAI over the next few months, we are fine-tuning our models specifically to enable defensive cybersecurity use cases, starting today with a variant of GPT‑5.4 trained to be cyber-permissive: GPT‑5.4‑Cyber.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;They're also extending a program they launched in February (which I had missed) called &lt;a href="https://openai.com/index/trusted-access-for-cyber/"&gt;Trusted Access for Cyber&lt;/a&gt;, where users can verify their identity (via a photo of a government-issued ID processed by &lt;a href="https://withpersona.com/"&gt;Persona&lt;/a&gt;) to gain "reduced friction" access to OpenAI's models for cybersecurity work.&lt;/p&gt;
&lt;p&gt;Honestly, this OpenAI announcement is difficult to follow. Unsurprisingly they don't mention Anthropic at all, but much of the piece emphasizes their many years of existing cybersecurity work and their goal to "democratize access" to these tools, hence the emphasis on that self-service verification flow from February.&lt;/p&gt;
&lt;p&gt;If you want access to their best security tools you still need to go through an extra Google Form application process though, which doesn't feel particularly different to me from Anthropic's &lt;a href="https://www.anthropic.com/glasswing"&gt;Project Glasswing&lt;/a&gt;.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=47770770"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-security-research"&gt;ai-security-research&lt;/a&gt;&lt;/p&gt;



</summary><category term="security"/><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="anthropic"/><category term="ai-security-research"/></entry><entry><title>ChatGPT voice mode is a weaker model</title><link href="https://simonwillison.net/2026/Apr/10/voice-mode-is-weaker/#atom-tag" rel="alternate"/><published>2026-04-10T15:56:02+00:00</published><updated>2026-04-10T15:56:02+00:00</updated><id>https://simonwillison.net/2026/Apr/10/voice-mode-is-weaker/#atom-tag</id><summary type="html">
    &lt;p&gt;I think it's non-obvious to many people that the OpenAI voice mode runs on a much older, much weaker model - it feels like the AI that you can talk to should be the smartest AI but it really isn't.&lt;/p&gt;
&lt;p&gt;If you ask ChatGPT voice mode for its knowledge cutoff date it tells you April 2024 - it's a GPT-4o era model.&lt;/p&gt;
&lt;p&gt;This thought was inspired by &lt;a href="https://twitter.com/karpathy/status/2042334451611693415"&gt;this Andrej Karpathy tweet&lt;/a&gt; about the growing gap in understanding of AI capability based on the access points and domains people are using the models with:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;[...] It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram's reels and &lt;em&gt;at the same time&lt;/em&gt;, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems.&lt;/p&gt;
&lt;p&gt;This part really works and has made dramatic strides because 2 properties:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;these domains offer explicit reward functions that are verifiable meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge),  but also&lt;/li&gt;
&lt;li&gt;they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them.&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/andrej-karpathy"&gt;andrej-karpathy&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/chatgpt"&gt;chatgpt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="openai"/><category term="andrej-karpathy"/><category term="generative-ai"/><category term="chatgpt"/><category term="llms"/></entry><entry><title>Quoting Chengpeng Mou</title><link href="https://simonwillison.net/2026/Apr/5/chengpeng-mou/#atom-tag" rel="alternate"/><published>2026-04-05T21:47:06+00:00</published><updated>2026-04-05T21:47:06+00:00</updated><id>https://simonwillison.net/2026/Apr/5/chengpeng-mou/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://twitter.com/cpmou2022/status/2040606209800290404"&gt;&lt;p&gt;From anonymized U.S. ChatGPT data, we are seeing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;~2M weekly messages on health insurance&lt;/li&gt;
&lt;li&gt;~600K weekly messages [classified as healthcare] from people living in “hospital deserts” (30 min drive to nearest hospital)&lt;/li&gt;
&lt;li&gt;7 out of 10 msgs happen outside clinic hours&lt;/li&gt;
&lt;/ul&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://twitter.com/cpmou2022/status/2040606209800290404"&gt;Chengpeng Mou&lt;/a&gt;, Head of Business Finance, OpenAI&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/chatgpt"&gt;chatgpt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="chatgpt"/><category term="llms"/><category term="ai-ethics"/></entry><entry><title>Thoughts on OpenAI acquiring Astral and uv/ruff/ty</title><link href="https://simonwillison.net/2026/Mar/19/openai-acquiring-astral/#atom-tag" rel="alternate"/><published>2026-03-19T16:45:15+00:00</published><updated>2026-03-19T16:45:15+00:00</updated><id>https://simonwillison.net/2026/Mar/19/openai-acquiring-astral/#atom-tag</id><summary type="html">
    &lt;p&gt;The big news this morning: &lt;a href="https://astral.sh/blog/openai"&gt;Astral to join OpenAI&lt;/a&gt; (on the Astral blog) and &lt;a href="https://openai.com/index/openai-to-acquire-astral/"&gt;OpenAI to acquire Astral&lt;/a&gt; (the OpenAI announcement). Astral are the company behind &lt;a href="https://simonwillison.net/tags/uv/"&gt;uv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ruff/"&gt;ruff&lt;/a&gt;, and &lt;a href="https://simonwillison.net/tags/ty/"&gt;ty&lt;/a&gt; - three increasingly load-bearing open source projects in the Python ecosystem. I have thoughts!&lt;/p&gt;
&lt;h4 id="the-official-line-from-openai-and-astral"&gt;The official line from OpenAI and Astral&lt;/h4&gt;
&lt;p&gt;The Astral team will become part of the Codex team at OpenAI.&lt;/p&gt;
&lt;p&gt;Charlie Marsh &lt;a href="https://astral.sh/blog/openai"&gt;has this to say&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Open source is at the heart of that impact and the heart of that story; it sits at the center of everything we do. In line with our philosophy and &lt;a href="https://openai.com/index/openai-to-acquire-astral/"&gt;OpenAI's own announcement&lt;/a&gt;, OpenAI will continue supporting our open source tools after the deal closes. We'll keep building in the open, alongside our community -- and for the broader Python ecosystem -- just as we have from the start. [...]&lt;/p&gt;
&lt;p&gt;After joining the Codex team, we'll continue building our open source tools, explore ways they can work more seamlessly with Codex, and expand our reach to think more broadly about the future of software development.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;OpenAI's message &lt;a href="https://openai.com/index/openai-to-acquire-astral/"&gt;has a slightly different focus&lt;/a&gt; (highlights mine):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;As part of our developer-first philosophy, after closing OpenAI plans to support Astral’s open source products. &lt;strong&gt;By bringing Astral’s tooling and engineering expertise to OpenAI, we will accelerate our work on Codex&lt;/strong&gt; and expand what AI can do across the software development lifecycle.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is a slightly confusing message. The &lt;a href="https://github.com/openai/codex"&gt;Codex CLI&lt;/a&gt; is a Rust application, and Astral have some of the best Rust engineers in the industry - &lt;a href="https://github.com/burntsushi"&gt;BurntSushi&lt;/a&gt; alone (&lt;a href="https://github.com/rust-lang/regex"&gt;Rust regex&lt;/a&gt;, &lt;a href="https://github.com/BurntSushi/ripgrep"&gt;ripgrep&lt;/a&gt;, &lt;a href="https://github.com/BurntSushi/jiff"&gt;jiff&lt;/a&gt;) may be worth the price of acquisition!&lt;/p&gt;
&lt;p&gt;So is this about the talent or about the product? I expect both, but I know from past experience that a product+talent acquisition can turn into a talent-only acquisition later on.&lt;/p&gt;
&lt;h4 id="uv-is-the-big-one"&gt;uv is the big one&lt;/h4&gt;
&lt;p&gt;Of Astral's projects the most impactful is &lt;a href="https://github.com/astral-sh/uv"&gt;uv&lt;/a&gt;. If you're not familiar with it, &lt;code&gt;uv&lt;/code&gt; is by far the most convincing solution to Python's environment management problems, best illustrated by &lt;a href="https://xkcd.com/1987/"&gt;this classic XKCD&lt;/a&gt;:&lt;/p&gt;
&lt;p style="text-align: center"&gt;&lt;img src="https://imgs.xkcd.com/comics/python_environment.png" alt="xkcd comic showing a tangled, chaotic flowchart of Python environment paths and installations. Nodes include &amp;quot;PIP&amp;quot;, &amp;quot;EASY_INSTALL&amp;quot;, &amp;quot;$PYTHONPATH&amp;quot;, &amp;quot;ANACONDA PYTHON&amp;quot;, &amp;quot;ANOTHER PIP??&amp;quot;, &amp;quot;HOMEBREW PYTHON (2.7)&amp;quot;, &amp;quot;OS PYTHON&amp;quot;, &amp;quot;HOMEBREW PYTHON (3.6)&amp;quot;, &amp;quot;PYTHON.ORG BINARY (2.6)&amp;quot;, and &amp;quot;(MISC FOLDERS OWNED BY ROOT)&amp;quot; connected by a mess of overlapping arrows. A stick figure with a &amp;quot;?&amp;quot; stands at the top left. Paths at the bottom include &amp;quot;/usr/local/Cellar&amp;quot;, &amp;quot;/usr/local/opt&amp;quot;, &amp;quot;/usr/local/lib/python3.6&amp;quot;, &amp;quot;/usr/local/lib/python2.7&amp;quot;, &amp;quot;/python/&amp;quot;, &amp;quot;/newenv/&amp;quot;, &amp;quot;$PATH&amp;quot;, &amp;quot;????&amp;quot;, and &amp;quot;/(A BUNCH OF PATHS WITH &amp;quot;FRAMEWORKS&amp;quot; IN THEM SOMEWHERE)/&amp;quot;. Caption reads: &amp;quot;MY PYTHON ENVIRONMENT HAS BECOME SO DEGRADED THAT MY LAPTOP HAS BEEN DECLARED A SUPERFUND SITE.&amp;quot;" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Switch from &lt;code&gt;python&lt;/code&gt; to &lt;code&gt;uv run&lt;/code&gt; and most of these problems go away. I've been using it extensively for the past couple of years and it's become an essential part of my workflow.&lt;/p&gt;
&lt;p&gt;I'm not alone in this. According to PyPI Stats &lt;a href="https://pypistats.org/packages/uv"&gt;uv was downloaded&lt;/a&gt; more than 126 million times last month! Since its release in February 2024 - just two years ago - it's become one of the most popular tools for running Python code.&lt;/p&gt;
&lt;h4 id="ruff-and-ty"&gt;Ruff and ty&lt;/h4&gt;
&lt;p&gt;Astral's two other big projects are &lt;a href="https://github.com/astral-sh/ruff"&gt;ruff&lt;/a&gt; - a Python linter and formatter - and &lt;a href="https://github.com/astral-sh/ty"&gt;ty&lt;/a&gt; - a fast Python type checker.&lt;/p&gt;
&lt;p&gt;These are popular tools that provide a great developer experience but they aren't load-bearing in the same way that &lt;code&gt;uv&lt;/code&gt; is.&lt;/p&gt;
&lt;p&gt;They do however resonate well with coding agent tools like Codex - giving an agent access to fast linting and type checking tools can help improve the quality of the code they generate.&lt;/p&gt;
&lt;p&gt;I'm not convinced that integrating them &lt;em&gt;into&lt;/em&gt; the coding agent itself as opposed to telling it when to run them will make a meaningful difference, but I may just not be imaginative enough here.&lt;/p&gt;
&lt;h4 id="what-of-pyx-"&gt;What of pyx?&lt;/h4&gt;
&lt;p&gt;Ever since &lt;code&gt;uv&lt;/code&gt; started to gain traction the Python community has been worrying about the strategic risk of a single VC-backed company owning a key piece of Python infrastructure. I &lt;a href="https://simonwillison.net/2024/Sep/8/uv-under-discussion-on-mastodon/"&gt;wrote about&lt;/a&gt; one of those conversations in detail back in September 2024.&lt;/p&gt;
&lt;p&gt;The conversation back then focused on what Astral's business plan could be, which started to take form &lt;a href="https://simonwillison.net/2025/Aug/13/pyx/"&gt;in August 2025&lt;/a&gt; when they announced &lt;a href="https://astral.sh/pyx"&gt;pyx&lt;/a&gt;, their private PyPI-style package registry for organizations.&lt;/p&gt;
&lt;p&gt;I'm less convinced that pyx makes sense within OpenAI, and it's notably absent from both the Astral and OpenAI announcement posts.&lt;/p&gt;
&lt;h4 id="competitive-dynamics"&gt;Competitive dynamics&lt;/h4&gt;
&lt;p&gt;An interesting aspect of this deal is how it might impact the competition between Anthropic and OpenAI.&lt;/p&gt;
&lt;p&gt;Both companies spent most of 2025 focused on improving the coding ability of their models, resulting in the &lt;a href="https://simonwillison.net/tags/november-2025-inflection/"&gt;November 2025 inflection point&lt;/a&gt; when coding agents went from often-useful to almost-indispensable tools for software development.&lt;/p&gt;
&lt;p&gt;The competition between Anthropic's Claude Code and OpenAI's Codex is &lt;em&gt;fierce&lt;/em&gt;. Those $200/month subscriptions add up to billions of dollars a year in revenue, for companies that very much need that money.&lt;/p&gt;
&lt;p&gt;Anthropic &lt;a href="https://www.anthropic.com/news/anthropic-acquires-bun-as-claude-code-reaches-usd1b-milestone"&gt;acquired the Bun JavaScript runtime&lt;/a&gt; in December 2025, an acquisition that looks somewhat similar in shape to Astral.&lt;/p&gt;
&lt;p&gt;Bun was already a core component of Claude Code and that acquisition looked to mainly be about ensuring that a crucial dependency stayed actively maintained. Claude Code's performance has increased significantly since then thanks to the efforts of Bun's Jarred Sumner.&lt;/p&gt;
&lt;p&gt;One bad version of this deal would be if OpenAI start using their ownership of &lt;code&gt;uv&lt;/code&gt; as leverage in their competition with Anthropic.&lt;/p&gt;
&lt;h4 id="astral-s-quiet-series-a-and-b"&gt;Astral's quiet series A and B&lt;/h4&gt;
&lt;p&gt;One detail that caught my eye from Astral's announcement, in the section thanking the team, investors, and community:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Second, to our investors, especially &lt;a href="https://www.accel.com/team/casey-aylward#bay-area"&gt;Casey Aylward&lt;/a&gt; from Accel, who led our Seed and Series A, and &lt;a href="https://a16z.com/author/jennifer-li/"&gt;Jennifer Li&lt;/a&gt; from Andreessen Horowitz, who led our Series B. As a first-time, technical, solo founder, you showed far more belief in me than I ever showed in myself, and I will never forget that.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;As far as I can tell neither the Series A nor the Series B was previously announced - I've only been able to find coverage of the original seed round &lt;a href="https://astral.sh/blog/announcing-astral-the-company-behind-ruff"&gt;from April 2023&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Those investors presumably now get to exchange their stake in Astral for a piece of OpenAI. I wonder how much influence they had on Astral's decision to sell.&lt;/p&gt;
&lt;h4 id="forking-as-a-credible-exit-"&gt;Forking as a credible exit?&lt;/h4&gt;
&lt;p&gt;Armin Ronacher built &lt;a href="https://til.simonwillison.net/python/rye"&gt;Rye&lt;/a&gt;, which was later taken over by Astral and effectively merged with uv. In &lt;a href="https://lucumr.pocoo.org/2024/8/21/harvest-season/"&gt;August 2024&lt;/a&gt; he wrote about the risk involved in a VC-backed company owning a key piece of open source infrastructure and said the following (highlight mine):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;However having seen the code and what uv is doing, &lt;strong&gt;even in the worst possible future this is a very forkable and maintainable thing&lt;/strong&gt;. I believe that even in case Astral shuts down or were to do something incredibly dodgy licensing wise, the community would be better off than before uv existed.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Astral's own Douglas Creager &lt;a href="https://news.ycombinator.com/item?id=47438723#47439974"&gt;emphasized this angle on Hacker News today&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;All I can say is that &lt;em&gt;right now&lt;/em&gt;, we're committed to maintaining our open-source tools with the same level of effort, care, and attention to detail as before. That does not change with this acquisition. No one can guarantee how motives, incentives, and decisions might change years down the line. But that's why we bake optionality into it with the tools being permissively licensed. That makes the worst-case scenarios have the shape of "fork and move on", and not "software disappears forever".&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I like and trust the Astral team and I'm optimistic that their projects will be well-maintained in their new home.&lt;/p&gt;
&lt;p&gt;OpenAI don't yet have much of a track record with respect to acquiring and maintaining open source projects. They've been on a bit of an acquisition spree over the past three months though, snapping up &lt;a href="https://openai.com/index/openai-to-acquire-promptfoo/"&gt;Promptfoo&lt;/a&gt; and &lt;a href="https://steipete.me/posts/2026/openclaw"&gt;OpenClaw&lt;/a&gt; (sort-of, they hired creator Peter Steinberger and are spinning OpenClaw off to a foundation), plus closed source LaTeX platform &lt;a href="https://openai.com/index/introducing-prism/"&gt;Crixet (now Prism)&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If things do go south for &lt;code&gt;uv&lt;/code&gt; and the other Astral projects we'll get to see how credible the forking exit strategy turns out to be.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rust"&gt;rust&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ruff"&gt;ruff&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/astral"&gt;astral&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/charlie-marsh"&gt;charlie-marsh&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex"&gt;codex&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ty"&gt;ty&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="python"/><category term="ai"/><category term="rust"/><category term="openai"/><category term="ruff"/><category term="uv"/><category term="astral"/><category term="charlie-marsh"/><category term="coding-agents"/><category term="codex"/><category term="ty"/></entry><entry><title>GPT-5.4 mini and GPT-5.4 nano, which can describe 76,000 photos for $52</title><link href="https://simonwillison.net/2026/Mar/17/mini-and-nano/#atom-tag" rel="alternate"/><published>2026-03-17T19:39:17+00:00</published><updated>2026-03-17T19:39:17+00:00</updated><id>https://simonwillison.net/2026/Mar/17/mini-and-nano/#atom-tag</id><summary type="html">
    &lt;p&gt;OpenAI today: &lt;a href="https://openai.com/index/introducing-gpt-5-4-mini-and-nano/"&gt;Introducing GPT‑5.4 mini and nano&lt;/a&gt;. These models join GPT-5.4 which was released &lt;a href="https://openai.com/index/introducing-gpt-5-4/"&gt;two weeks ago&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;OpenAI's self-reported benchmarks show the new 5.4-nano out-performing their previous GPT-5 mini model when run at maximum reasoning effort. The new mini is also 2x faster than the previous mini.&lt;/p&gt;
&lt;p&gt;Here's how the pricing looks - all prices are per million tokens. &lt;code&gt;gpt-5.4-nano&lt;/code&gt; is notably even cheaper than Google's Gemini 3.1 Flash-Lite:&lt;/p&gt;
&lt;center&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Model&lt;/th&gt;
      &lt;th&gt;Input&lt;/th&gt;
      &lt;th&gt;Cached input&lt;/th&gt;
      &lt;th&gt;Output&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;gpt-5.4&lt;/td&gt;
      &lt;td&gt;$2.50&lt;/td&gt;
      &lt;td&gt;$0.25&lt;/td&gt;
      &lt;td&gt;$15.00&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;gpt-5.4-mini&lt;/td&gt;
      &lt;td&gt;$0.75&lt;/td&gt;
      &lt;td&gt;$0.075&lt;/td&gt;
      &lt;td&gt;$4.50&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;gpt-5.4-nano&lt;/td&gt;
      &lt;td&gt;$0.20&lt;/td&gt;
      &lt;td&gt;$0.02&lt;/td&gt;
      &lt;td&gt;$1.25&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;&lt;td colspan="4"&gt;&lt;center&gt;Other models for comparison&lt;/center&gt;&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Claude Opus 4.6&lt;/td&gt;
      &lt;td&gt;$5.00&lt;/td&gt;
      &lt;td&gt;-&lt;/td&gt;
      &lt;td&gt;$25.00&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Claude Sonnet 4.6&lt;/td&gt;
      &lt;td&gt;$3.00&lt;/td&gt;
      &lt;td&gt;-&lt;/td&gt;
      &lt;td&gt;$15.00&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Gemini 3.1 Pro&lt;/td&gt;
      &lt;td&gt;$2.00&lt;/td&gt;
      &lt;td&gt;-&lt;/td&gt;
      &lt;td&gt;$12.00&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Claude Haiku 4.5&lt;/td&gt;
      &lt;td&gt;$1.00&lt;/td&gt;
      &lt;td&gt;-&lt;/td&gt;
      &lt;td&gt;$5.00&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr&gt;
      &lt;td&gt;Gemini 3.1 Flash-Lite&lt;/td&gt;
      &lt;td&gt;$0.25&lt;/td&gt;
      &lt;td&gt;-&lt;/td&gt;
      &lt;td&gt;$1.50&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/center&gt;
&lt;p&gt;I used GPT-5.4 nano to generate a description of this photo I took at the &lt;a href="https://www.niche-museums.com/118"&gt;John M. Mossman Lock Collection&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2026/IMG_2324.jpeg" alt="Description below" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm -m gpt-5.4-nano -a IMG_2324.jpeg 'describe image'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here's the output:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The image shows the interior of a museum gallery with a long display wall. White-painted brick walls are covered with many framed portraits arranged in neat rows. Below the portraits, there are multiple glass display cases with dark wooden frames and glass tops/fronts, containing various old historical objects and equipment. The room has a polished wooden floor, hanging ceiling light fixtures/cords, and a few visible pipes near the top of the wall. In the foreground, glass cases run along the length of the room, reflecting items from other sections of the gallery.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That took 2,751 input tokens and 112 output tokens, at a cost of &lt;a href="https://www.llm-prices.com/#it=2751&amp;amp;ot=112&amp;amp;sel=gpt-5.4-nano"&gt;0.069 cents&lt;/a&gt; (less than a tenth of a cent). That means describing every single photo in my 76,000 photo collection would cost around $52.44.&lt;/p&gt;
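&lt;p&gt;As a rough check on that arithmetic, here's a minimal Python sketch. It assumes the rounded 0.069 cents figure applies to every photo, which is only an approximation since token counts will vary by image and response; the variable names are illustrative.&lt;/p&gt;

```python
from decimal import Decimal

# Assumption: every photo costs the same rounded amount as this one example.
cents_per_photo = Decimal("0.069")
photo_count = 76000

# Convert the total from cents to dollars.
total_dollars = cents_per_photo * photo_count / 100
print(f"${total_dollars:.2f}")
```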
&lt;p&gt;I released &lt;a href="https://llm.datasette.io/en/stable/changelog.html#v0-29"&gt;llm 0.29&lt;/a&gt; with support for the new models.&lt;/p&gt;
&lt;h4 id="pelicans"&gt;Pelicans&lt;/h4&gt;
&lt;p&gt;Then I had OpenAI Codex loop through all five reasoning effort levels and all three models and produce this combined SVG grid of pelicans riding bicycles (&lt;a href="https://gist.github.com/simonw/f16292d9a5b90b28054cff3ba497a3ca"&gt;generation transcripts here&lt;/a&gt;). I like the gpt-5.4 xhigh one the best: it has a good bicycle (with nice spokes) and the pelican has a fish in its beak!&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2026/gpt-5.4-pelican-family.svg" alt="Described by Claude Opus 4.6: A 5x3 comparison grid of AI-generated cartoon illustrations of a pelican riding a bicycle. Columns are labeled &amp;quot;gpt-5.4-nano&amp;quot;, &amp;quot;gpt-5.4-mini&amp;quot;, and &amp;quot;gpt-5.4&amp;quot; across the top, and rows are labeled &amp;quot;none&amp;quot;, &amp;quot;low&amp;quot;, &amp;quot;medium&amp;quot;, &amp;quot;high&amp;quot;, and &amp;quot;xhigh&amp;quot; down the left side, representing quality/detail settings. In the &amp;quot;none&amp;quot; row, gpt-5.4-nano shows a chaotic white bird with misplaced arrows and tangled wheels on grass, gpt-5.4-mini shows a duck-like brown bird awkwardly straddling a motorcycle-like bike, and gpt-5.4 shows a stiff gray-and-white pelican sitting atop a blue tandem bicycle with extra legs. In the &amp;quot;low&amp;quot; row, nano shows a chubby round white bird pedaling with small feet on grass, mini shows a cleaner white bird riding a blue bicycle with motion lines, and gpt-5.4 shows a pelican with a blue cap riding confidently but with slightly awkward proportions. In the &amp;quot;medium&amp;quot; row, nano regresses to a strange bird standing over bowling balls on ice, mini shows two plump white birds merged onto one yellow-wheeled bicycle, and gpt-5.4 shows a more recognizable gray-and-white pelican on a red bicycle but with tangled extra legs. In the &amp;quot;high&amp;quot; row, nano shows multiple small pelicans crowded around a broken green bicycle on grass with a sun overhead, mini shows a tandem bicycle with two white pelicans and clear blue sky, and gpt-5.4 shows two pelicans stacked on a red tandem bike with the most realistic proportions yet. 
In the &amp;quot;xhigh&amp;quot; row, nano shows the most detailed scene with a pelican on a detailed bicycle with grass and a large sun but still somewhat jumbled anatomy, mini produces the cleanest single pelican on a yellow-accented bicycle with a light blue sky, and gpt-5.4 shows a well-rendered gray pelican on a teal bicycle with the best overall coherence. Generally, quality improves moving right across models and down through quality tiers, though &amp;quot;medium&amp;quot; is inconsistently worse than &amp;quot;low&amp;quot; for some models, and all images maintain a lighthearted cartoon style with pastel skies and simple backgrounds." style="max-width: 100%;" /&gt;&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vision-llms"&gt;vision-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-pricing"&gt;llm-pricing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="llm"/><category term="vision-llms"/><category term="llm-pricing"/><category term="pelican-riding-a-bicycle"/><category term="llm-release"/></entry><entry><title>Use subagents and custom agents in Codex</title><link href="https://simonwillison.net/2026/Mar/16/codex-subagents/#atom-tag" rel="alternate"/><published>2026-03-16T23:03:56+00:00</published><updated>2026-03-16T23:03:56+00:00</updated><id>https://simonwillison.net/2026/Mar/16/codex-subagents/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://developers.openai.com/codex/subagents"&gt;Use subagents and custom agents in Codex&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Subagents reached general availability in OpenAI Codex today, after several weeks of preview behind a feature flag.&lt;/p&gt;
&lt;p&gt;They're very similar to the Claude Code implementation, with default subagents for "explorer", "worker" and "default". It's unclear to me what the difference between "worker" and "default" is, but based on their CSV example I think "worker" is intended for running large numbers of small tasks in parallel.&lt;/p&gt;
&lt;p&gt;Codex also lets you define custom agents as TOML files in &lt;code&gt;~/.codex/agents/&lt;/code&gt;. These can have custom instructions and be assigned to use specific models - including &lt;code&gt;gpt-5.3-codex-spark&lt;/code&gt; if you want &lt;a href="https://simonwillison.net/2026/Feb/12/codex-spark/"&gt;some raw speed&lt;/a&gt;. They can then be referenced by name, as demonstrated by this example prompt from the documentation:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Investigate why the settings modal fails to save. Have browser_debugger reproduce it, code_mapper trace the responsible code path, and ui_fixer implement the smallest fix once the failure mode is clear.&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The subagents pattern is widely supported in coding agents now. Here's documentation across a number of different platforms:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://developers.openai.com/codex/subagents/"&gt;OpenAI Codex subagents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://code.claude.com/docs/en/sub-agents"&gt;Claude Code subagents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://geminicli.com/docs/core/subagents/"&gt;Gemini CLI subagents&lt;/a&gt; (experimental)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.mistral.ai/mistral-vibe/agents-skills#agent-selection"&gt;Mistral Vibe subagents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://opencode.ai/docs/agents/"&gt;OpenCode agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://code.visualstudio.com/docs/copilot/agents/subagents"&gt;Subagents in Visual Studio Code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cursor.com/docs/subagents"&gt;Cursor Subagents&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: I added &lt;a href="https://simonwillison.net/guides/agentic-engineering-patterns/subagents/"&gt;a chapter on Subagents&lt;/a&gt; to my Agentic Engineering Patterns guide.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/OpenAIDevs/status/2033636701848174967"&gt;@OpenAIDevs&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex"&gt;codex&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/parallel-agents"&gt;parallel-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/agentic-engineering"&gt;agentic-engineering&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="coding-agents"/><category term="codex"/><category term="parallel-agents"/><category term="agentic-engineering"/></entry><entry><title>Codex for Open Source</title><link href="https://simonwillison.net/2026/Mar/7/codex-for-open-source/#atom-tag" rel="alternate"/><published>2026-03-07T18:13:39+00:00</published><updated>2026-03-07T18:13:39+00:00</updated><id>https://simonwillison.net/2026/Mar/7/codex-for-open-source/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://developers.openai.com/codex/community/codex-for-oss"&gt;Codex for Open Source&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Anthropic announced six months of free Claude Max for maintainers of popular open source projects (5,000+ stars or 1M+ NPM downloads) &lt;a href="https://simonwillison.net/2026/Feb/27/claude-max-oss-six-months/"&gt;on 27th February&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Now OpenAI have launched their comparable offer: six months of ChatGPT Pro (same $200/month price as Claude Max) with Codex and "conditional access to Codex Security" for core maintainers.&lt;/p&gt;
&lt;p&gt;Unlike Anthropic, they don't hint at the exact metrics they care about, but the &lt;a href="https://openai.com/form/codex-for-oss/"&gt;application form&lt;/a&gt; does ask for "information such as GitHub stars, monthly downloads, or why the project is important to the ecosystem."&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/openaidevs/status/2029998191043911955"&gt;@openaidevs&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex"&gt;codex&lt;/a&gt;&lt;/p&gt;



</summary><category term="open-source"/><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="codex"/></entry><entry><title>Anthropic and the Pentagon</title><link href="https://simonwillison.net/2026/Mar/6/anthropic-and-the-pentagon/#atom-tag" rel="alternate"/><published>2026-03-06T17:26:50+00:00</published><updated>2026-03-06T17:26:50+00:00</updated><id>https://simonwillison.net/2026/Mar/6/anthropic-and-the-pentagon/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.schneier.com/blog/archives/2026/03/anthropic-and-the-pentagon.html"&gt;Anthropic and the Pentagon&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This piece by Bruce Schneier and Nathan E. Sanders is the most thoughtful and grounded coverage I've seen of the recent and ongoing Pentagon/OpenAI/Anthropic contract situation.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;AI models are increasingly commodified. The top-tier offerings have about the same performance, and there is little to differentiate one from the other. The latest models from Anthropic, OpenAI and Google, in particular, tend to leapfrog each other with minor hops forward in quality every few months. [...]&lt;/p&gt;
&lt;p&gt;In this sort of market, branding matters a lot. Anthropic and its CEO, Dario Amodei, are positioning themselves as the moral and trustworthy AI provider. That has market value for both consumers and enterprise clients.&lt;/p&gt;
&lt;/blockquote&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/bruce-schneier"&gt;bruce-schneier&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;&lt;/p&gt;



</summary><category term="bruce-schneier"/><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="anthropic"/><category term="ai-ethics"/></entry><entry><title>Introducing GPT‑5.4</title><link href="https://simonwillison.net/2026/Mar/5/introducing-gpt54/#atom-tag" rel="alternate"/><published>2026-03-05T23:56:09+00:00</published><updated>2026-03-05T23:56:09+00:00</updated><id>https://simonwillison.net/2026/Mar/5/introducing-gpt54/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://openai.com/index/introducing-gpt-5-4/"&gt;Introducing GPT‑5.4&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Two new API models: &lt;a href="https://developers.openai.com/api/docs/models/gpt-5.4"&gt;gpt-5.4&lt;/a&gt; and &lt;a href="https://developers.openai.com/api/docs/models/gpt-5.4-pro"&gt;gpt-5.4-pro&lt;/a&gt;, also available in ChatGPT and Codex CLI. August 31st 2025 knowledge cutoff, 1 million token context window. Priced &lt;a href="https://www.llm-prices.com/#sel=gpt-5.2%2Cgpt-5.2-pro%2Cgpt-5.4%2Cgpt-5.4-272k%2Cgpt-5.4-pro%2Cgpt-5.4-pro-272k"&gt;slightly higher&lt;/a&gt; than the GPT-5.2 family with a bump in price for both models if you go above 272,000 tokens.&lt;/p&gt;
&lt;p&gt;5.4 beats coding specialist GPT-5.3-Codex on all of the relevant benchmarks. I wonder if we'll get a 5.4 Codex or if that model line has now been merged into main?&lt;/p&gt;
&lt;p&gt;Given Claude's recent focus on business applications it's interesting to see OpenAI highlight this in their announcement of GPT-5.4:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We put a particular focus on improving GPT‑5.4’s ability to create and edit spreadsheets, presentations, and documents. On an internal benchmark of spreadsheet modeling tasks that a junior investment banking analyst might do, GPT‑5.4 achieves a mean score of &lt;strong&gt;87.3%&lt;/strong&gt;, compared to &lt;strong&gt;68.4%&lt;/strong&gt; for GPT‑5.2.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here's a pelican on a bicycle &lt;a href="https://gist.github.com/simonw/7fe75b8dab6ec9c2b6bd8fd1a5a640a6"&gt;drawn by GPT-5.4&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img alt="alt text by GPT-5.4: Illustration of a cartoon pelican riding a bicycle, with a light gray background, dark blue bike frame and wheels, orange beak and legs, and motion lines suggesting movement." src="https://static.simonwillison.net/static/2026/gpt-5.4-pelican.png" /&gt;&lt;/p&gt;
&lt;p&gt;And &lt;a href="https://gist.github.com/simonw/688c0d5d93a5539b93d3f549a0b733ad"&gt;here's one&lt;/a&gt; by GPT-5.4 Pro, which took 4m45s and cost me &lt;a href="https://www.llm-prices.com/#it=16&amp;amp;ot=8593&amp;amp;sel=gpt-5.4-pro"&gt;$1.55&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Described by GPT-5.4: Illustration of a cartoon pelican riding a blue bicycle on pale green grass against a light gray background, with a large orange beak, gray-and-white body, and orange legs posed on the pedals." src="https://static.simonwillison.net/static/2026/gpt-5.4-pro-pelican.png" /&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="pelican-riding-a-bicycle"/><category term="llm-release"/></entry><entry><title>Quoting Benedict Evans</title><link href="https://simonwillison.net/2026/Feb/26/benedict-evans/#atom-tag" rel="alternate"/><published>2026-02-26T03:44:56+00:00</published><updated>2026-02-26T03:44:56+00:00</updated><id>https://simonwillison.net/2026/Feb/26/benedict-evans/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://www.ben-evans.com/benedictevans/2026/2/19/how-will-openai-compete-nkg2x"&gt;&lt;p&gt;If people are only using this a couple of times a week at most, and can’t think of anything to do with it on the average day, it hasn’t changed their life. OpenAI itself admits the problem, talking about a ‘capability gap’ between what the models can do and what people do with them, which seems to me like a way to avoid saying that you don’t have clear product-market fit. &lt;/p&gt;
&lt;p&gt;Hence, OpenAI’s ad project is partly just about covering the cost of serving the 90% or more of users who don’t pay (and capturing an early lead with advertisers and early learning in how this might work), but more strategically, it’s also about making it possible to give those users the latest and most powerful (i.e. expensive) models, in the hope that this will deepen their engagement.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://www.ben-evans.com/benedictevans/2026/2/19/how-will-openai-compete-nkg2x"&gt;Benedict Evans&lt;/a&gt;, How will OpenAI compete?&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/chatgpt"&gt;chatgpt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/benedict-evans"&gt;benedict-evans&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="openai"/><category term="chatgpt"/><category term="benedict-evans"/></entry><entry><title>How I think about Codex</title><link href="https://simonwillison.net/2026/Feb/22/how-i-think-about-codex/#atom-tag" rel="alternate"/><published>2026-02-22T15:53:43+00:00</published><updated>2026-02-22T15:53:43+00:00</updated><id>https://simonwillison.net/2026/Feb/22/how-i-think-about-codex/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.linkedin.com/pulse/how-i-think-codex-gabriel-chua-ukhic"&gt;How I think about Codex&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Gabriel Chua (Developer Experience Engineer for APAC at OpenAI) provides his take on the confusing terminology behind the term "Codex", which can refer to a bunch of different things within the OpenAI ecosystem:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In plain terms, Codex is OpenAI’s software engineering agent, available through multiple interfaces, and an agent is a model plus instructions and tools, wrapped in a runtime that can execute tasks on your behalf. [...]&lt;/p&gt;
&lt;p&gt;At a high level, I see Codex as three parts working together:&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Codex = Model + Harness + Surfaces&lt;/em&gt; [...]&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Model + Harness = the Agent&lt;/li&gt;
&lt;li&gt;Surfaces = how you interact with the Agent&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;He defines the harness as "the collection of instructions and tools", which is notably open source and lives in the &lt;a href="https://github.com/openai/codex"&gt;openai/codex&lt;/a&gt; repository.&lt;/p&gt;
&lt;p&gt;Gabriel also provides the first acknowledgment I've seen from an OpenAI insider that the Codex model family are directly trained for the Codex harness:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Codex models are trained in the presence of the harness. Tool use, execution loops, compaction, and iterative verification aren’t bolted on behaviors — they’re part of how the model learns to operate. The harness, in turn, is shaped around how the model plans, invokes tools, and recovers from failure.&lt;/p&gt;
&lt;/blockquote&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex"&gt;codex&lt;/a&gt;&lt;/p&gt;



</summary><category term="definitions"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="codex"/></entry><entry><title>Quoting Thibault Sottiaux</title><link href="https://simonwillison.net/2026/Feb/21/thibault-sottiaux/#atom-tag" rel="alternate"/><published>2026-02-21T01:30:21+00:00</published><updated>2026-02-21T01:30:21+00:00</updated><id>https://simonwillison.net/2026/Feb/21/thibault-sottiaux/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://twitter.com/thsottiaux/status/2024947946849186064"&gt;&lt;p&gt;We’ve made GPT-5.3-Codex-Spark about 30% faster. It is now serving at over 1200 tokens per second.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://twitter.com/thsottiaux/status/2024947946849186064"&gt;Thibault Sottiaux&lt;/a&gt;, OpenAI&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-performance"&gt;llm-performance&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="llm-performance"/></entry><entry><title>SWE-bench February 2026 leaderboard update</title><link href="https://simonwillison.net/2026/Feb/19/swe-bench/#atom-tag" rel="alternate"/><published>2026-02-19T04:48:47+00:00</published><updated>2026-02-19T04:48:47+00:00</updated><id>https://simonwillison.net/2026/Feb/19/swe-bench/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.swebench.com/"&gt;SWE-bench February 2026 leaderboard update&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;SWE-bench is one of the benchmarks that the labs love to list in their model releases. The official leaderboard is infrequently updated but they just did a full run of it against the current generation of models, which is notable because it's always good to see benchmark results like this that &lt;em&gt;weren't&lt;/em&gt; self-reported by the labs.&lt;/p&gt;
&lt;p&gt;The fresh results are for their "Bash Only" benchmark, which runs their &lt;a href="https://github.com/SWE-agent/mini-swe-agent"&gt;mini-swe-agent&lt;/a&gt; agent (~9,000 lines of Python, &lt;a href="https://github.com/SWE-agent/mini-swe-agent/blob/v2.2.1/src/minisweagent/config/benchmarks/swebench.yaml"&gt;here are the prompts&lt;/a&gt; they use) against the &lt;a href="https://huggingface.co/datasets/princeton-nlp/SWE-bench"&gt;SWE-bench&lt;/a&gt; dataset of coding problems - 2,294 real-world examples pulled from 12 open source repos: &lt;a href="https://github.com/django/django"&gt;django/django&lt;/a&gt; (850), &lt;a href="https://github.com/sympy/sympy"&gt;sympy/sympy&lt;/a&gt; (386), &lt;a href="https://github.com/scikit-learn/scikit-learn"&gt;scikit-learn/scikit-learn&lt;/a&gt; (229), &lt;a href="https://github.com/sphinx-doc/sphinx"&gt;sphinx-doc/sphinx&lt;/a&gt; (187), &lt;a href="https://github.com/matplotlib/matplotlib"&gt;matplotlib/matplotlib&lt;/a&gt; (184), &lt;a href="https://github.com/pytest-dev/pytest"&gt;pytest-dev/pytest&lt;/a&gt; (119), &lt;a href="https://github.com/pydata/xarray"&gt;pydata/xarray&lt;/a&gt; (110), &lt;a href="https://github.com/astropy/astropy"&gt;astropy/astropy&lt;/a&gt; (95), &lt;a href="https://github.com/pylint-dev/pylint"&gt;pylint-dev/pylint&lt;/a&gt; (57), &lt;a href="https://github.com/psf/requests"&gt;psf/requests&lt;/a&gt; (44), &lt;a href="https://github.com/mwaskom/seaborn"&gt;mwaskom/seaborn&lt;/a&gt; (22), &lt;a href="https://github.com/pallets/flask"&gt;pallets/flask&lt;/a&gt; (11).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Correction&lt;/strong&gt;: &lt;em&gt;The Bash only benchmark runs against SWE-bench Verified, not original SWE-bench. Verified is a manually curated subset of 500 samples &lt;a href="https://openai.com/index/introducing-swe-bench-verified/"&gt;described here&lt;/a&gt;, funded by OpenAI. Here's &lt;a href="https://huggingface.co/datasets/princeton-nlp/SWE-bench_Verified"&gt;SWE-bench Verified&lt;/a&gt; on Hugging Face - since it's just 2.1MB of Parquet it's easy to browse &lt;a href="https://lite.datasette.io/?parquet=https%3A%2F%2Fhuggingface.co%2Fdatasets%2Fprinceton-nlp%2FSWE-bench_Verified%2Fresolve%2Fmain%2Fdata%2Ftest-00000-of-00001.parquet#/data/test-00000-of-00001?_facet=repo"&gt;using Datasette Lite&lt;/a&gt;, which cuts those numbers down to django/django (231), sympy/sympy (75), sphinx-doc/sphinx (44), matplotlib/matplotlib (34), scikit-learn/scikit-learn (32), astropy/astropy (22), pydata/xarray (22), pytest-dev/pytest (19), pylint-dev/pylint (10), psf/requests (8), mwaskom/seaborn (2), pallets/flask (1)&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Here's how the top ten models performed:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Bar chart showing &amp;quot;% Resolved&amp;quot; by &amp;quot;Model&amp;quot;. Bars in descending order: Claude 4.5 Opus (high reasoning) 76.8%, Gemini 3 Flash (high reasoning) 75.8%, MiniMax M2.5 (high reasoning) 75.8%, Claude Opus 4.6 75.6%, GLM-5 (high reasoning) 72.8%, GPT-5.2 (high reasoning) 72.8%, Claude 4.5 Sonnet (high reasoning) 72.8%, Kimi K2.5 (high reasoning) 71.4%, DeepSeek V3.2 (high reasoning) 70.8%, Claude 4.5 Haiku (high reasoning) 70.0%, and a partially visible final bar at 66.6%." src="https://static.simonwillison.net/static/2026/swbench-feb-2026.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;It's interesting to see Claude Opus 4.5 beat Opus 4.6, though only by about a percentage point. 4.5 Opus is top, then Gemini 3 Flash, then MiniMax M2.5 - a 229B model released &lt;a href="https://www.minimax.io/news/minimax-m25"&gt;last week&lt;/a&gt; by Chinese lab MiniMax. GLM-5, Kimi K2.5 and DeepSeek V3.2 are three more Chinese models that make the top ten as well.&lt;/p&gt;
&lt;p&gt;OpenAI's GPT-5.2 is their highest performing model at position 6, but it's worth noting that their best coding model, GPT-5.3-Codex, is not represented - maybe because it's not yet available in the OpenAI API.&lt;/p&gt;
&lt;p&gt;This benchmark uses the same system prompt for every model, which is important for a fair comparison but does mean that the quality of the different harnesses or optimized prompts is not being measured here.&lt;/p&gt;
&lt;p&gt;The chart above is a screenshot from the SWE-bench website, but their charts don't display the actual percentage values on the bars. I successfully used Claude for Chrome to add these - &lt;a href="https://claude.ai/share/81a0c519-c727-4caa-b0d4-0d866375d0da"&gt;transcript here&lt;/a&gt;. My prompt sequence included:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Use claude in chrome to open https://www.swebench.com/&lt;/p&gt;
&lt;p&gt;Click on "Compare results" and then select "Select top 10"&lt;/p&gt;
&lt;p&gt;See those bar charts? I want them to display the percentage on each bar so I can take a better screenshot, modify the page like that&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I'm impressed at how well this worked - Claude injected custom JavaScript into the page to draw additional labels on top of the existing chart.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot of a Claude AI conversation showing browser automation. A thinking step reads &amp;quot;Pivoted strategy to avoid recursion issues with chart labeling &amp;gt;&amp;quot; followed by the message &amp;quot;Good, the chart is back. Now let me carefully add the labels using an inline plugin on the chart instance to avoid the recursion issue.&amp;quot; A collapsed &amp;quot;Browser_evaluate&amp;quot; section shows a browser_evaluate tool call with JavaScript code using Chart.js canvas context to draw percentage labels on bars: meta.data.forEach((bar, index) =&amp;gt; { const value = dataset.data[index]; if (value !== undefined &amp;amp;&amp;amp; value !== null) { ctx.save(); ctx.textAlign = 'center'; ctx.textBaseline = 'bottom'; ctx.fillStyle = '#333'; ctx.font = 'bold 12px sans-serif'; ctx.fillText(value.toFixed(1) + '%', bar.x, bar.y - 5); A pending step reads &amp;quot;Let me take a screenshot to see if it worked.&amp;quot; followed by a completed &amp;quot;Done&amp;quot; step, and the message &amp;quot;Let me take a screenshot to check the result.&amp;quot;" src="https://static.simonwillison.net/static/2026/claude-chrome-draw-on-chart.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: If you look at the transcript, Claude claims to have switched to Playwright, which is confusing because I didn't think I had that configured.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/KLieret/status/2024176335782826336"&gt;@KLieret&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/benchmarks"&gt;benchmarks&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/django"&gt;django&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-in-china"&gt;ai-in-china&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/browser-agents"&gt;browser-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/minimax"&gt;minimax&lt;/a&gt;&lt;/p&gt;



</summary><category term="benchmarks"/><category term="django"/><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="anthropic"/><category term="claude"/><category term="coding-agents"/><category term="ai-in-china"/><category term="browser-agents"/><category term="minimax"/></entry><entry><title>Three months of OpenClaw</title><link href="https://simonwillison.net/2026/Feb/15/openclaw/#atom-tag" rel="alternate"/><published>2026-02-15T17:23:28+00:00</published><updated>2026-02-15T17:23:28+00:00</updated><id>https://simonwillison.net/2026/Feb/15/openclaw/#atom-tag</id><summary type="html">
    &lt;p&gt;It's wild that the first commit to OpenClaw was &lt;a href="https://github.com/openclaw/openclaw/commit/f6dd362d39b8e30bd79ef7560aab9575712ccc11"&gt;on November 25th 2025&lt;/a&gt;, and less than three months later it's hit 10,000 commits from 600 contributors, attracted 196,000 GitHub stars and sort-of been featured in an extremely vague &lt;a href="https://www.youtube.com/watch?v=n7I-D4YXbzg"&gt;Super Bowl commercial for AI.com&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Quoting AI.com founder &lt;a href="https://twitter.com/kris/status/2020663711015514399"&gt;Kris Marszalek&lt;/a&gt;, purchaser of the &lt;a href="https://www.theregister.com/2026/02/09/70m_aicom_domain_sale/"&gt;most expensive domain in history&lt;/a&gt; for $70m:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;ai.com is the world’s first easy-to-use and secure implementation of OpenClaw, the open source agent framework that went viral two weeks ago; we made it easy to use without any technical skills, while hardening security to keep your data safe.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Looks like vaporware to me - all you can do right now is reserve a handle - but it's still remarkable to see an open source project get to &lt;em&gt;that&lt;/em&gt; level of hype in such a short space of time.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: OpenClaw creator Peter Steinberger &lt;a href="https://steipete.me/posts/2026/openclaw"&gt;just announced&lt;/a&gt; that he's joining OpenAI and plans to transfer ownership of OpenClaw to a new independent foundation.&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/domains"&gt;domains&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-agents"&gt;ai-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/peter-steinberger"&gt;peter-steinberger&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openclaw"&gt;openclaw&lt;/a&gt;&lt;/p&gt;



</summary><category term="domains"/><category term="open-source"/><category term="ai"/><category term="openai"/><category term="ai-agents"/><category term="peter-steinberger"/><category term="openclaw"/></entry></feed>