<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: olmo</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/olmo.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2025-11-22T23:59:46+00:00</updated><author><name>Simon Willison</name></author><entry><title>Olmo 3 is a fully open LLM</title><link href="https://simonwillison.net/2025/Nov/22/olmo-3/#atom-tag" rel="alternate"/><published>2025-11-22T23:59:46+00:00</published><updated>2025-11-22T23:59:46+00:00</updated><id>https://simonwillison.net/2025/Nov/22/olmo-3/#atom-tag</id><summary type="html">
    &lt;p&gt;Olmo is the LLM series from Ai2 - the &lt;a href="https://allenai.org/"&gt;Allen Institute for AI&lt;/a&gt;. Unlike most open weight models, Olmo releases are notable for including the full training data, training process and checkpoints alongside the model weights.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://allenai.org/blog/olmo3"&gt;new Olmo 3&lt;/a&gt; claims to be "the best fully open 32B-scale thinking model" and has a strong focus on interpretability:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;At its center is &lt;strong&gt;Olmo 3-Think (32B)&lt;/strong&gt;, the best fully open 32B-scale thinking model that for the first time lets you inspect intermediate reasoning traces and trace those behaviors back to the data and training decisions that produced them.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;They've released four 7B models - Olmo 3-Base, Olmo 3-Instruct, Olmo 3-Think and Olmo 3-RL Zero, plus 32B variants of the 3-Think and 3-Base models.&lt;/p&gt;
&lt;p&gt;Having full access to the training data is really useful. Here's how they describe that:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Olmo 3 is pretrained on &lt;strong&gt;Dolma 3&lt;/strong&gt;, a new ~9.3-trillion-token corpus drawn from web pages, science PDFs processed with &lt;a href="https://olmocr.allenai.org/"&gt;olmOCR&lt;/a&gt;, codebases, math problems and solutions, and encyclopedic text. From this pool, we construct &lt;strong&gt;Dolma 3 Mix&lt;/strong&gt;, a 5.9-trillion-token (~6T) pretraining mix with a higher proportion of coding and mathematical data than earlier Dolma releases, plus much stronger decontamination via extensive deduplication, quality filtering, and careful control over data mixing. We follow established web standards in collecting training data and don't collect from sites that explicitly disallow it, including paywalled content.&lt;/p&gt;
&lt;/blockquote&gt;
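&lt;p&gt;That "extensive deduplication" step can be sketched in a few lines - a toy illustration only, with hypothetical helper names rather than Ai2's actual pipeline: hash every overlapping n-word span of the evaluation benchmarks, then flag training documents whose spans overlap that set too heavily.&lt;/p&gt;

```python
from hashlib import blake2b

def ngram_hashes(text, n=8):
    """Hash every overlapping n-word span of a document."""
    toks = text.lower().split()
    spans = (" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1))
    return {blake2b(s.encode(), digest_size=8).hexdigest() for s in spans}

def is_contaminated(doc, benchmark_hashes, n=8, threshold=0.5):
    """Flag a training document if too many of its n-grams appear in benchmarks."""
    fp = ngram_hashes(doc, n)
    if not fp:
        return False
    overlap = len(fp.intersection(benchmark_hashes))
    return overlap / len(fp) >= threshold
```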
&lt;p&gt;They also highlight that they are training on fewer tokens than their competition:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;[...] it's the strongest fully open thinking model we're aware of, narrowing the gap to the best open-weight models of similar scale – such as Qwen 3 32B – while training on roughly 6x fewer tokens.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;If you're continuing to hold out hope for a model trained entirely on licensed data this one sadly won't fit the bill - a lot of that data still comes from a crawl of the web.&lt;/p&gt;
&lt;p&gt;I tried out the 32B Think model and the 7B Instruct model &lt;a href="https://lmstudio.ai/models/olmo3"&gt;using LM Studio&lt;/a&gt;. The 7B model is a 4.16GB download, the 32B one is 18.14GB.&lt;/p&gt;
&lt;p&gt;The 32B model is absolutely an over-thinker! I asked it to "Generate an SVG of a pelican riding a bicycle" and it thought for &lt;em&gt;14 minutes 43 seconds&lt;/em&gt;, outputting 8,437 tokens total most of which was &lt;a href="https://gist.github.com/simonw/2ae9d5ed71de9608b7955eea9671306f"&gt;this epic thinking trace&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I don't usually quote the full SVG in these write-ups, but in this case it's short enough that I think it's worth sharing. The SVG comments give a great impression of what it was trying to do - it has a Bicycle, Bike frame, Pelican, Left and Right wings and even "Feet on pedals".&lt;/p&gt;
&lt;div class="highlight highlight-text-xml-svg"&gt;&lt;pre&gt;&amp;lt;&lt;span class="pl-ent"&gt;svg&lt;/span&gt; &lt;span class="pl-e"&gt;width&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;200&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-e"&gt;height&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;200&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-e"&gt;viewBox&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;0 0 100 100&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;&amp;gt;
  &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;&amp;lt;!--&lt;/span&gt; Bicycle &lt;span class="pl-c"&gt;--&amp;gt;&lt;/span&gt;&lt;/span&gt;
  &amp;lt;&lt;span class="pl-ent"&gt;circle&lt;/span&gt; &lt;span class="pl-e"&gt;cx&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;30&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-e"&gt;cy&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;60&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-e"&gt;r&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;15&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-e"&gt;stroke&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;black&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-e"&gt;fill&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;none&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;/&amp;gt;
  &amp;lt;&lt;span class="pl-ent"&gt;circle&lt;/span&gt; &lt;span class="pl-e"&gt;cx&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;70&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-e"&gt;cy&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;60&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-e"&gt;r&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;15&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-e"&gt;stroke&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;black&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-e"&gt;fill&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;none&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;/&amp;gt;
  
  &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;&amp;lt;!--&lt;/span&gt; Bike frame &lt;span class="pl-c"&gt;--&amp;gt;&lt;/span&gt;&lt;/span&gt;
  &amp;lt;&lt;span class="pl-ent"&gt;rect&lt;/span&gt; &lt;span class="pl-e"&gt;x&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;35&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-e"&gt;y&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;25&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-e"&gt;width&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;30&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-e"&gt;height&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;10&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-e"&gt;fill&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;saddlebrown&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;/&amp;gt;
  &amp;lt;&lt;span class="pl-ent"&gt;line&lt;/span&gt; &lt;span class="pl-e"&gt;x1&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;35&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-e"&gt;y1&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;40&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-e"&gt;x2&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;30&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-e"&gt;y2&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;60&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-e"&gt;stroke&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;black&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-e"&gt;stroke-width&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;3&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;/&amp;gt;
  &amp;lt;&lt;span class="pl-ent"&gt;line&lt;/span&gt; &lt;span class="pl-e"&gt;x1&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;65&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-e"&gt;y1&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;40&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-e"&gt;x2&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;70&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-e"&gt;y2&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;60&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-e"&gt;stroke&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;black&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-e"&gt;stroke-width&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;3&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;/&amp;gt;
  
  &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;&amp;lt;!--&lt;/span&gt; Pelican &lt;span class="pl-c"&gt;--&amp;gt;&lt;/span&gt;&lt;/span&gt;
  &amp;lt;&lt;span class="pl-ent"&gt;ellipse&lt;/span&gt; &lt;span class="pl-e"&gt;cx&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;55&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-e"&gt;cy&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;65&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-e"&gt;rx&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;20&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-e"&gt;ry&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;15&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-e"&gt;fill&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;white&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;/&amp;gt;
  &amp;lt;&lt;span class="pl-ent"&gt;polygon&lt;/span&gt; &lt;span class="pl-e"&gt;points&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;52 50,57 35,62 50&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-e"&gt;fill&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;black&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;/&amp;gt; &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;&amp;lt;!--&lt;/span&gt; Head/beak &lt;span class="pl-c"&gt;--&amp;gt;&lt;/span&gt;&lt;/span&gt;
  &amp;lt;&lt;span class="pl-ent"&gt;circle&lt;/span&gt; &lt;span class="pl-e"&gt;cx&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;55&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-e"&gt;cy&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;45&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-e"&gt;r&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;2&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-e"&gt;fill&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;white&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;/&amp;gt;
  &amp;lt;&lt;span class="pl-ent"&gt;circle&lt;/span&gt; &lt;span class="pl-e"&gt;cx&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;60&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-e"&gt;cy&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;45&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-e"&gt;r&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;2&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-e"&gt;fill&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;white&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;/&amp;gt;
  
  &amp;lt;&lt;span class="pl-ent"&gt;polygon&lt;/span&gt; &lt;span class="pl-e"&gt;points&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;45 60,50 70,55 60&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-e"&gt;fill&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;lightgrey&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;/&amp;gt; &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;&amp;lt;!--&lt;/span&gt; Left wing &lt;span class="pl-c"&gt;--&amp;gt;&lt;/span&gt;&lt;/span&gt;
  &amp;lt;&lt;span class="pl-ent"&gt;polygon&lt;/span&gt; &lt;span class="pl-e"&gt;points&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;65 60,70 70,55 60&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-e"&gt;fill&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;lightgrey&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;/&amp;gt; &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;&amp;lt;!--&lt;/span&gt; Right wing &lt;span class="pl-c"&gt;--&amp;gt;&lt;/span&gt;&lt;/span&gt;
  
  &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;&amp;lt;!--&lt;/span&gt; Feet on pedals &lt;span class="pl-c"&gt;--&amp;gt;&lt;/span&gt;&lt;/span&gt;
  &amp;lt;&lt;span class="pl-ent"&gt;polygon&lt;/span&gt; &lt;span class="pl-e"&gt;points&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;25 75,30 85,35 75&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-e"&gt;fill&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;black&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;/&amp;gt;
  &amp;lt;&lt;span class="pl-ent"&gt;polygon&lt;/span&gt; &lt;span class="pl-e"&gt;points&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;75 75,70 85,65 75&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-e"&gt;fill&lt;/span&gt;=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;black&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;/&amp;gt;
&amp;lt;/&lt;span class="pl-ent"&gt;svg&lt;/span&gt;&amp;gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Rendered it looks like this:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/olmo3-32b-pelican.jpg" alt="Two circles, each with a triangle sticking out from the bottom. They have bars leading up to a brown box. Overlapping them is a black triangle with white circles for eyes and two grey triangles that are probably meant to be wings. It is not recognizable as a pelican or a bicycle." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;I tested OLMo 2 32B 4bit &lt;a href="https://simonwillison.net/2025/Mar/16/olmo2/"&gt;back in March&lt;/a&gt; and got something that, while pleasingly abstract, didn't come close to resembling a pelican or a bicycle:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/olmo2-pelican.jpg" alt="Blue and black wiggly lines looking more like a circuit diagram than a pelican riding a bicycle" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;To be fair 32B models generally don't do great with this. Here's Qwen 3 32B's attempt (I ran that just now &lt;a href="https://openrouter.ai/chat?models=qwen/qwen3-32b"&gt;using OpenRouter&lt;/a&gt;):&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/qwen3-32b-pelican.png" alt="The bicycle is two black circles joined by two lines, with a weird rectangular saddle perched on top The pelican is a blue oval, a white circles with a yellow triangle in it and a weird eye shaped oval overlapping the blue one." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;h4 id="olmotrace"&gt;OlmoTrace&lt;/h4&gt;
&lt;p&gt;I was particularly keen on trying out the ability to "inspect intermediate reasoning traces". Here's how that's described later in the announcement:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;A core goal of Olmo 3 is not just to &lt;em&gt;open&lt;/em&gt; the model flow, but to make it &lt;em&gt;actionable&lt;/em&gt; for people who want to understand and improve model behavior. Olmo 3 integrates with &lt;a href="https://allenai.org/blog/olmotrace"&gt;&lt;strong&gt;OlmoTrace&lt;/strong&gt;&lt;/a&gt;, our tool for tracing model outputs back to training data in real time.&lt;/p&gt;
&lt;p&gt;For example, in the Ai2 Playground, you can ask Olmo 3-Think (32B) to answer a general-knowledge question, then use OlmoTrace to inspect where and how the model may have learned to generate parts of its response. This closes the gap between training data and model behavior: you can see not only what the model is doing, but why - and adjust data or training decisions accordingly.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;You can access OlmoTrace via &lt;a href="https://playground.allenai.org/"&gt;playground.allenai.org&lt;/a&gt;, by first running a prompt and then clicking the "Show OlmoTrace" button below the output.&lt;/p&gt;
&lt;p&gt;I tried that on "Generate a conference bio for Simon Willison" (an ego-prompt I use to see how much the models have picked up about me from their training data) and got back a result that looked like this:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/olmotrace.jpg" alt="Screenshot of AI2 Playground interface with Olmo 3 32B Think model selected. User prompt reads &amp;quot;Generate a conference bio for Simon Willison&amp;quot;. AI response shows expanded &amp;quot;Thoughts&amp;quot; section: &amp;quot;Okay, the user wants me to generate a conference bio for Simon Willison. Let me start by recalling who Simon is. He's a well-known developer, entrepreneur, and author in the tech community. I remember he's the co-founder of co:here and now works at Anthropic. He's famous for his work with SQLite, especially the datasette project. Also, he writes a blog called SimonWillison.net which is quite popular. First, I need to structure a typical conference bio. These usually include his current role, past positions, notable projects, and maybe some achievements or recognitions. I should highlight his key contributions like datasette, which allows easy exploration of SQLite databases via a web interface. That's a big deal in data journalism and open data circles.&amp;quot; Right panel shows &amp;quot;OlmoTrace&amp;quot; feature described as &amp;quot;Documents from the training data that have exact text matches with the model response. Powered by infini-gram&amp;quot;. First document excerpt discusses technology and innovation, with highlighted match text &amp;quot;societal implications of technology, emphasizing the&amp;quot; shown in bold, surrounded by text about responsibility and merging innovation with intellect. Second document excerpt about Matt Hall has highlighted match &amp;quot;is a software engineer and entrepreneur based in&amp;quot; shown in bold, describing someone in New York City who co-founded a PFP collection and works at Google Creative Lab. Note indicates &amp;quot;Document repeated 2 times in result&amp;quot; with &amp;quot;View all repeated documents&amp;quot; link." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;It thinks I co-founded co:here and work at Anthropic, both of which are incorrect - but that's not uncommon with LLMs; I frequently see them suggest that I'm the CTO of GitHub and other such inaccuracies.&lt;/p&gt;
&lt;p&gt;I found the OlmoTrace panel on the right disappointing. None of the training documents it highlighted looked relevant - it appears to be looking for phrase matches (powered by &lt;a href="https://infini-gram.io/"&gt;Ai2's infini-gram&lt;/a&gt;) but the documents it found had nothing to do with me at all.&lt;/p&gt;
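&lt;p&gt;As a rough illustration of what that exact-match tracing does - a toy inverted index, not infini-gram's actual suffix-array implementation - here's a sketch that maps five-word phrases to the training documents containing them, then reports which documents share a phrase with a model response:&lt;/p&gt;

```python
from collections import defaultdict

def build_index(docs, n=5):
    """Map each n-token phrase to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        toks = text.lower().split()
        for i in range(len(toks) - n + 1):
            index[" ".join(toks[i:i + n])].add(doc_id)
    return index

def trace(response, index, n=5):
    """Return, per matching document, the phrases it shares with the response."""
    toks = response.lower().split()
    matches = defaultdict(set)
    for i in range(len(toks) - n + 1):
        phrase = " ".join(toks[i:i + n])
        for doc_id in index.get(phrase, ()):
            matches[doc_id].add(phrase)
    return matches
```

&lt;p&gt;This also suggests why the matches felt irrelevant: any document sharing a generic five-word phrase like "is a software engineer and" will surface, whether or not it has anything to do with the subject of the response.&lt;/p&gt;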
&lt;h4 id="can-open-training-data-address-concerns-of-backdoors-"&gt;Can open training data address concerns of backdoors?&lt;/h4&gt;
&lt;p&gt;Ai2 claim that Olmo 3 is "the best fully open 32B-scale thinking model", which I think holds up provided you define "fully open" as including open training data. There's not a great deal of competition in that space though - Ai2 compare themselves to &lt;a href="https://marin.community/"&gt;Stanford's Marin&lt;/a&gt; and &lt;a href="https://www.swiss-ai.org/apertus"&gt;Swiss AI's Apertus&lt;/a&gt;, neither of which I'd heard about before.&lt;/p&gt;
&lt;p&gt;A big disadvantage of other open weight models is that it's impossible to audit their training data. Anthropic published a paper last month showing that &lt;a href="https://www.anthropic.com/research/small-samples-poison"&gt;a small number of samples can poison LLMs of any size&lt;/a&gt; - it can take just "250 poisoned documents" to add a backdoor to a large model that triggers undesired behavior based on a short carefully crafted prompt.&lt;/p&gt;

&lt;p&gt;This makes fully open training data an even bigger deal.&lt;/p&gt;
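&lt;p&gt;With the corpus published, auditing for a suspected trigger phrase is at least mechanically possible. Here's a toy scan over JSONL shards (a hypothetical file layout, not Dolma's actual format) that counts documents containing a candidate trigger string:&lt;/p&gt;

```python
import json

def count_trigger(paths, trigger):
    """Count documents whose text contains the suspected trigger phrase."""
    hits = 0
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                doc = json.loads(line)
                if trigger in doc.get("text", ""):
                    hits += 1
    return hits
```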

&lt;p&gt;Ai2 researcher Nathan Lambert included this note about the importance of transparent training data in &lt;a href="https://www.interconnects.ai/p/olmo-3-americas-truly-open-reasoning"&gt;his detailed post about the release&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;In particular, we're excited about the future of RL Zero research on Olmo 3 precisely because everything is open. Researchers can study the interaction between the reasoning traces we include at midtraining and the downstream model behavior (qualitative and quantitative).&lt;/p&gt;

&lt;p&gt;This helps answer questions that have plagued RLVR results on Qwen models, hinting at forms of data contamination particularly on math and reasoning benchmarks (see Shao, Rulin, et al. "Spurious rewards: Rethinking training signals in rlvr." &lt;a href="https://arxiv.org/abs/2506.10947"&gt;arXiv preprint arXiv:2506.10947&lt;/a&gt; (2025). or Wu, Mingqi, et al. "Reasoning or memorization? unreliable results of reinforcement learning due to data contamination." &lt;a href="https://arxiv.org/abs/2507.10532"&gt;arXiv preprint arXiv:2507.10532&lt;/a&gt; (2025).)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I hope we see more competition in this space, including further models in the Olmo series. The improvements from Olmo 1 (in &lt;a href="https://simonwillison.net/2024/Feb/2/olmos/"&gt;February 2024&lt;/a&gt;) and Olmo 2 (in &lt;a href="https://simonwillison.net/2025/Mar/16/olmo2/"&gt;March 2025&lt;/a&gt;) have been significant. I'm hoping that trend continues!&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/interpretability"&gt;interpretability&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-reasoning"&gt;llm-reasoning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai2"&gt;ai2&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/lm-studio"&gt;lm-studio&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nathan-lambert"&gt;nathan-lambert&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/olmo"&gt;olmo&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="interpretability"/><category term="pelican-riding-a-bicycle"/><category term="llm-reasoning"/><category term="ai2"/><category term="ai-ethics"/><category term="llm-release"/><category term="lm-studio"/><category term="nathan-lambert"/><category term="olmo"/></entry><entry><title>What people get wrong about the leading Chinese open models: Adoption and censorship</title><link href="https://simonwillison.net/2025/May/6/what-people-get-wrong-about-the-leading-chinese-models/#atom-tag" rel="alternate"/><published>2025-05-06T14:06:37+00:00</published><updated>2025-05-06T14:06:37+00:00</updated><id>https://simonwillison.net/2025/May/6/what-people-get-wrong-about-the-leading-chinese-models/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.interconnects.ai/p/what-people-get-wrong-about-the-leading"&gt;What people get wrong about the leading Chinese open models: Adoption and censorship&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;While I've been enjoying &lt;a href="https://simonwillison.net/2025/May/2/qwen3-8b/"&gt;trying out Alibaba's Qwen 3&lt;/a&gt; a lot recently, Nathan Lambert focuses on the elephant in the room:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;People vastly underestimate the number of companies that cannot use Qwen and DeepSeek open models because they come from China. This includes on-premise solutions built by people who know the fact that model weights alone cannot reveal anything to their creators.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The root problem here is the closed nature of the training data. Even if a model is open weights, it's not possible to conclusively determine that it couldn't add backdoors to generated code or trigger "indirect influence of Chinese values on Western business systems". Qwen 3 certainly has baked in opinions about the status of Taiwan!&lt;/p&gt;
&lt;p&gt;Nathan sees this as an opportunity for other liberally licensed models, including his own team's OLMo:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This gap provides a big opportunity for Western AI labs to lead in open models. Without DeepSeek and Qwen, the top tier of models we’re left with are Llama and Gemma, which both have very restrictive licenses when compared to their Chinese counterparts. These licenses are proportionally likely to block an IT department from approving a model.&lt;/p&gt;
&lt;p&gt;This takes us to the middle tier of permissively licensed, open weight models who actually have a huge opportunity ahead of them: OLMo, of course, I’m biased, Microsoft with Phi, Mistral, IBM (!??!), and some other smaller companies to fill out the long tail.&lt;/p&gt;
&lt;/blockquote&gt;

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://twitter.com/natolambert/status/1919751157351583858"&gt;@natolambert&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/qwen"&gt;qwen&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-in-china"&gt;ai-in-china&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nathan-lambert"&gt;nathan-lambert&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/olmo"&gt;olmo&lt;/a&gt;&lt;/p&gt;



</summary><category term="open-source"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="qwen"/><category term="ai-ethics"/><category term="ai-in-china"/><category term="nathan-lambert"/><category term="olmo"/></entry><entry><title>mlx-community/OLMo-2-0325-32B-Instruct-4bit</title><link href="https://simonwillison.net/2025/Mar/16/olmo2/#atom-tag" rel="alternate"/><published>2025-03-16T03:30:41+00:00</published><updated>2025-03-16T03:30:41+00:00</updated><id>https://simonwillison.net/2025/Mar/16/olmo2/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://huggingface.co/mlx-community/OLMo-2-0325-32B-Instruct-4bit"&gt;mlx-community/OLMo-2-0325-32B-Instruct-4bit&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;OLMo 2 32B &lt;a href="https://simonwillison.net/2025/Mar/13/ai2/"&gt;claims to be&lt;/a&gt; "the first fully-open model (all data, code, weights, and details are freely available) to outperform GPT3.5-Turbo and GPT-4o mini". Thanks to the MLX project here's a recipe that worked for me to run it on my Mac, via my &lt;a href="https://github.com/simonw/llm-mlx"&gt;llm-mlx&lt;/a&gt; plugin.&lt;/p&gt;
&lt;p&gt;To install the model:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm install llm-mlx
llm mlx download-model mlx-community/OLMo-2-0325-32B-Instruct-4bit
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That downloads 17GB to &lt;code&gt;~/.cache/huggingface/hub/models--mlx-community--OLMo-2-0325-32B-Instruct-4bit&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;To start an interactive chat with OLMo 2:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm chat -m mlx-community/OLMo-2-0325-32B-Instruct-4bit
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Or to run a prompt:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm -m mlx-community/OLMo-2-0325-32B-Instruct-4bit 'Generate an SVG of a pelican riding a bicycle' -o unlimited 1
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;-o unlimited 1&lt;/code&gt; option removes the cap on the number of output tokens - the default for &lt;code&gt;llm-mlx&lt;/code&gt; is 1024, which isn't enough to attempt to draw a pelican.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://gist.github.com/simonw/53f00731d494439d4aeca6bdd55368ca"&gt;pelican it drew&lt;/a&gt; is refreshingly abstract:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Blue and black wiggly lines looking more like a circuit diagram than a pelican riding a bicycle" src="https://static.simonwillison.net/static/2025/olmo2-pelican.jpg" /&gt;

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://twitter.com/awnihannun/status/1900408729268609309"&gt;@awnihannun&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mlx"&gt;mlx&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai2"&gt;ai2&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/olmo"&gt;olmo&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="generative-ai"/><category term="local-llms"/><category term="llms"/><category term="llm"/><category term="mlx"/><category term="pelican-riding-a-bicycle"/><category term="ai2"/><category term="olmo"/></entry><entry><title>Quoting Ai2</title><link href="https://simonwillison.net/2025/Mar/13/ai2/#atom-tag" rel="alternate"/><published>2025-03-13T21:10:29+00:00</published><updated>2025-03-13T21:10:29+00:00</updated><id>https://simonwillison.net/2025/Mar/13/ai2/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://allenai.org/blog/olmo2-32B"&gt;&lt;p&gt;Today we release &lt;strong&gt;OLMo 2 32B&lt;/strong&gt;, the most capable and largest model in the OLMo 2 family, scaling up the OLMo 2 training recipe used for our &lt;a href="https://allenai.org/blog/olmo2"&gt;7B and 13B models released in November&lt;/a&gt;. It is trained up to 6T tokens and post-trained using Tulu 3.1. OLMo 2 32B is the &lt;strong&gt;first fully-open model&lt;/strong&gt; (all data, code, weights, and details are freely available) &lt;strong&gt;to outperform GPT3.5-Turbo and GPT-4o mini&lt;/strong&gt; on a suite of popular, multi-skill academic benchmarks.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://allenai.org/blog/olmo2-32B"&gt;Ai2&lt;/a&gt;, OLMo 2 32B release announcement&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/training-data"&gt;training-data&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai2"&gt;ai2&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/olmo"&gt;olmo&lt;/a&gt;&lt;/p&gt;



</summary><category term="open-source"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="training-data"/><category term="ai2"/><category term="llm-release"/><category term="olmo"/></entry><entry><title>Open Language Models (OLMos) and the LLM landscape</title><link href="https://simonwillison.net/2024/Feb/2/olmos/#atom-tag" rel="alternate"/><published>2024-02-02T04:11:40+00:00</published><updated>2024-02-02T04:11:40+00:00</updated><id>https://simonwillison.net/2024/Feb/2/olmos/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.interconnects.ai/p/olmo"&gt;Open Language Models (OLMos) and the LLM landscape&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;OLMo is a newly released LLM from the Allen Institute for AI (AI2), currently available in 7B and 1B parameter versions (OLMo-65B is on the way) and trained on a fully openly published dataset called Dolma.&lt;/p&gt;

&lt;p&gt;The model and code are Apache 2, while the data is under the “AI2 ImpACT license”.&lt;/p&gt;

&lt;p&gt;From the benchmark scores shared here by Nathan Lambert it looks like this may be the highest performing model currently available that was built using a fully documented training set.&lt;/p&gt;

&lt;p&gt;What’s in Dolma? It’s mainly Common Crawl, Wikipedia, Project Gutenberg and the Stack.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://twitter.com/natolambert/status/1753081921293484333"&gt;@natolambert&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/training-data"&gt;training-data&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai2"&gt;ai2&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nathan-lambert"&gt;nathan-lambert&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/olmo"&gt;olmo&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="training-data"/><category term="ai2"/><category term="llm-release"/><category term="nathan-lambert"/><category term="olmo"/></entry></feed>