<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: hugging-face</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/hugging-face.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2026-03-30T14:28:34+00:00</updated><author><name>Simon Willison</name></author><entry><title>Mr. Chatterbox is a (weak) Victorian-era ethically trained model you can run on your own computer</title><link href="https://simonwillison.net/2026/Mar/30/mr-chatterbox/#atom-tag" rel="alternate"/><published>2026-03-30T14:28:34+00:00</published><updated>2026-03-30T14:28:34+00:00</updated><id>https://simonwillison.net/2026/Mar/30/mr-chatterbox/#atom-tag</id><summary type="html">
    &lt;p&gt;Trip Venturella released &lt;a href="https://www.estragon.news/mr-chatterbox-or-the-modern-prometheus/"&gt;Mr. Chatterbox&lt;/a&gt;, a language model trained entirely on out-of-copyright text from the British Library. Here's how he describes it in &lt;a href="https://huggingface.co/tventurella/mr_chatterbox_model"&gt;the model card&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Mr. Chatterbox is a language model trained entirely from scratch on a corpus of over 28,000 Victorian-era British texts published between 1837 and 1899, drawn from a dataset made available &lt;a href="https://huggingface.co/datasets/TheBritishLibrary/blbooks"&gt;by the British Library&lt;/a&gt;. The model has absolutely no training inputs from after 1899 — the vocabulary and ideas are formed exclusively from nineteenth-century literature.&lt;/p&gt;
&lt;p&gt;Mr. Chatterbox's training corpus was 28,035 books, with an estimated 2.93 billion input tokens after filtering. The model has roughly 340 million paramaters, roughly the same size as GPT-2-Medium. The difference is, of course, that unlike GPT-2, Mr. Chatterbox is trained entirely on historical data.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Given how hard it is to train a useful LLM without using vast amounts of scraped, unlicensed data, I've been dreaming of a model like this for a couple of years now. What would a model trained on out-of-copyright text be like to chat with?&lt;/p&gt;
&lt;p&gt;Thanks to Trip we can now find out for ourselves!&lt;/p&gt;
&lt;p&gt;The model itself is tiny, at least by Large Language Model standards - just &lt;a href="https://huggingface.co/tventurella/mr_chatterbox_model/tree/main"&gt;2.05GB&lt;/a&gt; on disk. You can try it out using Trip's &lt;a href="https://huggingface.co/spaces/tventurella/mr_chatterbox"&gt;Hugging Face Spaces demo&lt;/a&gt;:&lt;/p&gt;
&lt;p style="text-align: center"&gt;&lt;img src="https://static.simonwillison.net/static/2026/chatterbox.jpg" alt="Screenshot of a Victorian-themed chatbot interface titled &amp;quot;🎩 Mr. Chatterbox (Beta)&amp;quot; with subtitle &amp;quot;The Victorian Gentleman Chatbot&amp;quot;. The conversation shows a user asking &amp;quot;How should I behave at dinner?&amp;quot; with the bot replying &amp;quot;My good fellow, one might presume that such trivialities could not engage your attention during an evening's discourse!&amp;quot; The user then asks &amp;quot;What are good topics?&amp;quot; and the bot responds &amp;quot;The most pressing subjects of our society— Indeed, a gentleman must endeavor to engage the conversation with grace and vivacity. Such pursuits serve as vital antidotes against ennui when engaged in agreeable company.&amp;quot; A text input field at the bottom reads &amp;quot;Say hello...&amp;quot; with a send button. The interface uses a dark maroon and cream color scheme." style="max-width: 80%;" /&gt;&lt;/p&gt;
&lt;p&gt;Honestly, it's pretty terrible. Talking with it feels more like chatting with a Markov chain than an LLM - the responses may have a delightfully Victorian flavor to them but it's hard to get a response that usefully answers a question.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://arxiv.org/abs/2203.15556"&gt;2022 Chinchilla paper&lt;/a&gt; suggests training on roughly 20 tokens per parameter. For a 340m model that would suggest around 7 billion tokens, more than twice the British Library corpus used here. The smallest Qwen 3.5 model is 600m parameters and that model family starts to get interesting at 2b - so my hunch is we would need 4x or more the training data to get something that starts to feel like a useful conversational partner.&lt;/p&gt;
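&lt;p&gt;As a rough back-of-the-envelope check, here's that arithmetic as a short Python sketch. The 20 tokens per parameter figure is the commonly cited rule of thumb rather than an exact law, and the function and variable names are hypothetical, chosen only for illustration:&lt;/p&gt;

```python
# Rule-of-thumb ratio attributed to the Chinchilla paper (an approximation).
TOKENS_PER_PARAM = 20


def chinchilla_tokens(params):
    """Return the approximate compute-optimal training token count."""
    return params * TOKENS_PER_PARAM


params = 340_000_000           # Mr. Chatterbox size, per the model card
corpus_tokens = 2_930_000_000  # filtered British Library corpus, per the model card

target = chinchilla_tokens(params)
multiplier = target / corpus_tokens  # how much more data the rule of thumb asks for

print(f"Suggested training tokens: {target / 1e9:.1f}B")
print(f"Corpus would need to grow by about {multiplier:.2f}x")
```

&lt;p&gt;That works out to about 6.8 billion tokens, roughly 2.3x the 2.93 billion in the filtered corpus.&lt;/p&gt;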
&lt;p&gt;But what a fun project!&lt;/p&gt;
&lt;h4 id="running-it-locally-with-llm"&gt;Running it locally with LLM&lt;/h4&gt;
&lt;p&gt;I decided to see if I could run the model on my own machine using my &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; framework.&lt;/p&gt;
&lt;p&gt;I got Claude Code to do most of the work - &lt;a href="https://gisthost.github.io/?7d0f00e152dd80d617b5e501e4ff025b/index.html"&gt;here's the transcript&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Trip trained the model using Andrej Karpathy's &lt;a href="https://github.com/karpathy/nanochat"&gt;nanochat&lt;/a&gt;, so I cloned that project, pulled the model weights and told Claude to build a Python script to run the model. Once we had that working (which ended up needing some extra details from the &lt;a href="https://huggingface.co/spaces/tventurella/mr_chatterbox/tree/main"&gt;Space demo source code&lt;/a&gt;) I had Claude &lt;a href="https://llm.datasette.io/en/stable/plugins/tutorial-model-plugin.html"&gt;read the LLM plugin tutorial&lt;/a&gt; and build the rest of the plugin.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/simonw/llm-mrchatterbox"&gt;llm-mrchatterbox&lt;/a&gt; is the result. Install the plugin like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm install llm-mrchatterbox
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The first time you run a prompt it will fetch the 2.05GB model file from Hugging Face. Try that like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm -m mrchatterbox "Good day, sir"
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Or start an ongoing chat session like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm chat -m mrchatterbox
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you don't have LLM installed you can still get a chat session started from scratch using uvx like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uvx --with llm-mrchatterbox llm chat -m mrchatterbox
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When you are finished with the model you can delete the cached file using:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm mrchatterbox delete-model
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is the first time I've had Claude Code build a full LLM model plugin from scratch and it worked really well. I expect I'll be using this method again in the future.&lt;/p&gt;
&lt;p&gt;I continue to hope we can get a useful model from entirely public domain data. The fact that Trip was able to get this far using nanochat and 2.93 billion training tokens is a promising start.&lt;/p&gt;

&lt;p id="update-31st"&gt;&lt;strong&gt;Update 31st March 2026&lt;/strong&gt;: I had missed this when I first published this piece but Trip has his own &lt;a href="https://www.estragon.news/mr-chatterbox-or-the-modern-prometheus/"&gt;detailed writeup of the project&lt;/a&gt; which goes much deeper into how he trained the model. Here's how the books were filtered for pre-training:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;First, I downloaded the British Library dataset split of all 19th-century books. I filtered those down to books contemporaneous with the reign of Queen Victoria—which, unfortunately, cut out the novels of Jane Austen—and further filtered those down to a set of books with a optical character recognition (OCR) confidence of .65 or above, as listed in the metadata. This left me with 28,035 books, or roughly 2.93 billion tokes for pretraining data.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Getting it to behave like a conversational model was a lot harder. Trip started by trying to train on plays by Oscar Wilde and George Bernard Shaw, but found they didn't provide enough pairs. Then he tried extracting dialogue pairs from the books themselves, with poor results. The approach that worked was to have Claude Haiku and GPT-4o-mini generate synthetic conversation pairs for the supervised fine-tuning, which solved the problem but sadly I think dilutes the "no training inputs from after 1899" claim from the original model card.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/andrej-karpathy"&gt;andrej-karpathy&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hugging-face"&gt;hugging-face&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/training-data"&gt;training-data&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="ai"/><category term="andrej-karpathy"/><category term="generative-ai"/><category term="local-llms"/><category term="llms"/><category term="ai-assisted-programming"/><category term="hugging-face"/><category term="llm"/><category term="training-data"/><category term="uv"/><category term="ai-ethics"/><category term="claude-code"/></entry><entry><title>ggml.ai joins Hugging Face to ensure the long-term progress of Local AI</title><link href="https://simonwillison.net/2026/Feb/20/ggmlai-joins-hugging-face/#atom-tag" rel="alternate"/><published>2026-02-20T17:12:55+00:00</published><updated>2026-02-20T17:12:55+00:00</updated><id>https://simonwillison.net/2026/Feb/20/ggmlai-joins-hugging-face/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/ggml-org/llama.cpp/discussions/19759"&gt;ggml.ai joins Hugging Face to ensure the long-term progress of Local AI&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I don't normally cover acquisition news like this, but I have some thoughts.&lt;/p&gt;
&lt;p&gt;It's hard to overstate the impact Georgi Gerganov has had on the local model space. Back in March 2023 his release of &lt;a href="https://github.com/ggml-org/llama.cpp"&gt;llama.cpp&lt;/a&gt; made it possible to run a local LLM on consumer hardware. The &lt;a href="https://github.com/ggml-org/llama.cpp/blob/775328064e69db1ebd7e19ccb59d2a7fa6142470/README.md?plain=1#L7"&gt;original README&lt;/a&gt; said:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The main goal is to run the model using 4-bit quantization on a MacBook. [...] This was hacked in an evening - I have no idea if it works correctly.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I wrote about trying llama.cpp out at the time in &lt;a href="https://simonwillison.net/2023/Mar/11/llama/#llama-cpp"&gt;Large language models are having their Stable Diffusion moment&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I used it to run the 7B LLaMA model on my laptop last night, and then this morning upgraded to the 13B model—the one that Facebook claim is competitive with GPT-3.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Meta's &lt;a href="https://github.com/meta-llama/llama/tree/llama_v1"&gt;original LLaMA release&lt;/a&gt; depended on PyTorch and their &lt;a href="https://github.com/facebookresearch/fairscale"&gt;FairScale&lt;/a&gt; PyTorch extension for running on multiple GPUs, and required CUDA and NVIDIA hardware. Georgi's work opened that up to a much wider range of hardware and kicked off the local model movement that has continued to grow since then.&lt;/p&gt;
&lt;p&gt;Hugging Face are already responsible for the incredibly influential &lt;a href="https://github.com/huggingface/transformers"&gt;Transformers&lt;/a&gt; library used by the majority of LLM releases today. They've proven themselves a good steward for that open source project, which makes me optimistic for the future of llama.cpp and related projects.&lt;/p&gt;
&lt;p&gt;This section from the announcement looks particularly promising:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Going forward, our joint efforts will be geared towards the following objectives:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Towards seamless "single-click" integration with the &lt;a href="https://github.com/huggingface/transformers"&gt;transformers&lt;/a&gt; library. The &lt;code&gt;transformers&lt;/code&gt; framework has established itself as the 'source of truth' for AI model definitions. Improving the compatibility between the transformers and the ggml ecosystems is essential for wider model support and quality control.&lt;/li&gt;
&lt;li&gt;Better packaging and user experience of ggml-based software. As we enter the phase in which local inference becomes a meaningful and competitive alternative to cloud inference, it is crucial to improve and simplify the way in which casual users deploy and access local models. We will work towards making llama.cpp ubiquitous and readily available everywhere, and continue partnering with great downstream projects.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Given the influence of Transformers, this closer integration could lead to model releases that are compatible with the GGML ecosystem out of the box. That would be a big win for the local model ecosystem.&lt;/p&gt;
&lt;p&gt;I'm also excited to see investment in "packaging and user experience of ggml-based software". This has mostly been left to tools like &lt;a href="https://ollama.com"&gt;Ollama&lt;/a&gt; and &lt;a href="https://lmstudio.ai"&gt;LM Studio&lt;/a&gt;. ggml-org released &lt;a href="https://github.com/ggml-org/LlamaBarn"&gt;LlamaBarn&lt;/a&gt; last year - "a macOS menu bar app for running local LLMs" - and I'm hopeful that further investment in this area will result in more high quality open source tools for running local models from the team best placed to deliver them.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/ggerganov/status/2024839991482777976"&gt;@ggerganov&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/transformers"&gt;transformers&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llama"&gt;llama&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hugging-face"&gt;hugging-face&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llama-cpp"&gt;llama-cpp&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/georgi-gerganov"&gt;georgi-gerganov&lt;/a&gt;&lt;/p&gt;



</summary><category term="open-source"/><category term="transformers"/><category term="ai"/><category term="generative-ai"/><category term="llama"/><category term="local-llms"/><category term="llms"/><category term="hugging-face"/><category term="llama-cpp"/><category term="georgi-gerganov"/></entry><entry><title>Voxtral transcribes at the speed of sound</title><link href="https://simonwillison.net/2026/Feb/4/voxtral-2/#atom-tag" rel="alternate"/><published>2026-02-04T22:42:34+00:00</published><updated>2026-02-04T22:42:34+00:00</updated><id>https://simonwillison.net/2026/Feb/4/voxtral-2/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://mistral.ai/news/voxtral-transcribe-2"&gt;Voxtral transcribes at the speed of sound&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Mistral just released Voxtral Transcribe 2 - a family of two new models, one open weights, for transcribing audio to text. This is the latest in their Whisper-like model family, and a sequel to the original Voxtral which they released &lt;a href="https://simonwillison.net/2025/Jul/16/voxtral/"&gt;in July 2025&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Voxtral Realtime - official name &lt;code&gt;Voxtral-Mini-4B-Realtime-2602&lt;/code&gt; - is the open weights (Apache-2.0) model, available as an &lt;a href="https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-2602"&gt;8.87GB download from Hugging Face&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;You can try it out in this &lt;a href="https://huggingface.co/spaces/mistralai/Voxtral-Mini-Realtime"&gt;live demo&lt;/a&gt; - don't be put off by the "No microphone found" message, clicking "Record" should have your browser request permission and then start the demo working. I was very impressed by the demo - I talked quickly and used jargon like Django and WebAssembly and it correctly transcribed my words within moments of me uttering each sound.&lt;/p&gt;
&lt;p&gt;The closed weight model is called &lt;code&gt;voxtral-mini-latest&lt;/code&gt; and can be accessed via the Mistral API, using calls that look something like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;curl -X POST &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;https://api.mistral.ai/v1/audio/transcriptions&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; \
  -H &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Authorization: Bearer &lt;span class="pl-smi"&gt;$MISTRAL_API_KEY&lt;/span&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; \
  -F model=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;voxtral-mini-latest&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; \
  -F file=@&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Pelican talk at the library.m4a&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; \
  -F diarize=true \
  -F context_bias=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Datasette&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; \
  -F timestamp_granularities=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;segment&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;It's priced at $0.003/minute, which is $0.18/hour.&lt;/p&gt;
&lt;p&gt;The Mistral API console now has a &lt;a href="https://console.mistral.ai/build/audio/speech-to-text"&gt;speech-to-text playground&lt;/a&gt; for exercising the new model and it is &lt;em&gt;excellent&lt;/em&gt;. You can upload an audio file and promptly get a diarized transcript in a pleasant interface, with options to download the result in text, SRT or JSON format.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot of a speech-to-text transcription interface for a file named &amp;quot;Pelican talk at the library.m4a&amp;quot;. The toolbar shows &amp;quot;Speech to text&amp;quot; with Code, Transcribe, and Download buttons. The transcript shows timestamped segments from 5:53 to 6:53 with a speaker icon, reading: &amp;quot;5:53 – 6:01 So pelicans love to, they're very good at getting the most they can out of the topography when they're flying. 6:01 – 6:06 And our winds come in from the northwest and they hit those bluffs and they're deflected up. 6:07 – 6:18 And they will sit right, they'll fly north into a wind like five feet off those bluffs, but just five or ten feet off the surface because the winds dissipate. 6:19 – 6:22 And they will surf that bluff all the way north. 6:23 – 6:30 So you'll see a wind from the north at 15 miles an hour, and the pelicans are flying north into that wind and not flapping their wings. 6:31 – 6:33 And it's one of the coolest things. 6:33 – 6:35 You can only find it on San Francisco Coast. 6:36 – 6:39 Where right where the bluffs are steep. 6:41 – 6:43 Pacifica, you can find them there. 6:43 – 6:51 They like their, what we call pier bums, which are typically pelicans that have, are in some sort of trouble. 6:51 – 6:53 They're unable to catch food.&amp;quot; The segment at 6:41–6:43 is highlighted in yellow. An audio waveform is shown at the bottom with a playhead near 6:40. Stats in the lower right show 53.90s, 7946.00s, and #45833." src="https://static.simonwillison.net/static/2025/mistral-transcript-ui.jpg" /&gt;&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=46886735"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hugging-face"&gt;hugging-face&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mistral"&gt;mistral&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/speech-to-text"&gt;speech-to-text&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="hugging-face"/><category term="mistral"/><category term="speech-to-text"/></entry><entry><title>Kimi K2.5: Visual Agentic Intelligence</title><link href="https://simonwillison.net/2026/Jan/27/kimi-k25/#atom-tag" rel="alternate"/><published>2026-01-27T15:07:41+00:00</published><updated>2026-01-27T15:07:41+00:00</updated><id>https://simonwillison.net/2026/Jan/27/kimi-k25/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.kimi.com/blog/kimi-k2-5.html"&gt;Kimi K2.5: Visual Agentic Intelligence&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Kimi K2 landed &lt;a href="https://simonwillison.net/2025/Jul/11/kimi-k2/"&gt;in July&lt;/a&gt; as a 1 trillion parameter open weight LLM. It was joined by Kimi K2 Thinking &lt;a href="https://simonwillison.net/2025/Nov/6/kimi-k2-thinking/"&gt;in November&lt;/a&gt; which added reasoning capabilities. Now they've made it multi-modal: the K2 models were text-only, but the new 2.5 can handle image inputs as well:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Kimi K2.5 builds on Kimi K2 with continued pretraining over approximately 15T mixed visual and text tokens. Built as a native multimodal model, K2.5 delivers state-of-the-art coding and vision capabilities and a self-directed agent swarm paradigm.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The "self-directed agent swarm paradigm" claim there means improved long-sequence tool calling and training on how to break down tasks for multiple agents to work on at once:&lt;/p&gt;
&lt;blockquote id="complex-tasks"&gt;&lt;p&gt;For complex tasks, Kimi K2.5 can self-direct an agent swarm with up to 100 sub-agents, executing parallel workflows across up to 1,500 tool calls. Compared with a single-agent setup, this reduces execution time by up to 4.5x. The agent swarm is automatically created and orchestrated by Kimi K2.5 without any predefined subagents or workflow.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;I used the &lt;a href="https://openrouter.ai/moonshotai/kimi-k2.5"&gt;OpenRouter Chat UI&lt;/a&gt; to have it "Generate an SVG of a pelican riding a bicycle", and it did &lt;a href="https://gist.github.com/simonw/32a85e337fbc6ee935d10d89726c0476"&gt;quite well&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Cartoon illustration of a white pelican with a large orange beak and yellow throat pouch riding a green bicycle with yellow feet on the pedals, set against a light blue sky with soft bokeh circles and a green grassy hill. The bicycle frame is a little questionable. The pelican is quite good. The feet do not quite align with the pedals, which are floating clear of the frame." src="https://static.simonwillison.net/static/2026/kimi-k2.5-pelican.png" /&gt;&lt;/p&gt;
&lt;p&gt;As a more interesting test, I decided to exercise the claims around multi-agent planning with this prompt:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I want to build a Datasette plugin that offers a UI to upload files to an S3 bucket and stores information about them in a SQLite table. Break this down into ten tasks suitable for execution by parallel coding agents.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here's &lt;a href="https://gist.github.com/simonw/ee2583b2eb5706400a4737f56d57c456"&gt;the full response&lt;/a&gt;. It produced ten realistic tasks and reasoned through the dependencies between them. For comparison here's the same prompt &lt;a href="https://claude.ai/share/df9258e7-97ba-4362-83da-76d31d96196f"&gt;against Claude Opus 4.5&lt;/a&gt; and &lt;a href="https://chatgpt.com/share/6978d48c-3f20-8006-9c77-81161f899104"&gt;against GPT-5.2 Thinking&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://huggingface.co/moonshotai/Kimi-K2.5"&gt;Hugging Face repository&lt;/a&gt; is 595GB. The model uses Kimi's janky "modified MIT" license, which adds the following clause:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Our only modification part is that, if the Software (or any derivative works thereof) is used for any of your commercial products or services that have more than 100 million monthly active users, or more than 20 million US dollars (or equivalent in other currencies) in monthly revenue, you shall prominently display "Kimi K2.5" on the user interface of such product or service.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Given the model's size, I expect one way to run it locally would be with MLX and a pair of $10,000 512GB RAM M3 Ultra Mac Studios. That setup has &lt;a href="https://twitter.com/awnihannun/status/1943723599971443134"&gt;been demonstrated to work&lt;/a&gt; with previous trillion parameter K2 models.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=46775961"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hugging-face"&gt;hugging-face&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vision-llms"&gt;vision-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-tool-use"&gt;llm-tool-use&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-agents"&gt;ai-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-in-china"&gt;ai-in-china&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/moonshot"&gt;moonshot&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/parallel-agents"&gt;parallel-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/kimi"&gt;kimi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/janky-licenses"&gt;janky-licenses&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="llms"/><category term="hugging-face"/><category term="vision-llms"/><category term="llm-tool-use"/><category term="ai-agents"/><category term="pelican-riding-a-bicycle"/><category term="llm-release"/><category term="ai-in-china"/><category term="moonshot"/><category term="parallel-agents"/><category term="kimi"/><category term="janky-licenses"/></entry><entry><title>Qwen3-TTS Family is Now Open Sourced: Voice Design, Clone, and Generation</title><link href="https://simonwillison.net/2026/Jan/22/qwen3-tts/#atom-tag" rel="alternate"/><published>2026-01-22T17:42:34+00:00</published><updated>2026-01-22T17:42:34+00:00</updated><id>https://simonwillison.net/2026/Jan/22/qwen3-tts/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://qwen.ai/blog?id=qwen3tts-0115"&gt;Qwen3-TTS Family is Now Open Sourced: Voice Design, Clone, and Generation&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I haven't been paying much attention to the state-of-the-art in speech generation models other than noting that they've got &lt;em&gt;really good&lt;/em&gt;, so I can't speak for how notable this new release from Qwen is.&lt;/p&gt;
&lt;p&gt;From &lt;a href="https://github.com/QwenLM/Qwen3-TTS/blob/main/assets/Qwen3_TTS.pdf"&gt;the accompanying paper&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In this report, we present the Qwen3-TTS series, a family of advanced multilingual, controllable, robust, and streaming text-to-speech models. Qwen3-TTS supports state-of-the-art 3-second voice cloning and description-based control, allowing both the creation of entirely novel voices and fine-grained manipulation over the output speech. Trained on over 5 million hours of speech data spanning 10 languages, Qwen3-TTS adopts a dual-track LM architecture for real-time synthesis [...]. Extensive experiments indicate state-of-the-art performance across diverse objective and subjective benchmark (e.g., TTS multilingual test set, InstructTTSEval, and our long speech test set). To facilitate community research and development, we release both tokenizers and models under the Apache 2.0 license.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;To give an idea of size, &lt;a href="https://huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-Base"&gt;Qwen/Qwen3-TTS-12Hz-1.7B-Base&lt;/a&gt; is 4.54GB on Hugging Face and &lt;a href="https://huggingface.co/Qwen/Qwen3-TTS-12Hz-0.6B-Base"&gt;Qwen/Qwen3-TTS-12Hz-0.6B-Base&lt;/a&gt; is 2.52GB.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://huggingface.co/spaces/Qwen/Qwen3-TTS"&gt;Hugging Face demo&lt;/a&gt; lets you try out the 0.6B and 1.7B models for free in your browser, including voice cloning:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot of a Qwen3-TTS voice cloning web interface with three tabs at top: &amp;quot;Voice Design&amp;quot;, &amp;quot;Voice Clone (Base)&amp;quot; (selected), and &amp;quot;TTS (CustomVoice)&amp;quot;. The page is titled &amp;quot;Clone Voice from Reference Audio&amp;quot; and has two main sections. Left section: &amp;quot;Reference Audio (Upload a voice sample clone)&amp;quot; showing an audio waveform player at 0:00/0:34 with playback controls, upload and microphone icons, followed by &amp;quot;Reference Text (Transcript of the reference audio)&amp;quot; containing three paragraphs: &amp;quot;Simon Willison is the creator of Datasette, an open source tool for exploring and publishing data. He currently works full-time building open source tools for data journalism, built around Datasette and SQLite. Prior to becoming an independent open source developer, Simon was an engineering director at Eventbrite. Simon joined Eventbrite through their acquisition of Lanyrd, a Y Combinator funded company he co-founded in 2010. He is a co-creator of the Django Web Framework, and has been blogging about web development and programming since 2002 at simonwillison.net&amp;quot;. Right section: &amp;quot;Target Text (Text to synthesize with cloned voice)&amp;quot; containing text about Qwen3-TTS speech generation capabilities, with &amp;quot;Language&amp;quot; dropdown set to &amp;quot;Auto&amp;quot; and &amp;quot;Model Size&amp;quot; dropdown set to &amp;quot;1.7B&amp;quot;, and a purple &amp;quot;Clone &amp;amp; Generate&amp;quot; button at bottom." src="https://static.simonwillison.net/static/2026/qwen-voice-clone.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;I tried this out by recording myself reading &lt;a href="https://simonwillison.net/about/"&gt;my about page&lt;/a&gt; and then having Qwen3-TTS generate audio of me reading the Qwen3-TTS announcement post. Here's the result:&lt;/p&gt;
&lt;p&gt;&lt;audio controls style="width: 100%"&gt;
  &lt;source src="https://static.simonwillison.net/static/2026/qwen-tts-clone.wav" type="audio/wav"&gt;
  Your browser does not support the audio element.
&lt;/audio&gt;&lt;/p&gt;
&lt;p&gt;It's important that everyone understands that voice cloning is now something that's available to anyone with a GPU and a few GBs of VRAM... or in this case a web browser that can access Hugging Face.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: Prince Canuma &lt;a href="https://x.com/Prince_Canuma/status/2014453857019904423"&gt;got this working&lt;/a&gt; with his &lt;a href="https://pypi.org/project/mlx-audio/"&gt;mlx-audio&lt;/a&gt; library. I &lt;a href="https://claude.ai/share/2e01ad60-ca38-4e14-ab60-74eaa45b2fbd"&gt;had Claude&lt;/a&gt; turn that into &lt;a href="https://github.com/simonw/tools/blob/main/python/q3_tts.py"&gt;a CLI tool&lt;/a&gt; which you can run with &lt;code&gt;uv&lt;/code&gt; like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uv run https://tools.simonwillison.net/python/q3_tts.py \
  'I am a pirate, give me your gold!' \
  -i 'gruff voice' -o pirate.wav
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;-i&lt;/code&gt; option lets you use a prompt to describe the voice it should use. On first run this downloads a 4.5GB model file from Hugging Face.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=46719229"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/text-to-speech"&gt;text-to-speech&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hugging-face"&gt;hugging-face&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/qwen"&gt;qwen&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mlx"&gt;mlx&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prince-canuma"&gt;prince-canuma&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-in-china"&gt;ai-in-china&lt;/a&gt;&lt;/p&gt;



</summary><category term="text-to-speech"/><category term="ai"/><category term="generative-ai"/><category term="hugging-face"/><category term="uv"/><category term="qwen"/><category term="mlx"/><category term="prince-canuma"/><category term="ai-in-china"/></entry><entry><title>Mistral-Small 3.2</title><link href="https://simonwillison.net/2025/Jun/20/mistral-small-32/#atom-tag" rel="alternate"/><published>2025-06-20T19:12:42+00:00</published><updated>2025-06-20T19:12:42+00:00</updated><id>https://simonwillison.net/2025/Jun/20/mistral-small-32/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://huggingface.co/mistralai/Mistral-Small-3.2-24B-Instruct-2506"&gt;Mistral-Small 3.2&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Released on Hugging Face a couple of hours ago. So far there aren't any quantizations to run it on a Mac, but I'm sure those will emerge pretty quickly.&lt;/p&gt;
&lt;p&gt;This is a minor bump to Mistral Small 3.1, one of my favorite local models. I've been running Small 3.1 &lt;a href="https://ollama.com/library/mistral-small3.1/tags"&gt;via Ollama&lt;/a&gt; where it's a 15GB download - these 24 billion parameter models are a great balance between capabilities and not using up all of the available RAM on my laptop. I expect Ollama will add 3.2 imminently.&lt;/p&gt;
&lt;p&gt;According to Mistral:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Small-3.2 improves in the following categories:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Instruction following&lt;/strong&gt;: Small-3.2 is better at following precise instructions&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Repetition errors&lt;/strong&gt;: Small-3.2 produces less infinite generations or repetitive answers&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Function calling&lt;/strong&gt;: Small-3.2's function calling template is more robust (see &lt;a href="https://github.com/mistralai/mistral-common/blob/535b4d0a0fc94674ea17db6cf8dc2079b81cbcfa/src/mistral_common/tokens/tokenizers/instruct.py#L778"&gt;here&lt;/a&gt; and &lt;a href="https://huggingface.co/mistralai/Mistral-Small-3.2-24B-Instruct-2506#function-calling"&gt;examples&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Interestingly they recommend running it with a temperature of 0.15 - many models recommend a default of 0.7. They also provide a &lt;a href="https://huggingface.co/mistralai/Mistral-Small-3.2-24B-Instruct-2506/blob/main/SYSTEM_PROMPT.txt"&gt;suggested system prompt&lt;/a&gt; which includes a note that "Your knowledge base was last updated on 2023-10-01".&lt;/p&gt;
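&lt;p&gt;As a rough illustration of what a low temperature such as 0.15 does: temperature divides the model's token scores before they are turned into probabilities, so a smaller value concentrates probability on the top-scoring token. The sketch below is a generic softmax with made-up scores, not Mistral's sampling code; the function name and the example logits are assumptions.&lt;/p&gt;

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing by the temperature."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


# Made-up scores for three candidate tokens (an assumption for illustration).
logits = [2.0, 1.0, 0.5]

sharp = softmax_with_temperature(logits, 0.15)  # the setting Mistral recommends
soft = softmax_with_temperature(logits, 0.7)    # the more common default
```

&lt;p&gt;With these made-up scores &lt;code&gt;sharp[0]&lt;/code&gt; comes out larger than &lt;code&gt;soft[0]&lt;/code&gt;: the 0.15 setting puts more of the probability on the highest-scoring token, which makes sampling closer to deterministic.&lt;/p&gt;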
&lt;p&gt;It's not currently available via Mistral's API, or through any of the third-party LLM hosting vendors that I've checked, so I've not been able to run a prompt through the model myself yet.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt; I downloaded one of the first GGUF quantizations to show up on Hugging Face, &lt;a href="https://huggingface.co/gabriellarson/Mistral-Small-3.2-24B-Instruct-2506-GGUF"&gt;gabriellarson/Mistral-Small-3.2-24B-Instruct-2506-GGUF&lt;/a&gt; by Gabriel Larson. I ran it using Ollama and &lt;a href="https://github.com/taketwo/llm-ollama"&gt;llm-ollama&lt;/a&gt; like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ollama pull hf.co/gabriellarson/Mistral-Small-3.2-24B-Instruct-2506-GGUF:Q4_K_M
llm install llm-ollama
llm -m hf.co/gabriellarson/Mistral-Small-3.2-24B-Instruct-2506-GGUF:Q4_K_M \
  "Generate an SVG of a pelican riding a bicycle"
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This one is pretty good for a 15GB model!&lt;/p&gt;
&lt;p&gt;&lt;img alt="See description below" src="https://static.simonwillison.net/static/2025/mistral-3.2-pelican.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;Here's &lt;a href="https://gist.github.com/simonw/87ade7285a1d51d7073f73559baf41b2"&gt;the full transcript&lt;/a&gt; - it doesn't quite work in the actual image but I was delighted to see the model attempt to add this detail:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;!-- Basket with fish --&amp;gt;
&amp;lt;rect x="250" y="190" width="25" height="15" rx="5" fill="#FFA500"/&amp;gt;
&amp;lt;circle cx="260" cy="200" r="3" fill="#FF4500"/&amp;gt; &amp;lt;!-- Fish --&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here's what you get if you isolate just that part of the generated SVG:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Large blue rectangle image, in the middle is a very small rounded corner orange rectangle &amp;quot;basket&amp;quot; containing an even smaller red circle that represents the fish." src="https://static.simonwillison.net/static/2025/fish-basket.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;I had Mistral Small 3.2 describe the full image to me, since it's a vision-LLM:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm -m hf.co/gabriellarson/Mistral-Small-3.2-24B-Instruct-2506-GGUF:Q4_K_M \
  'describe image' \
  -a https://static.simonwillison.net/static/2025/mistral-3.2-pelican.jpg
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And it gave me the following:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The image depicts a cartoonish illustration of a duck that is being lifted off the ground by a hook. The duck appears to be in mid-air, with its wings spread slightly as if it's surprised or reacting to being picked up. The hook is attached to an arm or a mechanism and seems to be connected to a vehicle below—perhaps a truck or a platform with wheels. The background of the image is light blue, indicating an outdoor setting. Overall, the scene is whimsical and playful, possibly suggesting a humorous or unusual situation where the duck is being transported in this manner.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Update 2&lt;/strong&gt;: It's now available as &lt;a href="https://ollama.com/library/mistral-small3.2"&gt;an official Ollama model&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ollama pull mistral-small3.2
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;LM Studio has a community quantization too: &lt;a href="https://huggingface.co/lmstudio-community/Mistral-Small-3.2-24B-Instruct-2506-GGUF"&gt;lmstudio-community/Mistral-Small-3.2-24B-Instruct-2506-GGUF&lt;/a&gt;.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hugging-face"&gt;hugging-face&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mistral"&gt;mistral&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vision-llms"&gt;vision-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-tool-use"&gt;llm-tool-use&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ollama"&gt;ollama&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/lm-studio"&gt;lm-studio&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="generative-ai"/><category term="local-llms"/><category term="llms"/><category term="hugging-face"/><category term="mistral"/><category term="vision-llms"/><category term="llm-tool-use"/><category term="ollama"/><category term="pelican-riding-a-bicycle"/><category term="llm-release"/><category term="lm-studio"/></entry><entry><title>Vision Language Models (Better, Faster, Stronger)</title><link href="https://simonwillison.net/2025/May/13/vision-language-models/#atom-tag" rel="alternate"/><published>2025-05-13T15:25:09+00:00</published><updated>2025-05-13T15:25:09+00:00</updated><id>https://simonwillison.net/2025/May/13/vision-language-models/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://huggingface.co/blog/vlms-2025"&gt;Vision Language Models (Better, Faster, Stronger)&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Extremely useful review of the last year in vision and multi-modal LLMs.&lt;/p&gt;
&lt;p&gt;So much has happened! I'm particularly excited about the range of small open weight vision models that are now available. Models like gemma3-4b-it and Qwen2.5-VL-3B-Instruct produce very impressive results and run happily on mid-range consumer hardware.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/andimarafioti/status/1922230588435579090"&gt;@andimarafioti&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hugging-face"&gt;hugging-face&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vision-llms"&gt;vision-llms&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="generative-ai"/><category term="local-llms"/><category term="llms"/><category term="hugging-face"/><category term="vision-llms"/></entry><entry><title>deepseek-ai/DeepSeek-V3-0324</title><link href="https://simonwillison.net/2025/Mar/24/deepseek/#atom-tag" rel="alternate"/><published>2025-03-24T15:04:04+00:00</published><updated>2025-03-24T15:04:04+00:00</updated><id>https://simonwillison.net/2025/Mar/24/deepseek/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://huggingface.co/deepseek-ai/DeepSeek-V3-0324"&gt;deepseek-ai/DeepSeek-V3-0324&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Chinese AI lab DeepSeek just released the latest version of their enormous DeepSeek v3 model, baking the release date into the name &lt;code&gt;DeepSeek-V3-0324&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The license is MIT (that's new - previous DeepSeek v3 had a custom license), the README is empty and the release adds up to a total of 641 GB of files, mostly of the form &lt;code&gt;model-00035-of-000163.safetensors&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The model only came out a few hours ago and MLX developer Awni Hannun already &lt;a href="https://twitter.com/awnihannun/status/1904177084609827054"&gt;has it running&lt;/a&gt; at &amp;gt;20 tokens/second on a 512GB M3 Ultra Mac Studio ($9,499 of ostensibly consumer-grade hardware) via &lt;a href="https://pypi.org/project/mlx-lm/"&gt;mlx-lm&lt;/a&gt; and this &lt;a href="https://huggingface.co/mlx-community/DeepSeek-V3-0324-4bit"&gt;mlx-community/DeepSeek-V3-0324-4bit&lt;/a&gt; 4bit quantization, which reduces the on-disk size to 352 GB.&lt;/p&gt;
&lt;p&gt;I think that means if you have that machine you can run it with my &lt;a href="https://github.com/simonw/llm-mlx"&gt;llm-mlx&lt;/a&gt; plugin like this, but I've not tried myself!&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm mlx download-model mlx-community/DeepSeek-V3-0324-4bit
llm chat -m mlx-community/DeepSeek-V3-0324-4bit
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The new model is also &lt;a href="https://openrouter.ai/deepseek/deepseek-chat-v3-0324:free"&gt;listed on OpenRouter&lt;/a&gt;. You can try a chat at &lt;a href="https://openrouter.ai/chat?models=deepseek/deepseek-chat-v3-0324:free"&gt;openrouter.ai/chat?models=deepseek/deepseek-chat-v3-0324:free&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here's what the chat interface &lt;a href="https://gist.github.com/simonw/3ce2bf5836743dfaf07d994578b261ba"&gt;gave me&lt;/a&gt; for "Generate an SVG of a pelican riding a bicycle":&lt;/p&gt;
&lt;p&gt;&lt;img alt="There's a pelican, and a bicycle, but both of them look disassembled." src="https://static.simonwillison.net/static/2025/deepseek-v3-pelican.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;I have two API keys with OpenRouter - one of them worked with the model, the other gave me a &lt;code&gt;No endpoints found matching your data policy&lt;/code&gt; error - I think because I had a setting on that key disallowing models from training on my activity. The key that worked was a free key with no attached billing credentials.&lt;/p&gt;
&lt;p&gt;For my working API key the &lt;a href="https://github.com/simonw/llm-openrouter"&gt;llm-openrouter&lt;/a&gt; plugin let me run a prompt like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm install llm-openrouter
llm keys set openrouter
# Paste key here
llm -m openrouter/deepseek/deepseek-chat-v3-0324:free "best fact about a pelican"
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here's that "best fact" - the terminal output included Markdown and an emoji combo, here that's rendered.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;One of the most fascinating facts about pelicans is their &lt;strong&gt;unique throat pouch&lt;/strong&gt;, called a &lt;strong&gt;gular sac&lt;/strong&gt;, which can hold up to &lt;strong&gt;3 gallons (11 liters) of water&lt;/strong&gt;—three times more than their stomach!  &lt;/p&gt;
&lt;p&gt;Here’s why it’s amazing:&lt;br /&gt;
- &lt;strong&gt;Fishing Tool&lt;/strong&gt;: They use it like a net to scoop up fish, then drain the water before swallowing.&lt;br /&gt;
- &lt;strong&gt;Cooling Mechanism&lt;/strong&gt;: On hot days, pelicans flutter the pouch to stay cool by evaporating water.&lt;br /&gt;
- &lt;strong&gt;Built-in "Shopping Cart"&lt;/strong&gt;: Some species even use it to carry food back to their chicks.  &lt;/p&gt;
&lt;p&gt;Bonus fact: Pelicans often &lt;strong&gt;fish cooperatively&lt;/strong&gt;, herding fish into shallow water for an easy catch.  &lt;/p&gt;
&lt;p&gt;Would you like more cool pelican facts? 🐦🌊&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In putting this post together I &lt;a href="https://claude.ai/share/fc65bf9b-ae2d-4b23-bd09-ed0d54ff4b56"&gt;got Claude&lt;/a&gt; to build me &lt;a href="https://tools.simonwillison.net/huggingface-storage"&gt;this new tool&lt;/a&gt; for finding the total on-disk size of a Hugging Face repository, which is available in their API but not currently displayed on their website.&lt;/p&gt;
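&lt;p&gt;A minimal Python sketch of the same idea, assuming the Hub API returns a &lt;code&gt;siblings&lt;/code&gt; list with a per-file &lt;code&gt;size&lt;/code&gt; when &lt;code&gt;?blobs=true&lt;/code&gt; is added to the model URL. This is not the code behind the tool linked above, and the function names are hypothetical.&lt;/p&gt;

```python
import json
import urllib.request


def total_size_bytes(model_info):
    """Sum the "size" field of every entry in the "siblings" list."""
    return sum(item.get("size") or 0 for item in model_info.get("siblings", []))


def human_size(num_bytes):
    """Format a byte count using decimal units (1 GB = 1000**3 bytes)."""
    return f"{num_bytes / 1000**3:.1f} GB"


def fetch_model_info(repo_id):
    """Fetch repository metadata from the Hugging Face API."""
    # Assumption: ?blobs=true asks the API to include per-file sizes.
    url = f"https://huggingface.co/api/models/{repo_id}?blobs=true"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


# Usage (makes a network request, so it is left commented out):
# info = fetch_model_info("deepseek-ai/DeepSeek-V3-0324")
# print(human_size(total_size_bytes(info)))
```

&lt;p&gt;Keeping the summing step separate from the network call means it can be checked against a hand-written dictionary without touching the API.&lt;/p&gt;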
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: Here's a notable independent benchmark &lt;a href="https://twitter.com/paulgauthier/status/1904304052500148423"&gt;from Paul Gauthier&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;DeepSeek's new V3 scored 55% on aider's &lt;a href="https://aider.chat/docs/leaderboards/"&gt;polyglot benchmark&lt;/a&gt;, significantly improving over the prior version. It's the #2 non-thinking/reasoning model, behind only Sonnet 3.7. V3 is competitive with thinking models like R1 &amp;amp; o3-mini.&lt;/p&gt;
&lt;/blockquote&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/tools"&gt;tools&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hugging-face"&gt;hugging-face&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mlx"&gt;mlx&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/deepseek"&gt;deepseek&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openrouter"&gt;openrouter&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-in-china"&gt;ai-in-china&lt;/a&gt;&lt;/p&gt;



</summary><category term="tools"/><category term="ai"/><category term="generative-ai"/><category term="local-llms"/><category term="llms"/><category term="ai-assisted-programming"/><category term="hugging-face"/><category term="llm"/><category term="mlx"/><category term="pelican-riding-a-bicycle"/><category term="deepseek"/><category term="llm-release"/><category term="openrouter"/><category term="ai-in-china"/></entry><entry><title>deepseek-ai/DeepSeek-V3-Base</title><link href="https://simonwillison.net/2024/Dec/25/deepseek-v3/#atom-tag" rel="alternate"/><published>2024-12-25T19:00:33+00:00</published><updated>2024-12-25T19:00:33+00:00</updated><id>https://simonwillison.net/2024/Dec/25/deepseek-v3/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://huggingface.co/deepseek-ai/DeepSeek-V3-Base"&gt;deepseek-ai/DeepSeek-V3-Base&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
No model card or announcement yet, but this new model release from Chinese AI lab DeepSeek (an arm of Chinese hedge fund &lt;a href="https://en.wikipedia.org/wiki/High-Flyer_(company)"&gt;High-Flyer&lt;/a&gt;) looks very significant.&lt;/p&gt;
&lt;p&gt;It's a huge model - 685B parameters, 687.9 GB on disk (&lt;a href="https://til.simonwillison.net/git/size-of-lfs-files"&gt;TIL how to size a git-lfs repo&lt;/a&gt;). The architecture is &lt;a href="https://twitter.com/dysondunbar/status/1871955700949430299"&gt;a Mixture of Experts&lt;/a&gt; with 256 experts, using 8 per token.&lt;/p&gt;
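&lt;p&gt;To make "256 experts, using 8 per token" concrete, here is a generic top-k routing sketch: a gate scores every expert for a token, and only the eight highest-scoring experts are used, with their weights normalised to sum to one. This is a textbook-style illustration under assumed random scores, not DeepSeek's actual router; the function name is hypothetical.&lt;/p&gt;

```python
import math
import random


def route_token(gate_scores, k=8):
    """Pick the k highest-scoring experts and softmax-normalise their weights."""
    top = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)[:k]
    peak = max(gate_scores[i] for i in top)  # subtract the maximum for stability
    exps = [math.exp(gate_scores[i] - peak) for i in top]
    total = sum(exps)
    return [(index, value / total) for index, value in zip(top, exps)]


# Assumption: random gate scores stand in for a learned gating network.
rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(256)]

# A list of 8 (expert_index, weight) pairs; the other 248 experts are skipped.
chosen = route_token(scores, k=8)
```

&lt;p&gt;Because only the selected experts run for each token, the compute per token is a fraction of what the total parameter count suggests.&lt;/p&gt;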
&lt;p&gt;For comparison, Meta AI's largest released model is their &lt;a href="https://ai.meta.com/blog/meta-llama-3-1/"&gt;Llama 3.1 model&lt;/a&gt; with 405B parameters.&lt;/p&gt;
&lt;p&gt;The new model is apparently available to some people via both &lt;a href="https://chat.deepseek.com/"&gt;chat.deepseek.com&lt;/a&gt; and the DeepSeek API as part of a staged rollout.&lt;/p&gt;
&lt;p&gt;Paul Gauthier got API access and &lt;a href="https://twitter.com/paulgauthier/status/1871919612000092632"&gt;used it&lt;/a&gt; to update his new &lt;a href="https://aider.chat/docs/leaderboards/"&gt;Aider Polyglot leaderboard&lt;/a&gt; - DeepSeek v3 preview scored 48.4%, putting it in second place behind &lt;code&gt;o1-2024-12-17 (high)&lt;/code&gt; and in front of both &lt;code&gt;claude-3-5-sonnet-20241022&lt;/code&gt; and &lt;code&gt;gemini-exp-1206&lt;/code&gt;!&lt;/p&gt;
&lt;p&gt;&lt;img alt="Aider leaderboard chart showing DeepSeek Chat V3 preview in second place" src="https://static.simonwillison.net/static/2024/deepseek-v3.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;I never know if I can believe models or not (the first time I asked "what model are you?" it claimed to be "based on OpenAI's GPT-4 architecture"), but I just got this result using &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; and the &lt;a href="https://pypi.org/project/llm-deepseek/"&gt;llm-deepseek&lt;/a&gt; plugin:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm -m deepseek-chat 'what deepseek model are you?'
&lt;/code&gt;&lt;/pre&gt;
&lt;blockquote&gt;
&lt;p&gt;I'm DeepSeek-V3 created exclusively by DeepSeek. I'm an AI assistant, and I'm at your service! Feel free to ask me anything you'd like. I'll do my best to assist you.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here's my &lt;a href="https://gist.github.com/simonw/e7528dc52828fb31415f6e14e3527b93"&gt;initial experiment log&lt;/a&gt;.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/ivanfioravanti/status/1871945175616135298"&gt;@ivanfioravanti&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hugging-face"&gt;hugging-face&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/aider"&gt;aider&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/deepseek"&gt;deepseek&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/paul-gauthier"&gt;paul-gauthier&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-in-china"&gt;ai-in-china&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="hugging-face"/><category term="aider"/><category term="deepseek"/><category term="paul-gauthier"/><category term="llm-release"/><category term="ai-in-china"/></entry><entry><title>Trying out QvQ - Qwen's new visual reasoning model</title><link href="https://simonwillison.net/2024/Dec/24/qvq/#atom-tag" rel="alternate"/><published>2024-12-24T20:49:08+00:00</published><updated>2024-12-24T20:49:08+00:00</updated><id>https://simonwillison.net/2024/Dec/24/qvq/#atom-tag</id><summary type="html">
    &lt;p&gt;I thought we were done for major model releases in 2024, but apparently not: Alibaba's Qwen team just dropped the &lt;s&gt;Apache 2.0 licensed&lt;/s&gt; Qwen licensed (&lt;a href="https://simonwillison.net/2024/Dec/24/qvq/#the-license-changed"&gt;the license changed&lt;/a&gt;)  QvQ-72B-Preview, "an experimental research model focusing on enhancing visual reasoning capabilities".&lt;/p&gt;
&lt;p&gt;Their blog post is titled &lt;a href="https://qwenlm.github.io/blog/qvq-72b-preview/"&gt;QvQ: To See the World with Wisdom&lt;/a&gt; - similar flowery language to their QwQ announcement &lt;a href="https://qwenlm.github.io/blog/qwq-32b-preview/"&gt;QwQ: Reflect Deeply on the Boundaries of the Unknown&lt;/a&gt; a few weeks ago in November.&lt;/p&gt;
&lt;p&gt;It's a vision-focused follow-up to QwQ, which I &lt;a href="https://simonwillison.net/2024/Nov/27/qwq/"&gt;wrote about previously&lt;/a&gt;. QwQ is an impressive openly licensed inference-scaling model: give it a prompt and it will think out loud over many tokens while trying to derive a good answer, similar to OpenAI's o1 and o3 models.&lt;/p&gt;
&lt;p&gt;The new QvQ adds vision to the mix. You can try it out &lt;a href="https://huggingface.co/spaces/Qwen/QVQ-72B-preview"&gt;on Hugging Face Spaces&lt;/a&gt; - it accepts an image and a single prompt and then streams out a &lt;em&gt;very&lt;/em&gt; long response where it thinks through the problem you have posed it. There's no option to send a follow-up prompt.&lt;/p&gt;
&lt;p&gt;I've tried it out with a bunch of things, with mixed results - but it's really fun seeing how it works through a problem.&lt;/p&gt;
&lt;p&gt;My most successful prompt was "Count the pelicans" with this image:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/count-pelicans-easy.jpg" alt="A photograph of four glorious young California Brown Pelicans perched on a dock. The back end of another pelican is visible at the front of the photo." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Here's &lt;a href="https://gist.github.com/simonw/6c296f4b9323736dc77978447b6368fc#response"&gt;the full response&lt;/a&gt;. Some highlights:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;So I've got this picture of some pelicans, and I need to count them. Let me see... First off, I know that pelicans are those big birds with the massive beaks, right? [...]&lt;/p&gt;
&lt;p&gt;Let me try counting again, more carefully this time. Starting from the left:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The first pelican is fully visible, standing at the front.&lt;/li&gt;
&lt;li&gt;Behind it, slightly to the right, is the second pelican.&lt;/li&gt;
&lt;li&gt;Then, further back, there's a third pelican.&lt;/li&gt;
&lt;li&gt;And next to that one, there's a fourth pelican.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;So, that's four pelicans in total. But, I notice that on the far right of the image, there's part of another bird, maybe a pelican's back or tail feathers, but it's not fully in the frame, so I don't think I should count it as a whole pelican.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Something about the tone of voice it uses, "First off, I know that pelicans are those big birds with the massive beaks, right?", tickles me.&lt;/p&gt;
&lt;p&gt;I also tried this on a &lt;a href="https://gist.github.com/simonw/6c296f4b9323736dc77978447b6368fc#image-2"&gt;much harder image&lt;/a&gt; - it counted 17 which looks about right to me but honestly I have trouble with that image myself.&lt;/p&gt;
&lt;p&gt;I fed in &lt;a href="https://gist.github.com/simonw/6c296f4b9323736dc77978447b6368fc#arc-agi-puzzle-0d87d2a6"&gt;one of the ARC-AGI puzzles&lt;/a&gt; that o3 had failed at.&lt;/p&gt;
&lt;p&gt;&lt;img alt="A puzzle with colored squares, where drawing a line between the single blue squares and turning any intersected rectangles blue is clearly the solution." src="https://static.simonwillison.net/static/2024/arc-agi-task-0d87d2a6.png" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;It produced &lt;a href="https://gist.github.com/simonw/6c296f4b9323736dc77978447b6368fc#response-1"&gt;a very lengthy chain of thought&lt;/a&gt; that was almost entirely incorrect, but had some interesting "ideas" in it:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;[...] Let me try to think in terms of cellular automata rules, like Conway's Game of Life.&lt;/p&gt;
&lt;p&gt;In Game of Life, each cell's state in the next generation is determined by its current state and the states of its eight neighbors.&lt;/p&gt;
&lt;p&gt;Perhaps a similar rule applies here.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Finally, I asked it to "Estimate the height of the dinosaur" against this image (which, as it correctly noted, is actually an inflatable dragon):&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/inflatable-dragon.jpg" alt="A large inflatable Christmas decoration of a reindeer riding a dragon - both reindeer and dragon are wearing Christmas hats" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;I'm not sure this task is even possible given the lack of items in the frame of comparable known height. It &lt;a href="https://gist.github.com/simonw/6c296f4b9323736dc77978447b6368fc#response-3"&gt;had a good go at it&lt;/a&gt; though! I particularly enjoyed how it began:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;So I've got this inflatable Christmas decoration, and it's a dragon! But not just any dragon—it's a green dragon wearing a Santa hat and a red and white scarf. And get this, there's a reindeer on its back, also wearing a Santa hat and holding onto the dragon's wings. It's pretty festive and colorful, with lots of vibrant greens, reds, and yellows.&lt;/p&gt;
&lt;p&gt;The dragon has these big, friendly eyes and a smiling face, which makes it look really cheerful. Its wings are spread out, and it has white claws on its feet. The reindeer is brown with antlers and seems to be enjoying the ride on the dragon's back.&lt;/p&gt;
&lt;p&gt;Now, the question is, how tall is this inflatable dragon? I need to estimate its height. Let me see. I'm standing next to it, and I'm about 5 feet 10 inches tall. The dragon is definitely taller than me. Maybe around 8 or 9 feet high? But I should try to be more precise.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I wonder how it decided that its own height was 5 feet 10 inches!&lt;/p&gt;
&lt;h4 id="running-qvq-locally"&gt;Running QvQ locally&lt;/h4&gt;
&lt;p&gt;All of my experiments so far have used the &lt;a href="https://huggingface.co/spaces/Qwen/QVQ-72B-preview"&gt;hosted Hugging Face demo&lt;/a&gt;. I'm hoping to try it out on my own laptop soon - I think it should just fit in the 64GB of RAM on my M2, maybe with a smaller quantization.&lt;/p&gt;
&lt;p&gt;Right now the options are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://huggingface.co/Qwen/QVQ-72B-Preview"&gt;Qwen/QVQ-72B-Preview&lt;/a&gt; on Hugging Face has the GPU model weights, for use with Hugging Face Transformers and the &lt;a href="https://pypi.org/project/qwen-vl-utils/"&gt;qwen-vl-utils&lt;/a&gt; Python package.&lt;/li&gt;
&lt;li&gt;Prince Canuma &lt;a href="https://twitter.com/Prince_Canuma/status/1871687356815716672"&gt;converted the model for Apple's MLX framework&lt;/a&gt; as &lt;a href="https://huggingface.co/collections/mlx-community/qvq-72b-preview-676b345a6f93172ba980c0d5"&gt;mlx-community/qvq-72b-preview&lt;/a&gt; and got it working with his excellent &lt;a href="https://github.com/Blaizzy/mlx-vlm"&gt;mlx-vlm&lt;/a&gt; package.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As a happy user of Ollama's &lt;a href="https://ollama.com/library/qwq"&gt;qwq port&lt;/a&gt; I'm hoping they add a QvQ release at some point soon as well.&lt;/p&gt;
&lt;h4 id="with-mlx-vlm"&gt;Running QvQ on my Mac using mlx-vlm&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: I successfully ran QvQ on my 64GB M2 MacBook Pro using Prince Canuma's &lt;code&gt;mlx-vlm&lt;/code&gt; via &lt;code&gt;uv run&lt;/code&gt; with the following command:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;uv run --with &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;numpy&amp;lt;2.0&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; --with mlx-vlm python \
  -m mlx_vlm.generate \
    --model mlx-community/QVQ-72B-Preview-4bit \
    --max-tokens 10000 \
    --temp 0.0 \
    --prompt &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;describe this&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; \
    --image pelicans-on-bicycles-veo2.jpg&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;My image input here was taken from my experiment with Google's &lt;a href="https://simonwillison.net/2024/Dec/16/veo-2/"&gt;Veo 2&lt;/a&gt; video generation model:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/pelicans-on-bicycles-veo2.jpg" alt="A screenshot of the Google VideoFX video generation tool" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Here's the full output from running that command, after it first downloaded 38GB to &lt;code&gt;~/.cache/huggingface/hub/models--mlx-community--QVQ-72B-Preview-4bi&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;==========
Image: ['pelicans-on-bicycles-veo2.jpg'] 

Prompt: &amp;lt;|im_start|&amp;gt;system
You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step.&amp;lt;|im_end|&amp;gt;
&amp;lt;|im_start|&amp;gt;user
describe this&amp;lt;|vision_start|&amp;gt;&amp;lt;|image_pad|&amp;gt;&amp;lt;|vision_end|&amp;gt;&amp;lt;|im_end|&amp;gt;
&amp;lt;|im_start|&amp;gt;assistant

Alright, I've got this "Text to Video" tool to explore. It seems pretty advanced, allowing me to turn text descriptions into actual videos. The interface looks user-friendly, with a dark theme that's easy on the eyes. On the left side, there's a panel where I can input my text prompt. It already has an example filled in: "A pelican riding a bicycle along a coastal path overlooking a harbor."

That sounds interesting! I can see four different video thumbnails on the right side, each showing variations of this scene. One of them even has the pelican wearing a helmet, which adds a humorous touch. Another shows the pelican standing next to a bicycle, almost as if it's ready to hop on and start pedaling.

Below the text input, there are some style options like "photorealistic," "35mm film," "minimal," "sketchy," "handmade," and "abstract." These probably let me choose the visual style of the video. For now, I'll stick with the default settings.

At the bottom of the left panel, there are two buttons: "Start over" and "Create videos." I guess "Create videos" is the one I need to press to generate the videos based on my text prompt.

Let me think about this. I want to see what kind of videos this tool can produce based on my description. The example prompt is already filled in, so maybe I should just go ahead and click "Create videos" to see what happens.

Wait a minute, the prompt mentions a coastal path overlooking a harbor. In the thumbnails, I can see a background with water and boats, which matches the description. It's cool how the tool has interpreted the text into visual elements.

I should also pay attention to the "Visual Prompt" section, which says "View only." That probably means I can't edit it directly, but I can view the generated videos based on this prompt.

Hmm, maybe I should try changing the prompt to see different results. For example, instead of a pelican riding a bicycle, maybe I could describe a different scene. But for now, I'll stick with the given example to understand how it works.

After clicking "Create videos," I expect to see the videos being generated. There might be a loading process, depending on the complexity of the prompt and the tool's processing speed.

Once the videos are ready, I can preview them to see how well they match my description. If I'm not satisfied, I can always go back and adjust the prompt or try different style options.

Overall, this seems like a powerful tool for creating custom videos based on textual descriptions. It could be really useful for storytelling, presentations, or even just for fun, experimenting with different scenes and styles.

I think I'll spend some time exploring the different style options to see how they affect the final videos. For instance, selecting "35mm film" might give the videos a vintage, cinematic look, while "abstract" could produce more artistic and less realistic visuals.

Also, I wonder if there's a way to customize more specific aspects, like the camera angle, lighting, or movement. Maybe after generating the initial videos, there are additional editing options available.

In summary, I'm excited to use this Text to Video tool and see what kind of creative possibilities it opens up. With a bit of practice, I could probably create some impressive visual content just by describing what I want to see.

**Final Answer**

To proceed with generating videos based on the provided text prompt, I need to click on the "Create videos" button. This action will initiate the process of converting the textual description into visual content, resulting in multiple video thumbnails that can be previewed for further use or adjustments.
==========
Prompt: 0.870 tokens-per-sec
Generation: 7.694 tokens-per-sec
&lt;/code&gt;&lt;/pre&gt;

&lt;h4 id="the-license-changed"&gt;The license changed from Apache 2.0 to Qwen&lt;/h4&gt;
&lt;p&gt;When I wrote this post yesterday the LICENSE file in the Hugging Face repository was Apache 2.0. Just after midnight UTC on 25th December &lt;a href="https://huggingface.co/Qwen/QVQ-72B-Preview/commit/53b19b90d67220c896e868a809ef1b93d0c8dab8"&gt;this commit&lt;/a&gt; landed updating the  &lt;code&gt;QVQ-72B-Preview&lt;/code&gt; license file to the &lt;a href="https://huggingface.co/Qwen/QVQ-72B-Preview/blob/main/LICENSE"&gt;Qwen license&lt;/a&gt; instead.&lt;/p&gt;
&lt;p&gt;This looks to me like they were correcting a mistake, not changing their policy. The &lt;a href="https://huggingface.co/Qwen/QVQ-72B-Preview/blob/main/README.md"&gt;README.md&lt;/a&gt; for that repository has this block of YAML:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;license: other
license_name: qwen
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And &lt;a href="https://huggingface.co/Qwen/QVQ-72B-Preview/commits/main/README.md"&gt;commits to that README&lt;/a&gt; at one point linked to the &lt;a href="https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE"&gt;Qwen2.5-72B-Instruct copy of the Qwen license&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The QwQ model repository &lt;a href="https://huggingface.co/Qwen/QwQ-32B-Preview/blob/main/LICENSE"&gt;continues to list Apache 2.0&lt;/a&gt;, which matches the YAML in &lt;a href="https://huggingface.co/Qwen/QwQ-32B-Preview/blob/main/README.md"&gt;its README&lt;/a&gt; as well.&lt;/p&gt;
&lt;p&gt;So it looks to me like the intention is for QVQ and Qwen2.5-72B-Instruct to be Qwen licensed, while QwQ is Apache 2.0.&lt;/p&gt;
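&lt;p&gt;To check this kind of metadata yourself, here's a minimal sketch that pulls the license fields out of a README's YAML front matter. It assumes the front matter is a block of flat &lt;code&gt;key: value&lt;/code&gt; lines between two &lt;code&gt;---&lt;/code&gt; markers, the sample README text is made up for illustration, and &lt;code&gt;front_matter_fields&lt;/code&gt; is a hypothetical helper rather than a real YAML parser:&lt;/p&gt;

```python
def front_matter_fields(readme_text):
    """Return flat key: value pairs from a README's leading --- block."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        # Skip nested or list lines; this sketch only handles flat pairs
        if sep and not line.startswith((" ", "-")):
            fields[key.strip()] = value.strip()
    return fields


# Hypothetical README content using the two fields quoted above
sample = "---\nlicense: other\nlicense_name: qwen\n---\n# Model card\n"
fields = front_matter_fields(sample)
print(fields["license"], fields["license_name"])
```

&lt;p&gt;For anything beyond flat pairs a proper YAML library would be the safer choice.&lt;/p&gt;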
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hugging-face"&gt;hugging-face&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vision-llms"&gt;vision-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/qwen"&gt;qwen&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mlx"&gt;mlx&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-reasoning"&gt;llm-reasoning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prince-canuma"&gt;prince-canuma&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-in-china"&gt;ai-in-china&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="python"/><category term="ai"/><category term="generative-ai"/><category term="local-llms"/><category term="llms"/><category term="hugging-face"/><category term="vision-llms"/><category term="uv"/><category term="qwen"/><category term="mlx"/><category term="llm-reasoning"/><category term="llm-release"/><category term="prince-canuma"/><category term="ai-in-china"/></entry><entry><title>Finally, a replacement for BERT: Introducing ModernBERT</title><link href="https://simonwillison.net/2024/Dec/24/modernbert/#atom-tag" rel="alternate"/><published>2024-12-24T06:21:29+00:00</published><updated>2024-12-24T06:21:29+00:00</updated><id>https://simonwillison.net/2024/Dec/24/modernbert/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.answer.ai/posts/2024-12-19-modernbert.html"&gt;Finally, a replacement for BERT: Introducing ModernBERT&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/BERT_(language_model)"&gt;BERT&lt;/a&gt; was an early language model released by Google in October 2018. Unlike modern LLMs it wasn't designed for generating text. BERT was trained for masked token prediction and was generally applied to problems like Named Entity Recognition or Sentiment Analysis. BERT also wasn't very useful on its own - most applications required you to fine-tune a model on top of it.&lt;/p&gt;
&lt;p&gt;In exploring BERT I decided to try out &lt;a href="https://huggingface.co/dslim/distilbert-NER"&gt;dslim/distilbert-NER&lt;/a&gt;, a popular Named Entity Recognition model fine-tuned on top of DistilBERT (a smaller distilled version of the original BERT model). &lt;a href="https://til.simonwillison.net/llms/bert-ner"&gt;Here are my notes&lt;/a&gt; on running that using &lt;code&gt;uv run&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Jeremy Howard's &lt;a href="https://www.answer.ai/"&gt;Answer.AI&lt;/a&gt; research group, &lt;a href="https://www.lighton.ai/"&gt;LightOn&lt;/a&gt; and friends supported the development of ModernBERT, a brand new BERT-style model that applies many enhancements from the past six years of advances in this space.&lt;/p&gt;
&lt;p&gt;While BERT was trained on 3.3 billion tokens, producing 110 million and 340 million parameter models, ModernBERT trained on 2 trillion tokens, resulting in 140 million and 395 million parameter models. The parameter count hasn't increased much because it's designed to run on lower-end hardware. It has an 8192 token context length, a significant improvement on BERT's 512.&lt;/p&gt;
&lt;p&gt;I was able to run one of the demos from the announcement post using &lt;code&gt;uv run&lt;/code&gt; like this (I'm not sure why I had to use &lt;code&gt;numpy&amp;lt;2.0&lt;/code&gt; but without that I got an error about &lt;code&gt;cannot import name 'ComplexWarning' from 'numpy.core.numeric'&lt;/code&gt;):&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;uv run --with &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;numpy&amp;lt;2.0&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; --with torch --with &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;git+https://github.com/huggingface/transformers.git&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; python&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then this Python:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;torch&lt;/span&gt;
&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;transformers&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;pipeline&lt;/span&gt;
&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;pprint&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;pprint&lt;/span&gt;
&lt;span class="pl-s1"&gt;pipe&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-en"&gt;pipeline&lt;/span&gt;(
    &lt;span class="pl-s"&gt;"fill-mask"&lt;/span&gt;,
    &lt;span class="pl-s1"&gt;model&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"answerdotai/ModernBERT-base"&lt;/span&gt;,
    &lt;span class="pl-s1"&gt;torch_dtype&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s1"&gt;torch&lt;/span&gt;.&lt;span class="pl-c1"&gt;bfloat16&lt;/span&gt;,
)
&lt;span class="pl-s1"&gt;input_text&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s"&gt;"He walked to the [MASK]."&lt;/span&gt;
&lt;span class="pl-s1"&gt;results&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-en"&gt;pipe&lt;/span&gt;(&lt;span class="pl-s1"&gt;input_text&lt;/span&gt;)
&lt;span class="pl-en"&gt;pprint&lt;/span&gt;(&lt;span class="pl-s1"&gt;results&lt;/span&gt;)&lt;/pre&gt;
&lt;p&gt;Which downloaded 573MB to &lt;code&gt;~/.cache/huggingface/hub/models--answerdotai--ModernBERT-base&lt;/code&gt; and output:&lt;/p&gt;
&lt;pre&gt;[{&lt;span class="pl-s"&gt;'score'&lt;/span&gt;: &lt;span class="pl-c1"&gt;0.11669921875&lt;/span&gt;,
  &lt;span class="pl-s"&gt;'sequence'&lt;/span&gt;: &lt;span class="pl-s"&gt;'He walked to the door.'&lt;/span&gt;,
  &lt;span class="pl-s"&gt;'token'&lt;/span&gt;: &lt;span class="pl-c1"&gt;3369&lt;/span&gt;,
  &lt;span class="pl-s"&gt;'token_str'&lt;/span&gt;: &lt;span class="pl-s"&gt;' door'&lt;/span&gt;},
 {&lt;span class="pl-s"&gt;'score'&lt;/span&gt;: &lt;span class="pl-c1"&gt;0.037841796875&lt;/span&gt;,
  &lt;span class="pl-s"&gt;'sequence'&lt;/span&gt;: &lt;span class="pl-s"&gt;'He walked to the office.'&lt;/span&gt;,
  &lt;span class="pl-s"&gt;'token'&lt;/span&gt;: &lt;span class="pl-c1"&gt;3906&lt;/span&gt;,
  &lt;span class="pl-s"&gt;'token_str'&lt;/span&gt;: &lt;span class="pl-s"&gt;' office'&lt;/span&gt;},
 {&lt;span class="pl-s"&gt;'score'&lt;/span&gt;: &lt;span class="pl-c1"&gt;0.0277099609375&lt;/span&gt;,
  &lt;span class="pl-s"&gt;'sequence'&lt;/span&gt;: &lt;span class="pl-s"&gt;'He walked to the library.'&lt;/span&gt;,
  &lt;span class="pl-s"&gt;'token'&lt;/span&gt;: &lt;span class="pl-c1"&gt;6335&lt;/span&gt;,
  &lt;span class="pl-s"&gt;'token_str'&lt;/span&gt;: &lt;span class="pl-s"&gt;' library'&lt;/span&gt;},
 {&lt;span class="pl-s"&gt;'score'&lt;/span&gt;: &lt;span class="pl-c1"&gt;0.0216064453125&lt;/span&gt;,
  &lt;span class="pl-s"&gt;'sequence'&lt;/span&gt;: &lt;span class="pl-s"&gt;'He walked to the gate.'&lt;/span&gt;,
  &lt;span class="pl-s"&gt;'token'&lt;/span&gt;: &lt;span class="pl-c1"&gt;7394&lt;/span&gt;,
  &lt;span class="pl-s"&gt;'token_str'&lt;/span&gt;: &lt;span class="pl-s"&gt;' gate'&lt;/span&gt;},
 {&lt;span class="pl-s"&gt;'score'&lt;/span&gt;: &lt;span class="pl-c1"&gt;0.020263671875&lt;/span&gt;,
  &lt;span class="pl-s"&gt;'sequence'&lt;/span&gt;: &lt;span class="pl-s"&gt;'He walked to the window.'&lt;/span&gt;,
  &lt;span class="pl-s"&gt;'token'&lt;/span&gt;: &lt;span class="pl-c1"&gt;3497&lt;/span&gt;,
  &lt;span class="pl-s"&gt;'token_str'&lt;/span&gt;: &lt;span class="pl-s"&gt;' window'&lt;/span&gt;}]&lt;/pre&gt;
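&lt;p&gt;Each score is the probability the model assigns to that token filling the mask. As a rough sketch of how to work with the result (the values are copied from the output above, the variable names are just for illustration), this adds up how much of the probability those five candidates cover between them:&lt;/p&gt;

```python
# Scores and tokens copied from the fill-mask output above
results = [
    {"token_str": " door", "score": 0.11669921875},
    {"token_str": " office", "score": 0.037841796875},
    {"token_str": " library", "score": 0.0277099609375},
    {"token_str": " gate", "score": 0.0216064453125},
    {"token_str": " window", "score": 0.020263671875},
]

# Share of the model's probability mass covered by these five candidates
top_five_total = sum(r["score"] for r in results)
best = max(results, key=lambda r: r["score"])

print(f"top five cover {top_five_total:.1%} of the probability")
print(f"most likely token: {best['token_str'].strip()!r}")
```

&lt;p&gt;That comes to a little over 22%, so most of the probability is spread across many other possible completions.&lt;/p&gt;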

&lt;p&gt;I'm looking forward to trying out models that use ModernBERT as their base. The model release is accompanied by a paper (&lt;a href="https://arxiv.org/abs/2412.13663"&gt;Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference&lt;/a&gt;) and &lt;a href="https://huggingface.co/docs/transformers/main/en/model_doc/modernbert"&gt;new documentation&lt;/a&gt; for using it with the Transformers library.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://bsky.app/profile/benjaminwarner.dev/post/3ldur45oz322b"&gt;@benjaminwarner.dev&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/bert"&gt;bert&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nlp"&gt;nlp&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/transformers"&gt;transformers&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jeremy-howard"&gt;jeremy-howard&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hugging-face"&gt;hugging-face&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;&lt;/p&gt;



</summary><category term="bert"/><category term="nlp"/><category term="python"/><category term="transformers"/><category term="ai"/><category term="jeremy-howard"/><category term="hugging-face"/><category term="uv"/></entry><entry><title>Structured Generation w/ SmolLM2 running in browser &amp; WebGPU</title><link href="https://simonwillison.net/2024/Nov/29/structured-generation-smollm2-webgpu/#atom-tag" rel="alternate"/><published>2024-11-29T21:09:11+00:00</published><updated>2024-11-29T21:09:11+00:00</updated><id>https://simonwillison.net/2024/Nov/29/structured-generation-smollm2-webgpu/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://huggingface.co/spaces/reach-vb/github-issue-generator-webgpu"&gt;Structured Generation w/ SmolLM2 running in browser &amp;amp; WebGPU&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Extraordinary demo by Vaibhav Srivastav (VB). Here's Hugging Face's &lt;a href="https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct"&gt;SmolLM2-1.7B-Instruct&lt;/a&gt; running directly in a web browser (using WebGPU, so requires Chrome &lt;a href="https://github.com/gpuweb/gpuweb/wiki/Implementation-Status"&gt;for the moment&lt;/a&gt;) demonstrating structured text extraction, converting a text description of an image into a structured GitHub issue defined using JSON schema.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Interface showing text input, a JSON schema, extracted JSON and a UI that demonstrates the structured resulting GitHub Issue" src="https://static.simonwillison.net/static/2024/github-issue-extract.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;The page loads 924.8MB of model data (according to &lt;a href="https://gist.github.com/simonw/3ccba6256e95b59ea6a17509855830b4"&gt;this script to sum up files in window.caches&lt;/a&gt;) and performs everything in-browser. I did not know a model this small could produce such useful results.&lt;/p&gt;
&lt;p&gt;Here's &lt;a href="https://github.com/Vaibhavs10/github-issue-generator-webgpu/blob/main/src/index.js"&gt;the source code&lt;/a&gt; for the demo. It's around 200 lines of code, 50 of which are the JSON schema describing the data to be extracted.&lt;/p&gt;
&lt;p&gt;The real secret sauce here is &lt;a href="https://github.com/mlc-ai/web-llm"&gt;web-llm&lt;/a&gt; by MLC. This library has made loading and executing prompts through LLMs in the browser shockingly easy, and recently incorporated support for MLC's &lt;a href="https://xgrammar.mlc.ai/"&gt;XGrammar&lt;/a&gt; library (also available in Python) which implements both JSON schema and EBNF-based structured output guidance.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://bsky.app/profile/reach-vb.hf.co/post/3lc24bmj6fk2j"&gt;@reach-vb.hf.co&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webassembly"&gt;webassembly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mlc"&gt;mlc&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hugging-face"&gt;hugging-face&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webgpu"&gt;webgpu&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/smollm"&gt;smollm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/structured-extraction"&gt;structured-extraction&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="webassembly"/><category term="generative-ai"/><category term="llms"/><category term="mlc"/><category term="hugging-face"/><category term="webgpu"/><category term="smollm"/><category term="structured-extraction"/></entry><entry><title>NuExtract 1.5</title><link href="https://simonwillison.net/2024/Nov/16/nuextract-15/#atom-tag" rel="alternate"/><published>2024-11-16T16:33:17+00:00</published><updated>2024-11-16T16:33:17+00:00</updated><id>https://simonwillison.net/2024/Nov/16/nuextract-15/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://numind.ai/blog/nuextract-1-5---multilingual-infinite-context-still-small-and-better-than-gpt-4o"&gt;NuExtract 1.5&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Structured extraction - where an LLM helps turn unstructured text (or image content) into structured data - remains one of the most directly useful applications of LLMs.&lt;/p&gt;
&lt;p&gt;NuExtract is a family of small models directly trained for this purpose (though text only at the moment) and released under the MIT license.&lt;/p&gt;
&lt;p&gt;It comes in a variety of shapes and sizes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/numind/NuExtract-1.5"&gt;NuExtract-v1.5&lt;/a&gt; is a 3.8B parameter model fine-tuned on &lt;a href="https://huggingface.co/microsoft/Phi-3.5-mini-instruct"&gt;Phi-3.5-mini instruct&lt;/a&gt;. You can try this one out in &lt;a href="https://huggingface.co/spaces/numind/NuExtract-1.5"&gt;this playground&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/numind/NuExtract-1.5-tiny"&gt;NuExtract-tiny-v1.5&lt;/a&gt; is 494M parameters, fine-tuned on &lt;a href="https://huggingface.co/Qwen/Qwen2.5-0.5B"&gt;Qwen2.5-0.5B&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/numind/NuExtract-1.5-smol"&gt;NuExtract-1.5-smol&lt;/a&gt; is 1.7B parameters, fine-tuned on &lt;a href="https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B"&gt;SmolLM2-1.7B&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All three models were fine-tuned on NuMind's "private high-quality dataset". It's interesting to see a model family that uses one fine-tuning set against three completely different base models.&lt;/p&gt;
&lt;p&gt;Useful tip &lt;a href="https://twitter.com/sroecker/status/1857846899123827168"&gt;from Steffen Röcker&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Make sure to use it with low temperature, I've uploaded &lt;a href="https://ollama.com/sroecker/nuextract-tiny-v1.5"&gt;NuExtract-tiny-v1.5 to Ollama&lt;/a&gt; and set it to 0. With the Ollama default of 0.7 it started repeating the input text. It works really well despite being so smol.&lt;/p&gt;
&lt;/blockquote&gt;
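&lt;p&gt;If you call the model through Ollama's HTTP API instead, a sketch of the equivalent setting might look like this. It assumes the &lt;code&gt;/api/generate&lt;/code&gt; endpoint with a per-request &lt;code&gt;options.temperature&lt;/code&gt; value, &lt;code&gt;build_generate_payload&lt;/code&gt; is a hypothetical helper, and the code only builds the JSON body without sending it:&lt;/p&gt;

```python
import json


def build_generate_payload(prompt, model="sroecker/nuextract-tiny-v1.5"):
    """Build a request body for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        # Temperature 0, per the tip above, to stop the model echoing its input
        "options": {"temperature": 0},
    }


# Placeholder prompt: NuExtract expects its own template format, not shown here
payload = build_generate_payload("placeholder extraction prompt")
print(json.dumps(payload, indent=2))
```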


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hugging-face"&gt;hugging-face&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fine-tuning"&gt;fine-tuning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/phi"&gt;phi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/qwen"&gt;qwen&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/smollm"&gt;smollm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/structured-extraction"&gt;structured-extraction&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-in-china"&gt;ai-in-china&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="generative-ai"/><category term="local-llms"/><category term="llms"/><category term="hugging-face"/><category term="fine-tuning"/><category term="phi"/><category term="qwen"/><category term="smollm"/><category term="structured-extraction"/><category term="llm-release"/><category term="ai-in-china"/></entry><entry><title>Docling</title><link href="https://simonwillison.net/2024/Nov/3/docling/#atom-tag" rel="alternate"/><published>2024-11-03T04:57:56+00:00</published><updated>2024-11-03T04:57:56+00:00</updated><id>https://simonwillison.net/2024/Nov/3/docling/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://ds4sd.github.io/docling/"&gt;Docling&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;MIT licensed document extraction Python library from the Deep Search team at IBM, who released &lt;a href="https://ds4sd.github.io/docling/v2/#changes-in-docling-v2"&gt;Docling v2&lt;/a&gt; on October 16th.&lt;/p&gt;
&lt;p&gt;Here's the &lt;a href="https://arxiv.org/abs/2408.09869"&gt;Docling Technical Report&lt;/a&gt; paper from August, which provides details of two custom models: a layout analysis model for figuring out the structure of the document (sections, figures, text, tables etc) and a TableFormer model specifically for extracting structured data from tables.&lt;/p&gt;
&lt;p&gt;Those models are &lt;a href="https://huggingface.co/ds4sd/docling-models"&gt;available on Hugging Face&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here's how to try out the Docling CLI using &lt;code&gt;uvx&lt;/code&gt; (avoiding the need to install it first - though since it downloads models it will take a while to run the first time):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uvx docling mydoc.pdf --to json --to md
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will output a &lt;code&gt;mydoc.json&lt;/code&gt; file with complex layout information and a &lt;code&gt;mydoc.md&lt;/code&gt; Markdown file which includes Markdown tables where appropriate.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://ds4sd.github.io/docling/usage/"&gt;Python API&lt;/a&gt; is a lot more comprehensive. It can even extract tables &lt;a href="https://ds4sd.github.io/docling/examples/export_tables/"&gt;as Pandas DataFrames&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;docling&lt;/span&gt;.&lt;span class="pl-s1"&gt;document_converter&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-v"&gt;DocumentConverter&lt;/span&gt;
&lt;span class="pl-s1"&gt;converter&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-v"&gt;DocumentConverter&lt;/span&gt;()
&lt;span class="pl-s1"&gt;result&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;converter&lt;/span&gt;.&lt;span class="pl-en"&gt;convert&lt;/span&gt;(&lt;span class="pl-s"&gt;"document.pdf"&lt;/span&gt;)
&lt;span class="pl-k"&gt;for&lt;/span&gt; &lt;span class="pl-s1"&gt;table&lt;/span&gt; &lt;span class="pl-c1"&gt;in&lt;/span&gt; &lt;span class="pl-s1"&gt;result&lt;/span&gt;.&lt;span class="pl-s1"&gt;document&lt;/span&gt;.&lt;span class="pl-s1"&gt;tables&lt;/span&gt;:
    &lt;span class="pl-s1"&gt;df&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;table&lt;/span&gt;.&lt;span class="pl-en"&gt;export_to_dataframe&lt;/span&gt;()
    &lt;span class="pl-en"&gt;print&lt;/span&gt;(&lt;span class="pl-s1"&gt;df&lt;/span&gt;)&lt;/pre&gt;

&lt;p&gt;I ran that inside &lt;code&gt;uv run --with docling python&lt;/code&gt;. It took a little while to run, but it demonstrated that the library works.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ibm"&gt;ibm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ocr"&gt;ocr&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pdf"&gt;pdf&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hugging-face"&gt;hugging-face&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;&lt;/p&gt;



</summary><category term="cli"/><category term="ibm"/><category term="ocr"/><category term="pdf"/><category term="python"/><category term="ai"/><category term="hugging-face"/><category term="uv"/></entry><entry><title>SmolLM2</title><link href="https://simonwillison.net/2024/Nov/2/smollm2/#atom-tag" rel="alternate"/><published>2024-11-02T05:27:25+00:00</published><updated>2024-11-02T05:27:25+00:00</updated><id>https://simonwillison.net/2024/Nov/2/smollm2/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct"&gt;SmolLM2&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;New from &lt;a href="https://loubnabnl.github.io/"&gt;Loubna Ben Allal&lt;/a&gt; and her research team at Hugging Face:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;SmolLM2 is a family of compact language models available in three size: 135M, 360M, and 1.7B parameters. They are capable of solving a wide range of tasks while being lightweight enough to run on-device. [...]&lt;/p&gt;
&lt;p&gt;It was trained on 11 trillion tokens using a diverse dataset combination: FineWeb-Edu, DCLM, The Stack, along with new mathematics and coding datasets that we curated and will release soon.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The model weights are released under an Apache 2 license. I've been trying these out using my &lt;a href="https://github.com/simonw/llm-gguf"&gt;llm-gguf&lt;/a&gt; plugin for &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; and my first impressions are really positive.&lt;/p&gt;
&lt;p&gt;Here's a recipe to run a 1.7GB Q8 quantized model &lt;a href="https://huggingface.co/lmstudio-community/SmolLM2-1.7B-Instruct-GGUF"&gt;from lmstudio-community&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm install llm-gguf
llm gguf download-model https://huggingface.co/lmstudio-community/SmolLM2-1.7B-Instruct-GGUF/resolve/main/SmolLM2-1.7B-Instruct-Q8_0.gguf -a smol17
llm chat -m smol17
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img alt="Animated terminal demo. My prompt is tell me about pelicans. The model responds: Sure, I'd be happy to tell you about pelicans! Pelicans are a group of aquatic birds in the order Pelecaniformes, which also includes the cormorants, darters, and frigatebirds. They are found on all continents except Antarctica, and are known for their distinctive pouch-like bill. There are several species of pelicans. The most common species is the Brown Pelican, which is found in the Americas. It's the only species that plunges into water from a significant height to catch fish and other prey, a behavior known as &amp;quot;fish-grabbing.&amp;quot;  Another common species is the American White Pelican, which can be found in both the Americas and Eurasia. It has a white plumage and a large, bright pink bill, and feeds on fish in lakes, rivers, and coastal wetlands.  Pelicans are generally medium-sized birds, but the Brown Pelican is the largest, with an average height of around 26-30 inches. Their bills can be as long as 11 inches!  Below the terminal you can see Activity Monitor showing 378% CPU usage for the Python process" src="https://static.simonwillison.net/static/2024/smol-demo.gif" /&gt;&lt;/p&gt;
&lt;p&gt;Or at the other end of the scale, here's how to run the 138MB &lt;a href="https://huggingface.co/lmstudio-community/SmolLM2-135M-Instruct-GGUF"&gt;Q8 quantized 135M model&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm gguf download-model https://huggingface.co/lmstudio-community/SmolLM2-135M-Instruct-GGUF/resolve/main/SmolLM2-135M-Instruct-Q8_0.gguf -a smol135m
llm chat -m smol135m
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The blog entry to accompany SmolLM2 should be coming soon, but in the meantime here's the entry from July introducing the first version: &lt;a href="https://huggingface.co/blog/smollm"&gt;SmolLM - blazingly fast and remarkably powerful&lt;/a&gt;.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/LoubnaBenAllal1/status/1852055582494294414"&gt;@LoubnaBenAllal1&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hugging-face"&gt;hugging-face&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/smollm"&gt;smollm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;&lt;/p&gt;



</summary><category term="open-source"/><category term="ai"/><category term="generative-ai"/><category term="local-llms"/><category term="llms"/><category term="hugging-face"/><category term="llm"/><category term="smollm"/><category term="llm-release"/></entry><entry><title>Hugging Face Hub: Configure progress bars</title><link href="https://simonwillison.net/2024/Oct/28/hugging-face-hub-progress-bars/#atom-tag" rel="alternate"/><published>2024-10-28T06:22:43+00:00</published><updated>2024-10-28T06:22:43+00:00</updated><id>https://simonwillison.net/2024/Oct/28/hugging-face-hub-progress-bars/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://huggingface.co/docs/huggingface_hub/en/package_reference/utilities#configure-progress-bars"&gt;Hugging Face Hub: Configure progress bars&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This has been driving me a little bit spare. Every time I try and build anything against a library that uses &lt;code&gt;huggingface_hub&lt;/code&gt; somewhere under the hood to access models (most recently trying out &lt;a href="https://github.com/Blaizzy/mlx-vlm"&gt;MLX-VLM&lt;/a&gt;) I inevitably get output like this every single time I execute the model:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Fetching 11 files: 100%|██████████████████| 11/11 [00:00&amp;lt;00:00, 15871.12it/s]&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;I &lt;em&gt;finally&lt;/em&gt; tracked down a solution, after many &lt;code&gt;breakpoint()&lt;/code&gt; interceptions. You can fix it like this:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;huggingface_hub&lt;/span&gt;.&lt;span class="pl-s1"&gt;utils&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;disable_progress_bars&lt;/span&gt;
&lt;span class="pl-en"&gt;disable_progress_bars&lt;/span&gt;()&lt;/pre&gt;
&lt;p&gt;Or by setting the &lt;code&gt;HF_HUB_DISABLE_PROGRESS_BARS&lt;/code&gt; environment variable, which in Python code looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;os&lt;/span&gt;
&lt;span class="pl-s1"&gt;os&lt;/span&gt;.&lt;span class="pl-s1"&gt;environ&lt;/span&gt;[&lt;span class="pl-s"&gt;"HF_HUB_DISABLE_PROGRESS_BARS"&lt;/span&gt;] &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s"&gt;'1'&lt;/span&gt;&lt;/pre&gt;
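&lt;p&gt;If the noisy library runs as a separate command rather than inside your own Python process, the same environment variable can be handed to the child process instead. Here's a minimal sketch using only the standard library. The helper names &lt;code&gt;env_without_hub_progress&lt;/code&gt; and &lt;code&gt;run_quietly&lt;/code&gt; are made up for illustration, and the child command is a stand-in that just echoes the variable back:&lt;/p&gt;

```python
import os
import subprocess
import sys


def env_without_hub_progress(base=None):
    """Return a copy of the environment with Hub progress bars switched off."""
    env = dict(os.environ if base is None else base)
    env["HF_HUB_DISABLE_PROGRESS_BARS"] = "1"
    return env


def run_quietly(command):
    """Run a command in a child process that inherits the quiet environment."""
    return subprocess.run(
        command,
        env=env_without_hub_progress(),
        capture_output=True,
        text=True,
        check=True,
    )


# Stand-in child process (hypothetical): it only prints the variable it received.
ECHO_COMMAND = [
    sys.executable,
    "-c",
    "import os; print(os.environ['HF_HUB_DISABLE_PROGRESS_BARS'])",
]

print(run_quietly(ECHO_COMMAND).stdout.strip())
```

&lt;p&gt;The helper copies the environment rather than mutating &lt;code&gt;os.environ&lt;/code&gt;, so the parent process keeps whatever settings it already had.&lt;/p&gt;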


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hugging-face"&gt;hugging-face&lt;/a&gt;&lt;/p&gt;



</summary><category term="python"/><category term="llms"/><category term="hugging-face"/></entry><entry><title>Quoting Julien Chaumond</title><link href="https://simonwillison.net/2024/Apr/8/julien-chaumond/#atom-tag" rel="alternate"/><published>2024-04-08T18:35:30+00:00</published><updated>2024-04-08T18:35:30+00:00</updated><id>https://simonwillison.net/2024/Apr/8/julien-chaumond/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://twitter.com/julien_c/status/1777328846829679072"&gt;&lt;p&gt;in July 2023, we [Hugging Face] wanted to experiment with a custom license for this specific project [text-generation-inference] in order to protect our commercial solutions from companies with bigger means than we do, who would just host an exact copy of our cloud services.&lt;/p&gt;
&lt;p&gt;The experiment however wasn't successful.&lt;/p&gt;
&lt;p&gt;It did not lead to licensing-specific incremental business opportunities by itself, while it did hamper or at least complicate the community contributions, given the legal uncertainty that arises as soon as you deviate from the standard licenses.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://twitter.com/julien_c/status/1777328846829679072"&gt;Julien Chaumond&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hugging-face"&gt;hugging-face&lt;/a&gt;&lt;/p&gt;



</summary><category term="open-source"/><category term="hugging-face"/></entry><entry><title>Weird A.I. Yankovic, a cursed deep dive into the world of voice cloning</title><link href="https://simonwillison.net/2023/Oct/2/the-world-of-voice-cloning/#atom-tag" rel="alternate"/><published>2023-10-02T18:50:38+00:00</published><updated>2023-10-02T18:50:38+00:00</updated><id>https://simonwillison.net/2023/Oct/2/the-world-of-voice-cloning/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://waxy.org/2023/10/weird-ai-yankovic-voice-cloning/"&gt;Weird A.I. Yankovic, a cursed deep dive into the world of voice cloning&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Andy Baio reports back on his investigations into the world of AI voice cloning.&lt;/p&gt;

&lt;p&gt;This is no longer a niche interest. There’s a Discord with 500,000 members sharing tips and tricks on cloning celebrity voices in order to make their own cover songs, often built with Google Colab using models distributed through Hugging Face.&lt;/p&gt;

&lt;p&gt;Andy then makes his own, playing with the concept “What if every Weird Al song was the original, and every other artist was covering his songs instead?”&lt;/p&gt;

&lt;p&gt;I particularly enjoyed Madonna’s cover of “Like A Surgeon”, Lady Gaga’s “Perform This Way” and Lorde’s “Foil”.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/andy-baio"&gt;andy-baio&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/audio"&gt;audio&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hugging-face"&gt;hugging-face&lt;/a&gt;&lt;/p&gt;



</summary><category term="andy-baio"/><category term="audio"/><category term="ai"/><category term="generative-ai"/><category term="hugging-face"/></entry><entry><title>All models on Hugging Face, sorted by downloads</title><link href="https://simonwillison.net/2023/Sep/10/all-models-on-hugging-face/#atom-tag" rel="alternate"/><published>2023-09-10T17:24:42+00:00</published><updated>2023-09-10T17:24:42+00:00</updated><id>https://simonwillison.net/2023/Sep/10/all-models-on-hugging-face/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://huggingface.co/models?sort=downloads"&gt;All models on Hugging Face, sorted by downloads&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I realized this morning that “sort by downloads” against the list of all of the models on Hugging Face can work as a reasonably good proxy for “which of these models are easiest to get running on your own computer”.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://fedi.simonwillison.net/@simon/111041868759156076"&gt;@simon&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/machine-learning"&gt;machine-learning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hugging-face"&gt;hugging-face&lt;/a&gt;&lt;/p&gt;



</summary><category term="machine-learning"/><category term="ai"/><category term="hugging-face"/></entry><entry><title>Hugging Face Transformers Agent</title><link href="https://simonwillison.net/2023/May/10/transformers-agent/#atom-tag" rel="alternate"/><published>2023-05-10T19:50:06+00:00</published><updated>2023-05-10T19:50:06+00:00</updated><id>https://simonwillison.net/2023/May/10/transformers-agent/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://huggingface.co/docs/transformers/en/transformers_agents"&gt;Hugging Face Transformers Agent&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Fascinating new Python API in Hugging Face Transformers version 4.29.0: you can now provide a text description of a task—e.g. “Draw me a picture of the sea then transform the picture to add an island”—and an LLM will turn that into calls to Hugging Face models which will then be installed and used to carry out the instructions. The Colab notebook is worth playing with—you paste in an OpenAI API key and a Hugging Face token and it can then run through all sorts of examples, which tap into tools that include image generation, image modification, summarization, audio generation and more.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hugging-face"&gt;hugging-face&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="hugging-face"/></entry><entry><title>Jsonformer: A Bulletproof Way to Generate Structured JSON from Language Models</title><link href="https://simonwillison.net/2023/May/8/jsonformer/#atom-tag" rel="alternate"/><published>2023-05-08T23:02:01+00:00</published><updated>2023-05-08T23:02:01+00:00</updated><id>https://simonwillison.net/2023/May/8/jsonformer/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/1rgs/jsonformer"&gt;Jsonformer: A Bulletproof Way to Generate Structured JSON from Language Models&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This is such an interesting trick. A common challenge with LLMs is getting them to output a specific JSON shape of data reliably, without occasionally messing up and generating invalid JSON or outputting other text.&lt;/p&gt;

&lt;p&gt;Jsonformer addresses this in a truly ingenious way: it implements code that interacts with the logic that decides which token to output next, influenced by a JSON schema. If that code knows that the next token after a double quote should be a comma it can force the issue for that specific token.&lt;/p&gt;

&lt;p&gt;This means you can get reliable, robust JSON output even for much smaller, less capable language models.&lt;/p&gt;

&lt;p&gt;It’s built against Hugging Face transformers, but there’s no reason the same idea couldn’t be applied in other contexts as well.&lt;/p&gt;
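&lt;p&gt;To make the idea concrete, here's a toy sketch of schema-guided decoding. This is not Jsonformer's actual code: the "model" is a hypothetical stand-in that returns random scores, and the vocabulary, schema format and function names are all invented for illustration. The code writes the JSON structure itself and only lets the model choose among tokens that are valid for the current value:&lt;/p&gt;

```python
import json
import random

# Tiny invented vocabulary. Some tokens would break the JSON if chosen freely.
DIGITS = list("0123456789")
WORDS = ["hello", "cat"]
VOCAB = DIGITS + WORDS + ["}", ",", " oops "]


def toy_model_scores(prefix, rng):
    """Stand-in for a language model: a random score for every token."""
    return {token: rng.random() for token in VOCAB}


def pick_allowed(scores, allowed):
    """Constrained step: take the best-scoring token among the allowed ones."""
    return max(allowed, key=lambda token: scores[token])


def generate_json(schema, seed=0):
    """Emit braces, keys and commas directly; ask the model only for values."""
    rng = random.Random(seed)
    out = "{"
    for index, (key, kind) in enumerate(schema.items()):
        if index:
            out += ", "
        out += json.dumps(key) + ": "
        if kind == "number":
            # Only digit tokens are allowed while a number is being written.
            chosen = [pick_allowed(toy_model_scores(out, rng), DIGITS) for _ in range(3)]
            out += str(int("".join(chosen)))  # int() drops leading zeros
        else:
            # Only word tokens are allowed; the quotes are added by the code.
            out += json.dumps(pick_allowed(toy_model_scores(out, rng), WORDS))
    return out + "}"


print(generate_json({"name": "string", "age": "number"}))
```

&lt;p&gt;Because the model never gets the chance to emit a stray brace or comma, the result parses as JSON no matter how bad the scores are, which is why the trick can help with smaller models too.&lt;/p&gt;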


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/json"&gt;json&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hugging-face"&gt;hugging-face&lt;/a&gt;&lt;/p&gt;



</summary><category term="json"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="hugging-face"/></entry><entry><title>Transformers.js</title><link href="https://simonwillison.net/2023/Mar/16/transformers-js/#atom-tag" rel="alternate"/><published>2023-03-16T23:41:55+00:00</published><updated>2023-03-16T23:41:55+00:00</updated><id>https://simonwillison.net/2023/Mar/16/transformers-js/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://xenova.github.io/transformers.js/"&gt;Transformers.js&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Hugging Face Transformers is a library of Transformer machine learning models plus a Python package for loading and running them. Transformers.js provides a JavaScript alternative interface which runs in your browser, thanks to a set of precompiled WebAssembly binaries for a selection of models. This interactive demo is incredible: in particular, try running the Image classification with google/vit-base-patch16-224 (91MB) model against any photo to get back labels representing that photo. Dropping one of these models onto a page is as easy as linking to a hosted CDN script and running a few lines of JavaScript.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/javascript"&gt;javascript&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/machine-learning"&gt;machine-learning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/transformers"&gt;transformers&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webassembly"&gt;webassembly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hugging-face"&gt;hugging-face&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/transformers-js"&gt;transformers-js&lt;/a&gt;&lt;/p&gt;



</summary><category term="javascript"/><category term="machine-learning"/><category term="transformers"/><category term="ai"/><category term="webassembly"/><category term="generative-ai"/><category term="llms"/><category term="hugging-face"/><category term="transformers-js"/></entry></feed>