<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: smollm</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/smollm.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2025-02-07T06:34:59+00:00</updated><author><name>Simon Willison</name></author><entry><title>Using pip to install a Large Language Model that's under 100MB</title><link href="https://simonwillison.net/2025/Feb/7/pip-install-llm-smollm2/#atom-tag" rel="alternate"/><published>2025-02-07T06:34:59+00:00</published><updated>2025-02-07T06:34:59+00:00</updated><id>https://simonwillison.net/2025/Feb/7/pip-install-llm-smollm2/#atom-tag</id><summary type="html">
    &lt;p&gt;I just released &lt;a href="https://github.com/simonw/llm-smollm2"&gt;llm-smollm2&lt;/a&gt;, a new plugin for &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; that bundles a quantized copy of the &lt;a href="https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct"&gt;SmolLM2-135M-Instruct&lt;/a&gt; LLM inside of the Python package.&lt;/p&gt;
&lt;p&gt;This means you can now &lt;code&gt;pip install&lt;/code&gt; a full LLM!&lt;/p&gt;

&lt;p&gt;If you're already using &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; you can install it like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm install llm-smollm2&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then run prompts like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm -m SmolLM2 &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;Are dogs real?&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;(New favourite test prompt for tiny models, courtesy of &lt;a href="https://bsky.app/profile/timfduffy.com/post/3lhknvwijps2o"&gt;Tim Duffy&lt;/a&gt;. &lt;a href="https://gist.github.com/simonw/a71d83cd8c6caa49cec4a9f296c267df"&gt;Here's the result&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;If you don't have LLM yet first &lt;a href="https://llm.datasette.io/en/stable/setup.html"&gt;follow these installation instructions&lt;/a&gt;, or &lt;code&gt;brew install llm&lt;/code&gt; or &lt;code&gt;pipx install llm&lt;/code&gt; or &lt;code&gt;uv tool install llm&lt;/code&gt; depending on your preferred way of getting your Python tools.&lt;/p&gt;
&lt;p&gt;If you have &lt;a href="https://github.com/astral-sh/uv"&gt;uv&lt;/a&gt; setup you don't need to install anything at all! The following command will spin up an ephemeral environment, install the necessary packages and start a chat session with the model all in one go:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;uvx --with llm-smollm2 llm chat -m SmolLM2&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/smol.gif" alt="$ uvx --with llm-smollm2 llm chat -m SmolLM2
Chatting with SmolLM2
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
&gt; tell me a long story about a dog
The story of a dog is a tale of love, loss, and the unspoken rules of the universe. It begins with a small, timid, and timid dog named Max. Max was born into a loving family, but his life was forever changed by a cruel and cruel owner. He was left to fend for himself, and he was left to fend for himself.

Max's owner, a kind and gentle soul named Elena, was a cruel and cruel one. She had no patience, no love, and she would do anything to keep Max away from her. Max was left to fend for himself, and he was left to fend for himself.

Max's owner, Elena, was determined to make Max happy. She took Max to a beautiful and beautiful place, where Max was allowed to roam freely. Max was allowed to roam freely, and Max was allowed to roam freely. [Then repeats that sentence many times]" style="max-width: 100%;" /&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Feb/7/pip-install-llm-smollm2/#finding-a-tiny-model"&gt;Finding a tiny model&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Feb/7/pip-install-llm-smollm2/#building-the-plugin"&gt;Building the plugin&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Feb/7/pip-install-llm-smollm2/#packaging-the-plugin"&gt;Packaging the plugin&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Feb/7/pip-install-llm-smollm2/#publishing-to-pypi"&gt;Publishing to PyPI&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Feb/7/pip-install-llm-smollm2/#is-the-model-any-good-"&gt;Is the model any good?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id="finding-a-tiny-model"&gt;Finding a tiny model&lt;/h4&gt;
&lt;p&gt;The fact that the model is almost exactly 100MB is no coincidence: that's the &lt;a href="https://pypi.org/help/#file-size-limit"&gt;default size limit&lt;/a&gt; for a Python package that can be uploaded to the Python Package Index (PyPI).&lt;/p&gt;
&lt;p&gt;I &lt;a href="https://bsky.app/profile/simonwillison.net/post/3lhklqd62jc2x"&gt;asked on Bluesky&lt;/a&gt; if anyone had seen a just-about-usable GGUF model that was under 100MB, and Artisan Loaf &lt;a href="https://bsky.app/profile/artisanloaf.bsky.social/post/3lhklumfhvs2r"&gt;pointed me&lt;/a&gt; to &lt;a href="https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct"&gt;SmolLM2-135M-Instruct&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I ended up using &lt;a href="https://huggingface.co/QuantFactory/SmolLM2-135M-Instruct-GGUF/tree/main"&gt;this quantization&lt;/a&gt; by &lt;a href="https://huggingface.co/QuantFactory"&gt;QuantFactory&lt;/a&gt; just because it was the first sub-100MB model I tried that worked.&lt;/p&gt;
&lt;p&gt;Trick for finding quantized models: Hugging Face has a neat "model tree" feature in the side panel of their model pages, which includes links to relevant quantized models. I find most of my GGUFs using that feature.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/hugging-face-model-tree.jpg" alt="Model tree for HuggingFaceTB/SmolLM2-135M-Instruct. 60 Quantizations, 6 adapters, 80 finetunes, 1 merge." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;h4 id="building-the-plugin"&gt;Building the plugin&lt;/h4&gt;
&lt;p&gt;I first tried the model out using Python and the &lt;a href="https://github.com/abetlen/llama-cpp-python"&gt;llama-cpp-python&lt;/a&gt; library like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;uv run --with llama-cpp-python python&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;llama_cpp&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-v"&gt;Llama&lt;/span&gt;
&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;pprint&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;pprint&lt;/span&gt;
&lt;span class="pl-s1"&gt;llm&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-en"&gt;Llama&lt;/span&gt;(&lt;span class="pl-s1"&gt;model_path&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"SmolLM2-135M-Instruct.Q4_1.gguf"&lt;/span&gt;)
&lt;span class="pl-s1"&gt;output&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;llm&lt;/span&gt;.&lt;span class="pl-c1"&gt;create_chat_completion&lt;/span&gt;(&lt;span class="pl-s1"&gt;messages&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;[
    {&lt;span class="pl-s"&gt;"role"&lt;/span&gt;: &lt;span class="pl-s"&gt;"user"&lt;/span&gt;, &lt;span class="pl-s"&gt;"content"&lt;/span&gt;: &lt;span class="pl-s"&gt;"Hi"&lt;/span&gt;}
])
&lt;span class="pl-en"&gt;pprint&lt;/span&gt;(&lt;span class="pl-s1"&gt;output&lt;/span&gt;)&lt;/pre&gt;
&lt;p&gt;This gave me the output I was expecting:&lt;/p&gt;
&lt;pre&gt;{&lt;span class="pl-s"&gt;'choices'&lt;/span&gt;: [{&lt;span class="pl-s"&gt;'finish_reason'&lt;/span&gt;: &lt;span class="pl-s"&gt;'stop'&lt;/span&gt;,
              &lt;span class="pl-s"&gt;'index'&lt;/span&gt;: &lt;span class="pl-c1"&gt;0&lt;/span&gt;,
              &lt;span class="pl-s"&gt;'logprobs'&lt;/span&gt;: &lt;span class="pl-c1"&gt;None&lt;/span&gt;,
              &lt;span class="pl-s"&gt;'message'&lt;/span&gt;: {&lt;span class="pl-s"&gt;'content'&lt;/span&gt;: &lt;span class="pl-s"&gt;'Hello! How can I assist you today?'&lt;/span&gt;,
                          &lt;span class="pl-s"&gt;'role'&lt;/span&gt;: &lt;span class="pl-s"&gt;'assistant'&lt;/span&gt;}}],
 &lt;span class="pl-s"&gt;'created'&lt;/span&gt;: &lt;span class="pl-c1"&gt;1738903256&lt;/span&gt;,
 &lt;span class="pl-s"&gt;'id'&lt;/span&gt;: &lt;span class="pl-s"&gt;'chatcmpl-76ea1733-cc2f-46d4-9939-90efa2a05e7c'&lt;/span&gt;,
 &lt;span class="pl-s"&gt;'model'&lt;/span&gt;: &lt;span class="pl-s"&gt;'SmolLM2-135M-Instruct.Q4_1.gguf'&lt;/span&gt;,
 &lt;span class="pl-s"&gt;'object'&lt;/span&gt;: &lt;span class="pl-s"&gt;'chat.completion'&lt;/span&gt;,
 &lt;span class="pl-s"&gt;'usage'&lt;/span&gt;: {&lt;span class="pl-s"&gt;'completion_tokens'&lt;/span&gt;: &lt;span class="pl-c1"&gt;9&lt;/span&gt;, &lt;span class="pl-s"&gt;'prompt_tokens'&lt;/span&gt;: &lt;span class="pl-c1"&gt;31&lt;/span&gt;, &lt;span class="pl-s"&gt;'total_tokens'&lt;/span&gt;: &lt;span class="pl-c1"&gt;40&lt;/span&gt;}}&lt;/pre&gt;
&lt;p&gt;But it also &lt;em&gt;spammed&lt;/em&gt; my terminal with a huge volume of debugging output - which started like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llama_model_load_from_file_impl: using device Metal (Apple M2 Max) - 49151 MiB free
llama_model_loader: loaded meta data with 33 key-value pairs and 272 tensors from SmolLM2-135M-Instruct.Q4_1.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And then continued for more than &lt;a href="https://gist.github.com/simonw/9ef7acd836b1cc40c14686eae4dca340"&gt;500 lines&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;I've had this problem with &lt;code&gt;llama-cpp-python&lt;/code&gt; and &lt;code&gt;llama.cpp&lt;/code&gt; in the past, and was sad to find that the documentation still doesn't have a great answer for how to avoid this.&lt;/p&gt;
&lt;p&gt;So I turned to the just released &lt;a href="https://simonwillison.net/2025/Feb/5/gemini-2/"&gt;Gemini 2.0 Pro (Experimental)&lt;/a&gt;, because I know it's a strong model with a long input limit.&lt;/p&gt;
&lt;p&gt;I ran the entire &lt;code&gt;llama-cpp-python&lt;/code&gt; codebase through it like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;&lt;span class="pl-c1"&gt;cd&lt;/span&gt; /tmp
git clone https://github.com/abetlen/llama-cpp-python
&lt;span class="pl-c1"&gt;cd&lt;/span&gt; llama-cpp-python
files-to-prompt -e py &lt;span class="pl-c1"&gt;.&lt;/span&gt; -c &lt;span class="pl-k"&gt;|&lt;/span&gt; llm -m gemini-2.0-pro-exp-02-05 \
  &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;How can I prevent this library from logging any information at all while it is running - no stderr or anything like that&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here's &lt;a href="https://gist.github.com/simonw/20476c2c6f7604df2994212cebfafef4#response"&gt;the answer I got back&lt;/a&gt;. It recommended setting the logger to &lt;code&gt;logging.CRITICAL&lt;/code&gt;, passing &lt;code&gt;verbose=False&lt;/code&gt; to the constructor and, most importantly, using the following context manager to suppress all output:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;contextlib&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;contextmanager&lt;/span&gt;, &lt;span class="pl-s1"&gt;redirect_stderr&lt;/span&gt;, &lt;span class="pl-s1"&gt;redirect_stdout&lt;/span&gt;

&lt;span class="pl-en"&gt;@&lt;span class="pl-s1"&gt;contextmanager&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;suppress_output&lt;/span&gt;():
    &lt;span class="pl-s"&gt;"""&lt;/span&gt;
&lt;span class="pl-s"&gt;    Suppresses all stdout and stderr output within the context.&lt;/span&gt;
&lt;span class="pl-s"&gt;    """&lt;/span&gt;
    &lt;span class="pl-k"&gt;with&lt;/span&gt; &lt;span class="pl-en"&gt;open&lt;/span&gt;(&lt;span class="pl-s1"&gt;os&lt;/span&gt;.&lt;span class="pl-c1"&gt;devnull&lt;/span&gt;, &lt;span class="pl-s"&gt;"w"&lt;/span&gt;) &lt;span class="pl-k"&gt;as&lt;/span&gt; &lt;span class="pl-s1"&gt;devnull&lt;/span&gt;:
        &lt;span class="pl-k"&gt;with&lt;/span&gt; &lt;span class="pl-en"&gt;redirect_stdout&lt;/span&gt;(&lt;span class="pl-s1"&gt;devnull&lt;/span&gt;), &lt;span class="pl-en"&gt;redirect_stderr&lt;/span&gt;(&lt;span class="pl-s1"&gt;devnull&lt;/span&gt;):
            &lt;span class="pl-k"&gt;yield&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;This worked! It turned out most of the output came from initializing the &lt;code&gt;LLM&lt;/code&gt; class, so I wrapped that like so:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;with&lt;/span&gt; &lt;span class="pl-en"&gt;suppress_output&lt;/span&gt;():
    &lt;span class="pl-s1"&gt;model&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-en"&gt;Llama&lt;/span&gt;(&lt;span class="pl-s1"&gt;model_path&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s1"&gt;self&lt;/span&gt;.&lt;span class="pl-c1"&gt;model_path&lt;/span&gt;, &lt;span class="pl-s1"&gt;verbose&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;False&lt;/span&gt;)&lt;/pre&gt;
&lt;p&gt;Proof of concept in hand I set about writing the plugin. I started with my &lt;a href="https://github.com/simonw/llm-plugin"&gt;simonw/llm-plugin&lt;/a&gt; cookiecutter template:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;uvx cookiecutter gh:simonw/llm-plugin&lt;/pre&gt;&lt;/div&gt;
&lt;pre&gt;&lt;code&gt;  [1/6] plugin_name (): smollm2
  [2/6] description (): SmolLM2-135M-Instruct.Q4_1 for LLM
  [3/6] hyphenated (smollm2): 
  [4/6] underscored (smollm2): 
  [5/6] github_username (): simonw
  [6/6] author_name (): Simon Willison
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;a href="https://github.com/simonw/llm-smollm2/blob/0.1.1/llm_smollm2/__init__.py"&gt;rest of the plugin&lt;/a&gt; was mostly borrowed from my existing &lt;a href="https://github.com/simonw/llm-gguf/blob/0.2/llm_gguf.py"&gt;llm-gguf&lt;/a&gt; plugin, updated based on the latest README for the &lt;code&gt;llama-cpp-python&lt;/code&gt; project.&lt;/p&gt;
&lt;p&gt;There's more information on building plugins in &lt;a href="https://llm.datasette.io/en/stable/plugins/tutorial-model-plugin.html"&gt;the tutorial on writing a plugin&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="packaging-the-plugin"&gt;Packaging the plugin&lt;/h4&gt;
&lt;p&gt;Once I had that working the last step was to figure out how to package it for PyPI. I'm never quite sure of the best way to bundle a binary file in a Python package, especially one that uses a &lt;code&gt;pyproject.toml&lt;/code&gt; file... so I dumped a copy of my existing &lt;code&gt;pyproject.toml&lt;/code&gt; file into o3-mini-high and prompted:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Modify this to bundle a SmolLM2-135M-Instruct.Q4_1.gguf file inside the package. I don't want to use hatch or a manifest or anything, I just want to use setuptools.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here's &lt;a href="https://chatgpt.com/share/67a59122-67c8-8006-9be4-29f8419343ad"&gt;the shared transcript&lt;/a&gt; - it gave me exactly what I wanted. I bundled it by adding this to the end of the &lt;code&gt;toml&lt;/code&gt; file:&lt;/p&gt;
&lt;div class="highlight highlight-source-toml"&gt;&lt;pre&gt;[&lt;span class="pl-en"&gt;tool&lt;/span&gt;.&lt;span class="pl-en"&gt;setuptools&lt;/span&gt;.&lt;span class="pl-en"&gt;package-data&lt;/span&gt;]
&lt;span class="pl-smi"&gt;llm_smollm2&lt;/span&gt; = [&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;SmolLM2-135M-Instruct.Q4_1.gguf&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;]&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then dropping that &lt;code&gt;.gguf&lt;/code&gt; file into the &lt;code&gt;llm_smollm2/&lt;/code&gt; directory and putting my plugin code in &lt;code&gt;llm_smollm2/__init__.py&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;I tested it locally by running this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;python -m pip install build
python -m build&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I fired up a fresh virtual environment and ran &lt;code&gt;pip install ../path/to/llm-smollm2/dist/llm_smollm2-0.1-py3-none-any.whl&lt;/code&gt; to confirm that the package worked as expected.&lt;/p&gt;
&lt;h4 id="publishing-to-pypi"&gt;Publishing to PyPI&lt;/h4&gt;
&lt;p&gt;My cookiecutter template comes with &lt;a href="https://github.com/simonw/llm-smollm2/blob/main/.github/workflows/publish.yml"&gt;a GitHub Actions workflow&lt;/a&gt; that publishes the package to PyPI when a new release is created using the GitHub web interface. Here's the relevant YAML:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;  &lt;span class="pl-ent"&gt;deploy&lt;/span&gt;:
    &lt;span class="pl-ent"&gt;runs-on&lt;/span&gt;: &lt;span class="pl-s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="pl-ent"&gt;needs&lt;/span&gt;: &lt;span class="pl-s"&gt;[test]&lt;/span&gt;
    &lt;span class="pl-ent"&gt;environment&lt;/span&gt;: &lt;span class="pl-s"&gt;release&lt;/span&gt;
    &lt;span class="pl-ent"&gt;permissions&lt;/span&gt;:
      &lt;span class="pl-ent"&gt;id-token&lt;/span&gt;: &lt;span class="pl-s"&gt;write&lt;/span&gt;
    &lt;span class="pl-ent"&gt;steps&lt;/span&gt;:
    - &lt;span class="pl-ent"&gt;uses&lt;/span&gt;: &lt;span class="pl-s"&gt;actions/checkout@v4&lt;/span&gt;
    - &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Set up Python&lt;/span&gt;
      &lt;span class="pl-ent"&gt;uses&lt;/span&gt;: &lt;span class="pl-s"&gt;actions/setup-python@v5&lt;/span&gt;
      &lt;span class="pl-ent"&gt;with&lt;/span&gt;:
        &lt;span class="pl-ent"&gt;python-version&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;3.13&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
        &lt;span class="pl-ent"&gt;cache&lt;/span&gt;: &lt;span class="pl-s"&gt;pip&lt;/span&gt;
        &lt;span class="pl-ent"&gt;cache-dependency-path&lt;/span&gt;: &lt;span class="pl-s"&gt;pyproject.toml&lt;/span&gt;
    - &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Install dependencies&lt;/span&gt;
      &lt;span class="pl-ent"&gt;run&lt;/span&gt;: &lt;span class="pl-s"&gt;|&lt;/span&gt;
&lt;span class="pl-s"&gt;        pip install setuptools wheel build&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;    - &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Build&lt;/span&gt;
      &lt;span class="pl-ent"&gt;run&lt;/span&gt;: &lt;span class="pl-s"&gt;|&lt;/span&gt;
&lt;span class="pl-s"&gt;        python -m build&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;    - &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Publish&lt;/span&gt;
      &lt;span class="pl-ent"&gt;uses&lt;/span&gt;: &lt;span class="pl-s"&gt;pypa/gh-action-pypi-publish@release/v1&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This runs after the &lt;code&gt;test&lt;/code&gt; job has passed. It uses the &lt;a href="https://github.com/pypa/gh-action-pypi-publish"&gt;pypa/gh-action-pypi-publish&lt;/a&gt; Action to publish to PyPI - I wrote more about how that works &lt;a href="https://til.simonwillison.net/pypi/pypi-releases-from-github"&gt;in this TIL&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="is-the-model-any-good-"&gt;Is the model any good?&lt;/h4&gt;
&lt;p&gt;This one really isn't! It's not really surprising but it turns out 94MB really isn't enough space for a model that can do anything useful.&lt;/p&gt;
&lt;p&gt;It's &lt;em&gt;super&lt;/em&gt; fun to play with, and I continue to maintain that small, weak models are a great way to help build a mental model of how this technology actually works.&lt;/p&gt;
&lt;p&gt;That's not to say SmolLM2 isn't a fantastic model family. I'm running the smallest, most restricted version here. &lt;a href="https://huggingface.co/blog/smollm"&gt;SmolLM - blazingly fast and remarkably powerful&lt;/a&gt; describes the full model family - which comes in 135M, 360M, and 1.7B sizes. The larger versions are a whole lot more capable.&lt;/p&gt;
&lt;p&gt;If anyone can figure out something genuinely useful to do with the 94MB version I'd love to hear about it.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/o3"&gt;o3&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pip"&gt;pip&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pypi"&gt;pypi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/smollm"&gt;smollm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llama-cpp"&gt;llama-cpp&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-actions"&gt;github-actions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gemini"&gt;gemini&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="o3"/><category term="pip"/><category term="plugins"/><category term="pypi"/><category term="llm"/><category term="projects"/><category term="llms"/><category term="python"/><category term="smollm"/><category term="llama-cpp"/><category term="github-actions"/><category term="ai"/><category term="local-llms"/><category term="uv"/><category term="gemini"/><category term="generative-ai"/><category term="ai-assisted-programming"/></entry><entry><title>Structured Generation w/ SmolLM2 running in browser &amp; WebGPU</title><link href="https://simonwillison.net/2024/Nov/29/structured-generation-smollm2-webgpu/#atom-tag" rel="alternate"/><published>2024-11-29T21:09:11+00:00</published><updated>2024-11-29T21:09:11+00:00</updated><id>https://simonwillison.net/2024/Nov/29/structured-generation-smollm2-webgpu/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://huggingface.co/spaces/reach-vb/github-issue-generator-webgpu"&gt;Structured Generation w/ SmolLM2 running in browser &amp;amp; WebGPU&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Extraordinary demo by Vaibhav Srivastav (VB). Here's Hugging Face's &lt;a href="https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct"&gt;SmolLM2-1.7B-Instruct&lt;/a&gt; running directly in a web browser (using WebGPU, so requires Chrome &lt;a href="https://github.com/gpuweb/gpuweb/wiki/Implementation-Status"&gt;for the moment&lt;/a&gt;) demonstrating structured text extraction, converting a text description of an image into a structured GitHub issue defined using JSON schema.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Interface showing text input, a JSON schema, extracted JSON and a UI that demonstrates the structured resulting GitHub Issue" src="https://static.simonwillison.net/static/2024/github-issue-extract.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;The page loads 924.8MB of model data (according to &lt;a href="https://gist.github.com/simonw/3ccba6256e95b59ea6a17509855830b4"&gt;this script to sum up files in window.caches&lt;/a&gt;) and performs everything in-browser. I did not know a model this small could produce such useful results.&lt;/p&gt;
&lt;p&gt;Here's &lt;a href="https://github.com/Vaibhavs10/github-issue-generator-webgpu/blob/main/src/index.js"&gt;the source code&lt;/a&gt; for the demo. It's around 200 lines of code, 50 of which are the JSON schema describing the data to be extracted.&lt;/p&gt;
&lt;p&gt;The real secret sauce here is &lt;a href="https://github.com/mlc-ai/web-llm"&gt;web-llm&lt;/a&gt; by MLC. This library has made loading and executing prompts through LLMs in the browser shockingly easy, and recently incorporated support for MLC's &lt;a href="https://xgrammar.mlc.ai/"&gt;XGrammar&lt;/a&gt; library (also available in Python) which implements both JSON schema and EBNF-based structured output guidance.

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://bsky.app/profile/reach-vb.hf.co/post/3lc24bmj6fk2j"&gt;@reach-vb.hf.co&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webassembly"&gt;webassembly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mlc"&gt;mlc&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hugging-face"&gt;hugging-face&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webgpu"&gt;webgpu&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/smollm"&gt;smollm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/structured-extraction"&gt;structured-extraction&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="webassembly"/><category term="generative-ai"/><category term="llms"/><category term="mlc"/><category term="hugging-face"/><category term="webgpu"/><category term="smollm"/><category term="structured-extraction"/></entry><entry><title>SmolVLM - small yet mighty Vision Language Model</title><link href="https://simonwillison.net/2024/Nov/28/smolvlm/#atom-tag" rel="alternate"/><published>2024-11-28T20:29:27+00:00</published><updated>2024-11-28T20:29:27+00:00</updated><id>https://simonwillison.net/2024/Nov/28/smolvlm/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://huggingface.co/blog/smolvlm"&gt;SmolVLM - small yet mighty Vision Language Model&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
I've been having fun playing with this new vision model from the Hugging Face team behind &lt;a href="https://simonwillison.net/2024/Nov/2/smollm2/"&gt;SmolLM&lt;/a&gt;. They describe it as:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;[...] a 2B VLM, SOTA for its memory footprint. SmolVLM is small, fast, memory-efficient, and fully open-source. All model checkpoints, VLM datasets, training recipes and tools are released under the Apache 2.0 license.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I've tried it in a few flavours but my favourite so far is the &lt;a href="https://github.com/Blaizzy/mlx-vlm"&gt;mlx-vlm&lt;/a&gt; approach, via &lt;code&gt;mlx-vlm&lt;/code&gt; author &lt;a href="https://twitter.com/Prince_Canuma/status/1862168514842280401"&gt;Prince Canuma&lt;/a&gt;. Here's the &lt;code&gt;uv&lt;/code&gt; recipe I'm using to run it:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uv run \
  --with mlx-vlm \
  --with torch \
  python -m mlx_vlm.generate \
    --model mlx-community/SmolVLM-Instruct-bf16 \
    --max-tokens 500 \
    --temp 0.5 \
    --prompt "Describe this image in detail" \
    --image IMG_4414.JPG
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you run into an error using Python 3.13 (torch compatibility) try &lt;code&gt;uv run --python 3.11&lt;/code&gt; instead.&lt;/p&gt;
&lt;p&gt;This one-liner installs the necessary dependencies, downloads the model (about 4.2GB, saved to &lt;code&gt;~/.cache/huggingface/hub/models--mlx-community--SmolVLM-Instruct-bf16&lt;/code&gt;) and executes the prompt and displays the result.&lt;/p&gt;
&lt;p&gt;I ran that against &lt;a href="https://static.simonwillison.net/static/2024/IMG_4414.JPG"&gt;this Pelican photo&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img alt="A glorious pelican on some rocks, two other pelicans are visible plus some other birds" src="https://static.simonwillison.net/static/2024/IMG_4414.JPG" /&gt;&lt;/p&gt;
&lt;p&gt;The model replied:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In the foreground of this photograph, a pelican is perched on a pile of rocks. The pelican’s wings are spread out, and its beak is open. There is a small bird standing on the rocks in front of the pelican. The bird has its head cocked to one side, and it seems to be looking at the pelican. To the left of the pelican is another bird, and behind the pelican are some other birds. The rocks in the background of the image are gray, and they are covered with a variety of textures. The rocks in the background appear to be wet from either rain or sea spray.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;There are a few spatial mistakes in that description but the vibes are generally in the right direction.&lt;/p&gt;
&lt;p&gt;On my 64GB M2 MacBook pro it read the prompt at 7.831 tokens/second and generated that response at an impressive 74.765 tokens/second.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vision-llms"&gt;vision-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mlx"&gt;mlx&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/smollm"&gt;smollm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prince-canuma"&gt;prince-canuma&lt;/a&gt;&lt;/p&gt;



</summary><category term="python"/><category term="ai"/><category term="generative-ai"/><category term="local-llms"/><category term="llms"/><category term="vision-llms"/><category term="uv"/><category term="mlx"/><category term="smollm"/><category term="llm-release"/><category term="prince-canuma"/></entry><entry><title>NuExtract 1.5</title><link href="https://simonwillison.net/2024/Nov/16/nuextract-15/#atom-tag" rel="alternate"/><published>2024-11-16T16:33:17+00:00</published><updated>2024-11-16T16:33:17+00:00</updated><id>https://simonwillison.net/2024/Nov/16/nuextract-15/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://numind.ai/blog/nuextract-1-5---multilingual-infinite-context-still-small-and-better-than-gpt-4o"&gt;NuExtract 1.5&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Structured extraction - where an LLM helps turn unstructured text (or image content) into structured data - remains one of the most directly useful applications of LLMs.&lt;/p&gt;
&lt;p&gt;NuExtract is a family of small models directly trained for this purpose (though text only at the moment) and released under the MIT license.&lt;/p&gt;
&lt;p&gt;It comes in a variety of shapes and sizes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/numind/NuExtract-1.5"&gt;NuExtract-v1.5&lt;/a&gt; is a 3.8B parameter model fine-tuned on &lt;a href="https://huggingface.co/microsoft/Phi-3.5-mini-instruct"&gt;Phi-3.5-mini instruct&lt;/a&gt;. You can try this one out in &lt;a href="https://huggingface.co/spaces/numind/NuExtract-1.5"&gt;this playground&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/numind/NuExtract-1.5-tiny"&gt;NuExtract-tiny-v1.5&lt;/a&gt; is 494M parameters, fine-tuned on &lt;a href="https://huggingface.co/Qwen/Qwen2.5-0.5B"&gt;Qwen2.5-0.5B&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/numind/NuExtract-1.5-smol"&gt;NuExtract-1.5-smol&lt;/a&gt; is 1.7B parameters, fine-tuned on &lt;a href="https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B"&gt;SmolLM2-1.7B&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All three models were fine-tuned on NuMind's "private high-quality dataset". It's interesting to see a model family that uses one fine-tuning set against three completely different base models.&lt;/p&gt;
&lt;p&gt;Useful tip &lt;a href="https://twitter.com/sroecker/status/1857846899123827168"&gt;from Steffen Röcker&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Make sure to use it with low temperature, I've uploaded &lt;a href="https://ollama.com/sroecker/nuextract-tiny-v1.5"&gt;NuExtract-tiny-v1.5 to Ollama&lt;/a&gt; and set it to 0. With the Ollama default of 0.7 it started repeating the input text. It works really well despite being so smol.&lt;/p&gt;
&lt;/blockquote&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hugging-face"&gt;hugging-face&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fine-tuning"&gt;fine-tuning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/phi"&gt;phi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/qwen"&gt;qwen&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/smollm"&gt;smollm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/structured-extraction"&gt;structured-extraction&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-in-china"&gt;ai-in-china&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="generative-ai"/><category term="local-llms"/><category term="llms"/><category term="hugging-face"/><category term="fine-tuning"/><category term="phi"/><category term="qwen"/><category term="smollm"/><category term="structured-extraction"/><category term="llm-release"/><category term="ai-in-china"/></entry><entry><title>SmolLM2</title><link href="https://simonwillison.net/2024/Nov/2/smollm2/#atom-tag" rel="alternate"/><published>2024-11-02T05:27:25+00:00</published><updated>2024-11-02T05:27:25+00:00</updated><id>https://simonwillison.net/2024/Nov/2/smollm2/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct"&gt;SmolLM2&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
New from &lt;a href="https://loubnabnl.github.io/"&gt;Loubna Ben Allal&lt;/a&gt; and her research team at Hugging Face:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;SmolLM2 is a family of compact language models available in three size: 135M, 360M, and 1.7B parameters. They are capable of solving a wide range of tasks while being lightweight enough to run on-device. [...]&lt;/p&gt;
&lt;p&gt;It was trained on 11 trillion tokens using a diverse dataset combination: FineWeb-Edu, DCLM, The Stack, along with new mathematics and coding datasets that we curated and will release soon.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The model weights are released under an Apache 2 license. I've been trying these out using my &lt;a href="https://github.com/simonw/llm-gguf"&gt;llm-gguf&lt;/a&gt; plugin for &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; and my first impressions are really positive.&lt;/p&gt;
&lt;p&gt;Here's a recipe to run a 1.7GB Q8 quantized model &lt;a href="https://huggingface.co/lmstudio-community/SmolLM2-1.7B-Instruct-GGUF"&gt;from lmstudio-community&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm install llm-gguf
llm gguf download-model https://huggingface.co/lmstudio-community/SmolLM2-1.7B-Instruct-GGUF/resolve/main/SmolLM2-1.7B-Instruct-Q8_0.gguf -a smol17
llm chat -m smol17
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img alt="Animated terminal demo. My prompt is tell me about pelicans. The model responds: Sure, I'd be happy to tell you about pelicans! Pelicans are a group of aquatic birds in the order Pelecaniformes, which also includes the cormorants, darters, and frigatebirds. They are found on all continents except Antarctica, and are known for their distinctive pouch-like bill. There are several species of pelicans. The most common species is the Brown Pelican, which is found in the Americas. It's the only species that plunges into water from a significant height to catch fish and other prey, a behavior known as &amp;quot;fish-grabbing.&amp;quot;  Another common species is the American White Pelican, which can be found in both the Americas and Eurasia. It has a white plumage and a large, bright pink bill, and feeds on fish in lakes, rivers, and coastal wetlands.  Pelicans are generally medium-sized birds, but the Brown Pelican is the largest, with an average height of around 26-30 inches. Their bills can be as long as 11 inches!  Below the terminal you can see Activity Monitor showing 378% CPU usage for the Python process" src="https://static.simonwillison.net/static/2024/smol-demo.gif" /&gt;&lt;/p&gt;
&lt;p&gt;Or at the other end of the scale, here's how to run the 138MB &lt;a href="https://huggingface.co/lmstudio-community/SmolLM2-135M-Instruct-GGUF"&gt;Q8 quantized 135M model&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm gguf download-model https://huggingface.co/lmstudio-community/SmolLM2-135M-Instruct-GGUF/resolve/main/SmolLM2-135M-Instruct-Q8_0.gguf' -a smol135m
llm chat -m smol135m
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The blog entry to accompany SmolLM2 should be coming soon, but in the meantime here's the entry from July introducing the first version: &lt;a href="https://huggingface.co/blog/smollm"&gt; SmolLM - blazingly fast and remarkably powerful &lt;/a&gt;.

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://twitter.com/LoubnaBenAllal1/status/1852055582494294414"&gt;@LoubnaBenAllal1&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hugging-face"&gt;hugging-face&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/smollm"&gt;smollm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;&lt;/p&gt;



</summary><category term="open-source"/><category term="ai"/><category term="generative-ai"/><category term="local-llms"/><category term="llms"/><category term="hugging-face"/><category term="llm"/><category term="smollm"/><category term="llm-release"/></entry></feed>