<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: cohere</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/cohere.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2025-07-31T21:54:47+00:00</updated><author><name>Simon Willison</name></author><entry><title>More model releases on 31st July</title><link href="https://simonwillison.net/2025/Jul/31/more-models/#atom-tag" rel="alternate"/><published>2025-07-31T21:54:47+00:00</published><updated>2025-07-31T21:54:47+00:00</updated><id>https://simonwillison.net/2025/Jul/31/more-models/#atom-tag</id><summary type="html">
    &lt;p&gt;Here are a few more model releases from today, to round out a &lt;a href="https://simonwillison.net/search/?tag=llm-release&amp;amp;year=2025&amp;amp;month=7"&gt;very busy July&lt;/a&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Cohere &lt;a href="https://cohere.com/blog/command-a-vision"&gt;released Command A Vision&lt;/a&gt;, their first multi-modal (image input) LLM. Like their others it's open weights under Creative Commons Attribution Non-Commercial, so you need to license it (or use their paid API) if you want to use it commercially.&lt;/li&gt;
&lt;li&gt;San Francisco AI startup Deep Cogito released &lt;a href="https://www.deepcogito.com/research/cogito-v2-preview"&gt;four open weights hybrid reasoning models&lt;/a&gt;, cogito-v2-preview-deepseek-671B-MoE, cogito-v2-preview-llama-405B, cogito-v2-preview-llama-109B-MoE and cogito-v2-preview-llama-70B. These follow their &lt;a href="https://www.deepcogito.com/research/cogito-v1-preview"&gt;v1 preview models&lt;/a&gt; in April at smaller 3B, 8B, 14B, 32B and 70B sizes. It looks like their unique contribution here is "distilling inference-time reasoning back into the model’s parameters" - demonstrating a form of self-improvement. I haven't tried any of their models myself yet.&lt;/li&gt;
&lt;li&gt;Mistral released &lt;a href="https://mistral.ai/news/codestral-25-08"&gt;Codestral 25.08&lt;/a&gt;, an update to their Codestral model which is specialized for fill-in-the-middle autocomplete as seen in text editors like VS Code, Zed and Cursor.&lt;/li&gt;
&lt;li&gt;And an anonymous stealth preview model called Horizon Alpha running &lt;a href="https://openrouter.ai/openrouter/horizon-alpha/activity"&gt;on OpenRouter&lt;/a&gt; was released yesterday and is attracting a lot of attention.&lt;/li&gt;
&lt;/ul&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mistral"&gt;mistral&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cohere"&gt;cohere&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openrouter"&gt;openrouter&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="mistral"/><category term="cohere"/><category term="llm-release"/><category term="openrouter"/></entry><entry><title>Introducing Command A: Max performance, minimal compute</title><link href="https://simonwillison.net/2025/Mar/13/command-a/#atom-tag" rel="alternate"/><published>2025-03-13T20:37:32+00:00</published><updated>2025-03-13T20:37:32+00:00</updated><id>https://simonwillison.net/2025/Mar/13/command-a/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://cohere.com/blog/command-a"&gt;Introducing Command A: Max performance, minimal compute&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;New LLM release from Cohere. It's interesting to see which aspects of the model they're highlighting, as an indicator of what their commercial customers value the most (highlights mine):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Command A delivers maximum performance with minimal hardware costs when compared to leading proprietary and open-weights models, such as GPT-4o and DeepSeek-V3. For private deployments, &lt;strong&gt;Command A excels on business-critical agentic and multilingual tasks, while being deployable on just two GPUs&lt;/strong&gt;, compared to other models that typically require as many as 32. [...]&lt;/p&gt;
&lt;p&gt;With a serving footprint of just two A100s or H100s, it requires far less compute than other comparable models on the market. This is especially important for private deployments. [...]&lt;/p&gt;
&lt;p&gt;Its &lt;strong&gt;256k context length&lt;/strong&gt; (2x most leading models) can handle much longer enterprise documents. Other key features include Cohere’s advanced retrieval-augmented generation (RAG) with &lt;strong&gt;verifiable citations&lt;/strong&gt;, agentic tool use, enterprise-grade security, and strong multilingual performance.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It's open weights but very much not open source - the license is &lt;a href="https://cohere.com/c4ai-cc-by-nc-license"&gt;Creative Commons Attribution Non-Commercial&lt;/a&gt; and also requires adhering to their &lt;a href="https://docs.cohere.com/docs/c4ai-acceptable-use-policy"&gt;Acceptable Use Policy&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Cohere offer it for commercial use via "contact us" pricing or through their API. I released &lt;a href="https://github.com/simonw/llm-command-r/releases/tag/0.3"&gt;llm-command-r 0.3&lt;/a&gt; adding support for this new model, plus their smaller and faster &lt;a href="https://cohere.com/blog/command-r7b"&gt;Command R7B&lt;/a&gt; (released in December) and support for structured outputs via &lt;a href="https://llm.datasette.io/en/stable/schemas.html"&gt;LLM schemas&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;(I found &lt;a href="https://github.com/simonw/llm-command-r/issues/8#issuecomment-2722598353"&gt;a weird bug&lt;/a&gt; with their schema support where schemas that end in an integer output a seemingly limitless integer - in my experiments it affected Command R and the new Command A but not Command R7B.)&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/Prince_Canuma/status/1900188521924620726"&gt;@Prince_Canuma&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cohere"&gt;cohere&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/structured-extraction"&gt;structured-extraction&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="llm"/><category term="cohere"/><category term="structured-extraction"/><category term="llm-release"/></entry><entry><title>Command R+ now ranked 6th on the LMSYS Chatbot Arena</title><link href="https://simonwillison.net/2024/Apr/9/command-r/#atom-tag" rel="alternate"/><published>2024-04-09T16:19:09+00:00</published><updated>2024-04-09T16:19:09+00:00</updated><id>https://simonwillison.net/2024/Apr/9/command-r/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://fedi.simonwillison.net/@simon/112242034813525962"&gt;Command R+ now ranked 6th on the LMSYS Chatbot Arena&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The LMSYS Chatbot Arena Leaderboard is one of the most interesting approaches to evaluating LLMs because it captures their ever-elusive “vibes”: users vote on the best responses to prompts from two initially hidden models.&lt;/p&gt;

&lt;p&gt;Big news today is that Command R+—the brand new open weights model (Creative Commons non-commercial) by Cohere—is now the highest ranked non-proprietary model, in at position six and beating one of the GPT-4s.&lt;/p&gt;

&lt;p&gt;(Linking to my screenshot on Mastodon.)&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cohere"&gt;cohere&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/command-r"&gt;command-r&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/chatbot-arena"&gt;chatbot-arena&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="cohere"/><category term="command-r"/><category term="chatbot-arena"/></entry><entry><title>llm-command-r</title><link href="https://simonwillison.net/2024/Apr/4/llm-command-r/#atom-tag" rel="alternate"/><published>2024-04-04T17:38:42+00:00</published><updated>2024-04-04T17:38:42+00:00</updated><id>https://simonwillison.net/2024/Apr/4/llm-command-r/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/llm-command-r"&gt;llm-command-r&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Cohere released Command R Plus today, an open weights (non-commercial/research only) 104 billion parameter LLM and a big step up from their previous 35 billion parameter Command R model.&lt;/p&gt;

&lt;p&gt;Both models are fine-tuned for tool use and RAG. The commercial API has features to expose this functionality, including a web-search connector which lets the model run web searches as part of answering the prompt and return documents and citations as part of the JSON response.&lt;/p&gt;

&lt;p&gt;I released a new plugin for my LLM command line tool this morning adding support for the Command R models.&lt;/p&gt;

&lt;p&gt;In addition to the two models it also adds a custom command for running prompts with web search enabled and listing the referenced documents.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cohere"&gt;cohere&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/command-r"&gt;command-r&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rag"&gt;rag&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-tool-use"&gt;llm-tool-use&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;&lt;/p&gt;



</summary><category term="plugins"/><category term="projects"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="llm"/><category term="cohere"/><category term="command-r"/><category term="rag"/><category term="llm-tool-use"/><category term="llm-release"/></entry><entry><title>Cohere int8 &amp; binary Embeddings - Scale Your Vector Database to Large Datasets</title><link href="https://simonwillison.net/2024/Mar/26/cohere-int8-binary-embeddings/#atom-tag" rel="alternate"/><published>2024-03-26T06:19:30+00:00</published><updated>2024-03-26T06:19:30+00:00</updated><id>https://simonwillison.net/2024/Mar/26/cohere-int8-binary-embeddings/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://txt.cohere.com/int8-binary-embeddings/"&gt;Cohere int8 &amp;amp; binary Embeddings - Scale Your Vector Database to Large Datasets&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Jo Kristian Bergum told me “The accuracy retention [of binary embedding vectors] is sensitive to whether the model has been using this binarization as part of the loss function.”&lt;/p&gt;

&lt;p&gt;Cohere provide an API for embeddings, and last week added support for returning binary vectors specifically tuned in this way.&lt;/p&gt;

&lt;p&gt;250M embeddings (Cohere provide a downloadable dataset of 250M embedded documents from Wikipedia) at float32 (4 bytes per dimension) add up to 954 GB.&lt;/p&gt;

&lt;p&gt;Cohere claim that reducing to 1 bit per dimension knocks that down to 30 GB (954/32) while keeping “90-98% of the original search quality”.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/jobergum/status/1772507515076415803"&gt;@jobergum&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/embeddings"&gt;embeddings&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cohere"&gt;cohere&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jo-kristian-bergum"&gt;jo-kristian-bergum&lt;/a&gt;&lt;/p&gt;



</summary><category term="embeddings"/><category term="cohere"/><category term="jo-kristian-bergum"/></entry><entry><title>Aya</title><link href="https://simonwillison.net/2024/Feb/13/aya/#atom-tag" rel="alternate"/><published>2024-02-13T17:14:35+00:00</published><updated>2024-02-13T17:14:35+00:00</updated><id>https://simonwillison.net/2024/Feb/13/aya/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://cohere.com/research/aya"&gt;Aya&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;“A global initiative led by Cohere For AI involving over 3,000 independent researchers across 119 countries. Aya is a state-of-art model and dataset, pushing the boundaries of multilingual AI for 101 languages through open science.”&lt;/p&gt;

&lt;p&gt;Both the model and the training data are released under Apache 2. The training data looks particularly interesting: “513 million instances through templating and translating existing datasets across 114 languages”, which suggests the data is mostly automatically generated.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=39357033"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cohere"&gt;cohere&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/training-data"&gt;training-data&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;&lt;/p&gt;



</summary><category term="open-source"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="cohere"/><category term="training-data"/><category term="llm-release"/></entry></feed>