Simon Willison's Weblog: translation

Speech translation in Google Meet is now rolling out to mobile devices

2026-04-27T17:37:47+00:00

I just encountered this feature via a "try this out now" prompt in a Google Meet meeting. It kind-of worked!

This is Google's implementation of the ultimate sci-fi translation app, where two people can talk to each other in two separate languages and Meet translates from one to the other and - with a short delay - repeats the text in your preferred language, with a rough imitation of the original speaker's voice.

It can only handle English, Spanish, French, German, Portuguese, and Italian at the moment. It's also still very alpha - I ran it successfully between two laptops running web browsers, but then when I tried between an iPhone and an iPad it didn't seem to work.

Tags: google, translation

Shisa V2 405B: Japan’s Highest Performing LLM

2025-06-03T04:07:55+00:00

Leonard Lin and Adam Lensenmayer have been working on Shisa for a while. They describe their latest release as "Japan's Highest Performing LLM".

Shisa V2 405B is the highest-performing LLM ever developed in Japan, and surpasses GPT-4 (0603) and GPT-4 Turbo (2024-04-09) in our eval battery. (It also goes toe-to-toe with GPT-4o (2024-11-20) and DeepSeek-V3 (0324) on Japanese MT-Bench!)

This 405B release is a follow-up to the six smaller Shisa v2 models they released back in April, which took a similar approach to DeepSeek-R1 in producing different models that each extended a different existing base model from Llama, Qwen, Mistral and Phi-4.

The new 405B model uses Llama 3.1 405B Instruct as a base, and is available under the Llama 3.1 community license.

Shisa is a prominent example of Sovereign AI - the ability for nations to build models that reflect their own language and culture:

We strongly believe that it’s important for homegrown AI to be developed both in Japan (and globally!), and not just for the sake of cultural diversity and linguistic preservation, but also for data privacy and security, geopolitical resilience, and ultimately, independence.

We believe the open-source approach is the only realistic way to achieve sovereignty in AI, not just for Japan, or even for nation states, but for the global community at large.

The accompanying overview report has some fascinating details:

Training the 405B model was extremely difficult. Only three other groups that we know of: Nous Research, Bllossom, and AI2 have published Llama 405B full fine-tunes. [...] We implemented every optimization at our disposal including: DeepSpeed ZeRO-3 parameter and activation offloading, gradient accumulation, 8-bit paged optimizer, and sequence parallelism. Even so, the 405B model still barely fit within the H100’s memory limits

In addition to the new model the Shisa team have published shisa-ai/shisa-v2-sharegpt, 180,000 records which they describe as "a best-in-class synthetic dataset, freely available for use to improve the Japanese capabilities of any model. Licensed under Apache 2.0".

One interesting note: because Shisa out-performs GPT-4 at Japanese, that model was no longer able to help with evaluation, so they had to upgrade to GPT-4.1.

Tags: leonard-lin, translation, ai, generative-ai, llama, llms, fine-tuning, evals, llm-release

A professional workflow for translation using LLMs

2025-02-02T04:23:19+00:00

Tom Gally is a professional translator who has been exploring the use of LLMs since the release of GPT-4. In this Hacker News comment he shares a detailed workflow for how he uses them to assist in that process.

Tom starts with the source text and custom instructions, including context for how the translation will be used. Here's an imaginary example prompt, which starts:

The text below in Japanese is a product launch presentation for Sony's new gaming console, to be delivered by the CEO at Tokyo Game Show 2025. Please translate it into English. Your translation will be used in the official press kit and live interpretation feed. When translating this presentation, please follow these guidelines to create an accurate and engaging English version that preserves both the meaning and energy of the original: [...]

It then lists some tone, style and content guidelines custom to that text.

Tom runs that prompt through several different LLMs and starts by picking sentences and paragraphs from those that form a good basis for the translation.

As he works on the full translation he uses Claude to help brainstorm alternatives for tricky sentences:

When I am unable to think of a good English version for a particular sentence, I give the Japanese and English versions of the paragraph it is contained in to an LLM (usually, these days, Claude) and ask for ten suggestions for translations of the problematic sentence. Usually one or two of the suggestions work fine; if not, I ask for ten more. (Using an LLM as a sentence-level thesaurus on steroids is particularly wonderful.)
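A minimal sketch of how that "ten suggestions" request might be assembled into a prompt. The function name, parameter names and prompt wording here are all hypothetical illustrations, not Tom's actual prompt:

```python
def build_alternatives_prompt(
    source_paragraph: str,
    draft_paragraph: str,
    problem_sentence: str,
    count: int = 10,
) -> str:
    """Assemble a prompt asking an LLM for alternative translations.

    The wording is an assumption made for illustration; the workflow
    described above only specifies the inputs (both versions of the
    paragraph plus the problematic sentence) and the request for ten
    suggestions.
    """
    return (
        "Here is a Japanese paragraph and my draft English translation of it.\n\n"
        f"Japanese:\n{source_paragraph}\n\n"
        f"English draft:\n{draft_paragraph}\n\n"
        f"Please suggest {count} alternative English translations "
        "of this sentence from the draft:\n"
        f"{problem_sentence}\n"
    )


prompt = build_alternatives_prompt(
    source_paragraph="(Japanese paragraph goes here)",
    draft_paragraph="(English draft paragraph goes here)",
    problem_sentence="(the sentence that needs work)",
)
print(prompt)
```

If none of the suggestions work, the same prompt can simply be sent again to get another batch.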

He uses another LLM and prompt to check his translation against the original and provide further suggestions, which he occasionally acts on. Then as a final step he runs the finished document through a text-to-speech engine to try and catch any "minor awkwardnesses" in the result.

I love this as an example of an expert using LLMs as tools to help further elevate their work. I'd love to read more examples like this one from experts in other fields.

Tags: hacker-news, translation, ai, generative-ai, llms, tom-gally

OpenAI o3-mini, now available in LLM

2025-01-31T21:50:36+00:00

OpenAI's o3-mini is out today. As with other o-series models it's a slightly difficult one to evaluate - we now need to decide if a prompt is best run using GPT-4o, o1, o3-mini or (if we have access) o1 Pro.

Confusing matters further, the benchmarks in the o3-mini system card (PDF) aren't a universal win for o3-mini across all categories. It generally benchmarks higher than GPT-4o and o1 but not across everything.

The biggest win for o3-mini is on the Codeforces ELO competitive programming benchmark, which I think is described by this 2nd January 2025 paper, with the following scores:

o3-mini (high) 2130
o3-mini (medium) 2036
o1 1891
o3-mini (low) 1831
o1-mini 1650
o1-preview 1258
GPT-4o 900

Weirdly, that GPT-4o score was in an older copy of the System Card PDF which has been replaced by an updated document that doesn't mention Codeforces ELO scores at all.

One note from the System Card that stood out for me concerning intended applications of o3-mini for OpenAI themselves:

We also plan to allow users to use o3-mini to search the internet and summarize the results in ChatGPT. We expect o3-mini to be a useful and safe model for doing this, especially given its performance on the jailbreak and instruction hierarchy evals detailed in Section 4 below.

This is notable because the existing o1 models on ChatGPT have not yet had access to their web search tool - despite the mixture of search and "reasoning" models having very clear benefits.

o3-mini does not and will not support vision. We will have to wait for future OpenAI reasoning models for that.

I released LLM 0.21 with support for the new model, plus its -o reasoning_effort high (or medium or low) option for tweaking the reasoning effort - details in this issue.

Note that the new model is currently only available for Tier 3 and higher users, which requires you to have spent at least $100 on the API.

o3-mini is priced at $1.10/million input tokens, $4.40/million output tokens - less than half the price of GPT-4o (currently $2.50/$10) and massively cheaper than o1 ($15/$60). The GPT-4o comparison isn't quite as simple as that though, as o3-mini's invisible reasoning tokens still count towards the output tokens you get charged for.

I tried using it to summarize this conversation about o3-mini on Hacker News, using my hn-summary.sh script.

hn-summary.sh 42890627 -o o3-mini

Here's the result - it used 18,936 input tokens and 2,905 output tokens for a total cost of 3.3612 cents.
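That cost figure can be reproduced from the per-million-token prices quoted above. This is a minimal sketch; the function and constant names are illustrative, not part of any library:

```python
# Prices in US dollars per million tokens, as quoted above for o3-mini.
INPUT_PRICE_PER_MILLION = 1.10
OUTPUT_PRICE_PER_MILLION = 4.40


def cost_in_cents(input_tokens: int, output_tokens: int) -> float:
    """Return the total cost of a single call in US cents."""
    dollars = (
        input_tokens * INPUT_PRICE_PER_MILLION
        + output_tokens * OUTPUT_PRICE_PER_MILLION
    ) / 1_000_000
    return dollars * 100


print(round(cost_in_cents(18_936, 2_905), 4))  # 3.3612
```

The output token count here includes any invisible reasoning tokens, since those are billed as output.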

o3-mini (and o1-mini) are text-only models: they don't accept image inputs. The full o1 API model can accept images in the same way as GPT-4o.

Another characteristic worth noting is o3-mini's token output limit - the measure of how much text it can output in one go. That's 100,000 tokens, compared to 16,000 for GPT-4o and just 8,000 for both DeepSeek R1 and Claude 3.5.

Invisible "reasoning tokens" come out of the same budget, so it's likely not possible to have it output the full 100,000.

The model accepts up to 200,000 tokens of input, an improvement on GPT-4o's 128,000.

An application where output limits really matter is translation between human languages, where the output can realistically be expected to have a similar length to the input. It will be interesting to see how well o3-mini works for that, especially given its low price.

Update: Here's a fascinating comment on this by professional translator Tom Gally on Hacker News:

I just did a test in which both R1 and o3-mini got worse at translation in the latter half of a long text. [...]

An initial comparison of the output suggested that, while R1 didn’t seem bad, o3-mini produced a writing style closer to what I asked for in the prompt—smoother and more natural English. But then I noticed that the output length was 5,855 characters for R1, 9,052 characters for o3-mini, and 11,021 characters for my own polished version. Comparing the three translations side-by-side with the original Japanese, I discovered that R1 had omitted entire paragraphs toward the end of the speech, and that o3-mini had switched to a strange abbreviated style (using slashes instead of “and” between noun phrases, for example) toward the end as well. The vanilla versions of ChatGPT, Claude, and Gemini that I ran the same prompt and text through a month ago had had none of those problems.
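The character counts in that comment hint at a crude automated check: compare each model's output length against a trusted reference translation and flag anything suspiciously short. Here is a minimal sketch. The function name and the 0.75 threshold are arbitrary assumptions for illustration, and length alone would not catch a problem like o3-mini's abbreviated style:

```python
def flag_short_outputs(
    reference_chars: int,
    candidate_chars: dict[str, int],
    min_ratio: float = 0.75,  # arbitrary threshold, chosen for illustration
) -> dict[str, float]:
    """Return the length ratio for each candidate that falls below min_ratio.

    Each ratio is candidate length divided by reference length, rounded
    to two decimal places.
    """
    flagged = {}
    for name, chars in candidate_chars.items():
        ratio = chars / reference_chars
        if ratio < min_ratio:
            flagged[name] = round(ratio, 2)
    return flagged


# Character counts quoted in the comment above.
print(flag_short_outputs(11_021, {"R1": 5_855, "o3-mini": 9_052}))
# {'R1': 0.53}
```

o3-mini's output comes out at about 0.82 of the reference length, so it passes this particular threshold despite the style problems described.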

Tags: projects, translation, ai, openai, generative-ai, llm, llm-pricing, llm-reasoning, o3, llm-release

Quoting Eric Lehman

2024-02-11T22:59:38+00:00

One consideration is that such a deep ML system could well be developed outside of Google-- at Microsoft, Baidu, Yandex, Amazon, Apple, or even a startup. My impression is that the Translate team experienced this. Deep ML reset the translation game; past advantages were sort of wiped out. Fortunately, Google's huge investment in deep ML largely paid off, and we excelled in this new game. Nevertheless, our new ML-based translator was still beaten on benchmarks by a small startup. The risk that Google could similarly be beaten in relevance by another company is highlighted by a startling conclusion from BERT: huge amounts of user feedback can be largely replaced by unsupervised learning from raw text. That could have heavy implications for Google.

— Eric Lehman, internal Google email in 2018

Tags: bert, google, machine-learning, translation, ai, generative-ai, llms

Seamless Communication

2023-12-01T17:01:37+00:00

A new “family of AI research models” from Meta AI for speech and text translation. The live demo is particularly worth trying—you can record a short webcam video of yourself speaking and get back the same video with your speech translated into another language.

The key to it is the new SeamlessM4T v2 model, which supports 101 languages for speech input, 96 languages for text input/output and 35 languages for speech output. SeamlessM4T-Large v2 is a 9GB file, available on Hugging Face.

Also in this release: SeamlessExpressive, which “captures certain underexplored aspects of prosody such as speech rate and pauses”—effectively maintaining things like expressed enthusiasm across languages.

Plus SeamlessStreaming, “a model that can deliver speech and text translations with around two seconds of latency”.

Via facebookresearch/seamless_communication

Tags: facebook, transformers, translation, ai, llms

Introducing speech-to-text, text-to-speech, and more for 1,100+ languages

2023-05-22T19:22:38+00:00

New from Meta AI: Massively Multilingual Speech. “MMS supports speech-to-text and text-to-speech for 1,107 languages and language identification for over 4,000 languages. [...] Some of these, such as the Tatuyo language, have only a few hundred speakers, and for most of these languages, no prior speech technology exists.”

It’s licensed CC-BY-NC 4.0 though, so it’s not available for commercial use.

“In a like-for-like comparison with OpenAI’s Whisper, we found that models trained on the Massively Multilingual Speech data achieve half the word error rate, but Massively Multilingual Speech covers 11 times more languages.”

The training data was mostly sourced from audio Bible translations.

Via Hacker News

Tags: facebook, translation, ai, training-data

Google Translate (beta)

2007-07-03T16:43:19+00:00

Google’s beta translator based on statistical analysis of things like the United Nations corpus. I have no idea how long this has been available; it isn’t linked from their homepage.

Tags: google, i18n, languages, translation

Django-fr

2007-06-21T10:50:16+00:00

Community site for French language Django developers. They’ve already made a promising start on translating the documentation.

Tags: django, documentation, france, french, translation

Comment transformer votre blog en une OpenID ?

2006-12-21T15:26:57+00:00

My piece on OpenID translated into French by Christophe Ducamp.

Tags: french, openid, translation