23rd November 2024 - Link Blog
Quantization matters (via) What impact does quantization have on the performance of an LLM? I've been wondering about this for quite a while, and now here are numbers from Paul Gauthier.
He ran differently quantized versions of Qwen 2.5 Coder 32B Instruct through his Aider code editing benchmark and saw a range of scores.
The original released weights (BF16) scored highest at 71.4%, with Ollama's qwen2.5-coder:32b-instruct-fp16 (a 66GB download) achieving the same score.
The quantized Ollama qwen2.5-coder:32b-instruct-q4_K_M (a 20GB download) saw a massive drop in quality, scoring just 53.4% on the same benchmark.
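A quick back-of-the-envelope sketch of that trade-off, using only the scores and download sizes quoted above. The `compare` helper is a hypothetical name for illustration and is not part of Aider's benchmark code.

```python
# Scores and download sizes as quoted above.
FP16_SCORE, Q4_SCORE = 71.4, 53.4  # Aider code editing benchmark, percent
FP16_GB, Q4_GB = 66, 20            # Ollama download sizes, gigabytes


def compare(full_score, quant_score, full_gb, quant_gb):
    """Return (points lost, relative score loss %, download size saved %)."""
    points_lost = full_score - quant_score
    relative_loss = 100 * points_lost / full_score
    size_saved = 100 * (1 - quant_gb / full_gb)
    return round(points_lost, 1), round(relative_loss, 1), round(size_saved, 1)


print(compare(FP16_SCORE, Q4_SCORE, FP16_GB, Q4_GB))
# (18.0, 25.2, 69.7)
```

So the q4_K_M build is roughly 70% smaller to download, but gives up 18 percentage points, about a quarter of the full-precision score on this benchmark.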