Wrote up some notes on the new Qwen2.5-Coder-32B model, which is the first model I've run on my own Mac (64GB M2) that appears to be highly competent at writing code
https://simonwillison.net/2024/Nov/12/qwen25-coder/
-
So far I've run Qwen2.5-Coder-32B successfully in two different ways: once via Ollama (and the llm-ollama plugin) and once using Apple's MLX framework and mlx-lm - details on how I ran both of those are in my article.
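If you want to try the Ollama route, the commands look roughly like this (assuming you already have Ollama and my llm tool installed; the model name is the one from the Ollama library):

# fetch the quantized model from the Ollama library (~20GB)
ollama pull qwen2.5-coder:32b
# install the plugin that exposes Ollama models to llm
llm install llm-ollama
# run a prompt against the local model
llm -m qwen2.5-coder:32b 'write a hello world script in python'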
-
Here's a one-liner that should work for you if you use uv on a Mac with 64GB of RAM (it will download ~32GB of model files the first time you run it):
uv run --with mlx-lm \
mlx_lm.generate \
--model mlx-community/Qwen2.5-Coder-32B-Instruct-8bit \
--max-tokens 4000 \
--prompt 'write me a python function that renders a mandelbrot fractal as wide as the current terminal'
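The first run is slow because of that download; afterwards mlx-lm loads the model from the local Hugging Face cache. Assuming the default cache location, you can check how much disk it took with something like:

# inspect the cached model's size (default huggingface_hub cache path)
du -sh ~/.cache/huggingface/hub/models--mlx-community--Qwen2.5-Coder-32B-Instruct-8bit
-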
Stefano Pacifico 🧬 🇺🇦 replied to Simon Willison
@simon Besides offline use and privacy, did you notice any other advantages to running it locally?
-
@simon Did you notice a speed difference between mlx and ollama?
-
@dbreunig I haven't measured it properly, but MLX feels a bit faster to me
-
Simon Willison replied to Stefano Pacifico 🧬 🇺🇦
@stefpac Sadly not; I'm probably going to continue mostly using the best hosted models, because then I don't have to sacrifice half my system RAM
-
@simon Your post mentioned a ~20GB quantized file via Ollama; did that take up 20GB of RAM, or 32?
I'm waiting on delivery of a 48GB M4 Pro this week or early next, which is why I'm kinda curious.
-
@edmistond I just tried running a prompt through the Ollama qwen2.5-coder:32b model and, to my surprise, it appeared to peak at just 2GB of RAM usage while using 95% of my GPU.
I thought GPU and system RAM were shared on macOS, so I don't entirely understand what happened there - I would have expected more like 20GB of RAM use.
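One way to dig into that (assuming a reasonably recent Ollama release): ollama ps reports each loaded model's size and its GPU/CPU split, which may not match what Activity Monitor attributes to the process:

# show loaded models, their memory footprint and GPU/CPU split
ollama ps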
-
Added an example showing Qwen 2.5 Coder's performance on my "pelican on a bicycle" benchmark:
llm -m qwen2.5-coder:32b 'Generate an SVG of a pelican riding a bicycle'
It's not the *worst* I've seen! https://simonwillison.net/2024/Oct/25/pelicans-on-a-bicycle/
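If you want to view the result yourself, one option - assuming the model returns bare SVG markup, which it often doesn't, so you may need to copy it out of a code fence first - is:

# save the response and open it in the default SVG viewer (macOS)
llm -m qwen2.5-coder:32b 'Generate an SVG of a pelican riding a bicycle' > pelican.svg
open pelican.svg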