Wrote up some notes on the new Qwen2.5-Coder-32B model, which is the first model I've run on my own Mac (64GB M2) that appears to be highly competent at writing code
https://simonwillison.net/2024/Nov/12/qwen25-coder/
-
So far I've run Qwen2.5-Coder-32B successfully in two different ways: once via Ollama (and the llm-ollama plugin) and once using Apple's MLX framework and mlx-lm - details on how I ran both of those are in my article.
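If you want to try the Ollama route, the commands look roughly like this (assuming you already have Ollama and my llm tool installed; the model name is the one from the Ollama library):

# fetch the quantized model from the Ollama library (~20GB)
ollama pull qwen2.5-coder:32b
# install the plugin that exposes Ollama models to llm
llm install llm-ollama
# run a prompt against the local model
llm -m qwen2.5-coder:32b 'write a hello world script in python'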
-
Here's a one-liner that should work for you if you use uv on a Mac with 64GB of RAM (it will download ~32GB of model files the first time you run it):
uv run --with mlx-lm \
mlx_lm.generate \
--model mlx-community/Qwen2.5-Coder-32B-Instruct-8bit \
--max-tokens 4000 \
--prompt 'write me a python function that renders a mandelbrot fractal as wide as the current terminal'
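The first run is slow because of that download; afterwards mlx-lm loads the model from the local Hugging Face cache. Assuming the default cache location, you can check how much disk it took with something like:

# inspect the cached model's size (default huggingface_hub cache path)
du -sh ~/.cache/huggingface/hub/models--mlx-community--Qwen2.5-Coder-32B-Instruct-8bit
-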
Stefano Pacifico 🧬 🇺🇦 replied to Simon Willison
@simon Besides offline use and privacy, did you notice any other advantages to running it locally?
-
@simon Did you notice a speed difference between mlx and ollama?
-
@dbreunig I haven't measured it properly, but MLX feels a bit faster to me
-
Simon Willison replied to Stefano Pacifico 🧬 🇺🇦
@stefpac Sadly not; I'm probably going to continue mostly using the best hosted models, because then I don't have to sacrifice half my system RAM
-
@simon Your post mentioned a ~20GB quantized file via Ollama; did that take up 20GB of RAM, or 32?
I'm waiting on delivery of a 48GB M4 Pro this week or early next, which is why I'm kinda curious.
-
@edmistond I just tried running a prompt through the Ollama qwen2.5-coder:32b model and, to my surprise, it appeared to peak at just 2GB of RAM usage while using 95% of my GPU.
I thought GPU and system RAM were shared on macOS, so I don't entirely understand what happened there - I would have expected more like 20GB of RAM use.
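One way to dig into that (assuming a reasonably recent Ollama release): ollama ps reports each loaded model's size and its GPU/CPU split, which may not match what Activity Monitor attributes to the process:

# show loaded models, their memory footprint and GPU/CPU split
ollama ps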
-
Added an example showing Qwen 2.5 Coder's performance on my "pelican on a bicycle" benchmark:
llm -m qwen2.5-coder:32b 'Generate an SVG of a pelican riding a bicycle'
It's not the *worst* I've seen! https://simonwillison.net/2024/Oct/25/pelicans-on-a-bicycle/
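If you want to view the result yourself, one option - assuming the model returns bare SVG markup, which it often doesn't, so you may need to copy it out of a code fence first - is:

# save the response and open it in the default SVG viewer (macOS)
llm -m qwen2.5-coder:32b 'Generate an SVG of a pelican riding a bicycle' > pelican.svg
open pelican.svg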