qwen
32B parameters
Commercial OK

Qwen 2.5 Coder 32B Instruct

Coding-specialist Qwen 2.5. Beats GPT-4o on HumanEval and matches Sonnet on many code-edit benchmarks. The default local-coding model on 24GB cards.

License: Apache 2.0 · Released Nov 12, 2024 · Context: 131,072 tokens
Our verdict
By Fredoline Eruo · Last verified May 6, 2026
9.2/10
Positioning

The model to run if you want a Cursor / Copilot replacement on your own hardware. Qwen 2.5 Coder 32B is the headline open-weight coding model — strong fill-in-the-middle, strong repo-scale reasoning, fast enough on a 4090 to keep up with interactive editing.

Strengths
  • Fill-in-the-middle is genuinely good — the actual mechanism Cursor and Copilot rely on, not just chat-style code completion.
  • Repo-aware reasoning — handles 32K-context code review tasks credibly; instruction-tuned to navigate multi-file context.
  • 70–88 tok/s on 4090 Q4 — fast enough for interactive code-as-you-type once integrated with a properly streaming editor plugin.
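Fill-in-the-middle works by wrapping the code before and after the cursor in special tokens and asking the model to complete the gap, rather than chatting about it. A minimal sketch, assuming the FIM tokens published on the Qwen 2.5 Coder model card (`<|fim_prefix|>`, `<|fim_suffix|>`, `<|fim_middle|>`); an editor plugin would send the assembled string as a raw, non-chat prompt:

```python
# Sketch: assembling a fill-in-the-middle prompt for Qwen 2.5 Coder.
# The three special tokens below are the ones documented on the Qwen 2.5
# Coder model card; verify against the card before relying on them.

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Wrap cursor context in FIM tokens; the model generates the middle."""
    return f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

# Example: the cursor sits between "return " and " / len(xs)".
prompt = build_fim_prompt(
    prefix="def mean(xs):\n    return ",
    suffix=" / len(xs)\n",
)
print(prompt)
```

Sent to a completion endpoint (not a chat endpoint), the model's output is the text that belongs at the cursor, which is exactly the shape Cursor- and Copilot-style autocomplete needs.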
Limitations
  • License confusion across the Qwen family is a real concern for SaaS deployments: Coder 32B itself is Apache 2.0, but some sibling models (e.g. the flagship 72B Instruct) ship under the Qwen license with a 100M-MAU cap, so verify the exact variant you deploy.
  • Lags closed models on novel-architecture tasks — anything genuinely outside its training distribution still falls back to plausible-but-wrong patterns.
  • Repo-context isn't free — feeding a real codebase still requires good RAG or AST-aware chunking; the model alone won't fix bad context selection.
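On that last point, a toy illustration of what AST-aware chunking means in practice: instead of slicing a source file into fixed character windows, split at definition boundaries so every retrieved chunk is syntactically whole. This is a minimal sketch using only the Python stdlib `ast` module; top-level function/class granularity is an arbitrary choice for illustration, not a recommendation of any particular RAG pipeline.

```python
# Minimal sketch of AST-aware chunking: split a Python file at top-level
# function/class boundaries so chunks fed to the model are never cut
# mid-definition. Granularity and language are illustrative choices.
import ast

def chunk_by_top_level_defs(source: str) -> list[str]:
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno/end_lineno are 1-based and inclusive.
            chunks.append("\n".join(lines[node.lineno - 1 : node.end_lineno]))
    return chunks

code = "def a():\n    return 1\n\nclass B:\n    def m(self):\n        return 2\n"
for chunk in chunk_by_top_level_defs(code):
    print(chunk)
    print("---")
```

A real pipeline would also attach imports, docstrings, and cross-file references to each chunk; the point is only that chunk boundaries should respect syntax, because the model cannot repair context that arrives truncated mid-function.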
Real-world performance on RTX 4090
  • Q4_K_M (19 GB): 70–88 tok/s decode, TTFT ~140 ms
  • Q5_K_M (22.6 GB): 58–72 tok/s
  • Q8_0 (34 GB): partial offload, 18–25 tok/s
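These footprints can be sanity-checked with a back-of-envelope calculation: quantized weight file plus the fp16 KV cache. The sketch below plugs in the published Qwen2.5-32B architecture numbers (64 layers, 8 KV heads under grouped-query attention, head dim 128); treat them as assumptions to verify against the model's config file, and note it ignores activation and framework overhead.

```python
# Back-of-envelope VRAM estimate: quantized weights + fp16 KV cache.
# Architecture numbers are the published Qwen2.5-32B config values
# (64 layers, 8 KV heads via GQA, head dim 128); verify before trusting.

def kv_cache_gib(ctx_tokens, layers=64, kv_heads=8, head_dim=128, bytes_per=2):
    # K and V each store layers * kv_heads * head_dim values per token.
    return 2 * layers * kv_heads * head_dim * ctx_tokens * bytes_per / 2**30

weights_gib = 19.0  # Q4_K_M file size from the variants table
for ctx in (4096, 16384, 32768):
    total = weights_gib + kv_cache_gib(ctx)
    print(f"ctx={ctx:>6}: ~{total:.1f} GiB")
```

At 16K context this lands around 23 GiB, which is why a 24 GB card runs Q4_K_M fully on-GPU but Q8_0 forces partial offload.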
Should you run this locally?

Yes, for any developer with an RTX 3090 / 4090 / 5090 who wants Copilot-class autocomplete without the cloud round-trip; this is the headline win for local AI. No, for developers comfortable paying $10–20/month for closed services: on novel languages or rare frameworks, GPT-4 / Claude still produce more reliable code.

How it compares
  • vs DeepSeek Coder V2 Lite → Qwen 2.5 Coder 32B is meaningfully stronger; DeepSeek Coder V2 Lite (16B) is the right pick under 16 GB VRAM.
  • vs Codestral 22B → Qwen 2.5 Coder 32B wins on capability; Codestral has cleaner Mistral license terms.
  • vs Qwen 2.5 32B Instruct → Coder is dramatically better at coding; pick Instruct for general chat.
  • vs DeepSeek V3 / R1 → V3 and R1 are stronger at hard reasoning but far too large for single-card use.
Run this yourself
ollama pull qwen2.5-coder:32b-instruct-q4_K_M
ollama run qwen2.5-coder:32b-instruct-q4_K_M
Settings: Q4_K_M GGUF, 16384 ctx, full GPU offload on 4090
Editor integration: Continue.dev or Tabby with Ollama backend
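For Continue.dev, the wiring amounts to pointing both the chat model and the tab-autocomplete model at the same Ollama tag. This is a hedged sketch of one plausible `config.json` shape; field names shift between Continue releases, so check the current Continue documentation rather than copying this verbatim.

```json
{
  "models": [
    {
      "title": "Qwen 2.5 Coder 32B (local)",
      "provider": "ollama",
      "model": "qwen2.5-coder:32b-instruct-q4_K_M"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen 2.5 Coder FIM",
    "provider": "ollama",
    "model": "qwen2.5-coder:32b-instruct-q4_K_M"
  }
}
```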
Why this rating

9.2/10 — the strongest open-weight coding model that runs on a single 24 GB GPU. Genuinely competitive with closed coding models (GPT-4, Claude) on most non-frontier tasks. It loses points only on work outside its training distribution, where frontier closed models remain more reliable.

Overview

Coding-specialist Qwen 2.5. Beats GPT-4o on HumanEval and matches Sonnet on many code-edit benchmarks. The default local-coding model on 24GB cards.

Strengths

  • Best open-weight coder at release
  • Apache 2.0
  • Strong fill-in-middle

Weaknesses

  • Less strong on general chat than non-coder Qwen

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization | File size | VRAM required
Q4_K_M       | 19.0 GB   | 24 GB
Q8_0         | 34.0 GB   | 40 GB

Get the model

Ollama

One-line install

ollama run qwen2.5-coder:32b

Read our Ollama review →

HuggingFace

Original weights

huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct

Source repository — you'll need to quantize the weights yourself (e.g. to GGUF) before running them locally.

Hardware that runs this

Cards with enough VRAM for at least one quantization of Qwen 2.5 Coder 32B Instruct.

Compare alternatives

Models worth comparing

Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.

Frequently asked

What's the minimum VRAM to run Qwen 2.5 Coder 32B Instruct?

24GB of VRAM is enough to run Qwen 2.5 Coder 32B Instruct at the Q4_K_M quantization (file size 19.0 GB). Higher-quality quantizations need more.

Can I use Qwen 2.5 Coder 32B Instruct commercially?

Yes — Qwen 2.5 Coder 32B Instruct ships under the Apache 2.0 license, which permits commercial use. Always read the license text before deployment.

What's the context length of Qwen 2.5 Coder 32B Instruct?

Qwen 2.5 Coder 32B Instruct supports a context window of 131,072 tokens (128K).

How do I install Qwen 2.5 Coder 32B Instruct with Ollama?

Run `ollama pull qwen2.5-coder:32b` to download, then `ollama run qwen2.5-coder:32b` to start a chat session. The default quantization is Q4_K_M.

Source: huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.