QwQ 32B Preview
Qwen team's reasoning-focused experimental release. Visible chain-of-thought in <think> tags. Precursor to Qwen 3's thinking mode.
QwQ 32B is the reasoning specialist of the Qwen 2.5 generation. Built for math, code planning, and multi-step problem solving. It always reasons (no toggle), so every output is preceded by visible chain-of-thought — useful when you want to see the model's work, expensive when you don't.
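Because every completion arrives wrapped in visible reasoning, downstream code usually needs to separate the chain-of-thought from the final answer. A minimal sketch, assuming the standard `<think>…</think>` delimiters the model emits (function name is illustrative):

```python
import re

# Assumes reasoning is delimited by <think>...</think>, as QwQ emits it.
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)

def split_answer(raw: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a QwQ-style completion."""
    m = re.search(r"<think>(.*?)</think>", raw, re.DOTALL)
    reasoning = m.group(1).strip() if m else ""
    answer = THINK_RE.sub("", raw).strip()
    return reasoning, answer

raw = "<think>2+2 is 4 because of basic arithmetic.</think>The answer is 4."
reasoning, answer = split_answer(raw)  # answer == "The answer is 4."
```

Stripping the think block before logging or display is the usual pattern when you want QwQ's quality without showing users its scratch work.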
Strengths
- State-of-the-art reasoning at 32B — surpasses Qwen 2.5 32B Instruct on GSM8K, MATH, and HumanEval.
- Visible chain-of-thought is genuinely useful for verifying the model's logic on hard problems.
- Same VRAM footprint as Qwen 2.5 32B / Qwen 3 32B.
Weaknesses
- Always reasons — every prompt eats 2–3× the tokens of a non-reasoning model.
- Wastes capacity on simple prompts — using it for "write me a quick email" is a clear mismatch.
- Generalist quality is below Qwen 2.5 32B Instruct outside the reasoning sweet spot.
Quantization speed (RTX 4090)
- Q4_K_M (19.4 GB): 68–86 tok/s decode — but 2–3× more tokens emitted per answer
- Q5_K_M (22.9 GB): 56–70 tok/s
- Q8_0 (35 GB): partial offload only
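The raw decode speed understates latency, because reasoning tokens are emitted before the answer. A quick arithmetic sketch of effective time-to-answer, using the 2–3× overhead figure from above (function and numbers are illustrative):

```python
def time_to_answer(answer_tokens: int, decode_tps: float, overhead: float = 2.5) -> float:
    """Seconds until the answer finishes, counting reasoning tokens
    emitted first. overhead = total tokens per answer token (2-3x here)."""
    return answer_tokens * overhead / decode_tps

# 300-token answer at Q4_K_M's ~75 tok/s midpoint:
t = time_to_answer(300, 75.0)  # 10.0 s, vs 4.0 s with no reasoning overhead
```

In other words, a 75 tok/s decode rate behaves more like ~30 answer-tokens/s once the chain-of-thought is counted.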
Should you run it?
Yes, for math-heavy work, code planning, scientific problem solving, and agent loops where reasoning quality matters more than throughput. No, for general chat, drafting, or any high-throughput application — pick Qwen 3 32B with thinking mode toggled per turn instead.
How it compares
- vs Qwen 3 32B → Qwen 3 32B with thinking is more flexible; QwQ 32B has slightly stronger reasoning but always pays the latency cost. Qwen 3 32B has largely subsumed QwQ for new deployments.
- vs DeepSeek R1 Distill Qwen 32B → R1 Distill is meaningfully stronger on the hardest reasoning tasks; QwQ holds up on most everyday math/code.
- vs Qwen 2.5 32B Instruct → QwQ is the reasoning specialist, Instruct is the generalist. Different jobs.
ollama pull qwq:32b-preview-q4_K_M
ollama run qwq:32b-preview-q4_K_M
Settings: Q4_K_M GGUF, 16384 ctx, full GPU on RTX 4090
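To reproduce the 16384-token context from the settings above, you can set it at runtime; `/set parameter` is part of Ollama's interactive session (shown here as a transcript sketch):

```shell
# pull and launch the quant tested above
ollama run qwq:32b-preview-q4_K_M
# then, inside the interactive session, match the tested context size:
# >>> /set parameter num_ctx 16384
```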
Why this rating
8.7/10 — Alibaba's dedicated reasoning specialist in the 32B body. State-of-the-art for math/code reasoning at this size, but pays for it with verbose chain-of-thought that doubles latency on every prompt. Loses points to general-purpose models when reasoning isn't required.
Strengths
- Strong math and reasoning
- Apache 2.0
- Visible CoT
Weaknesses
- Verbose output
- Not great for chat
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 19.0 GB | 24 GB |
Get the model
Ollama
One-line install
ollama run qwq:32b
Read our Ollama review →
HuggingFace
Original weights
Source repository — direct quantization required.
Hardware that runs this
Cards with enough VRAM for at least one quantization of QwQ 32B Preview.
Models worth comparing
Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.
Frequently asked
What's the minimum VRAM to run QwQ 32B Preview?
A 24 GB card covers the Q4_K_M quantization (~19 GB file); larger quants such as Q8_0 require partial CPU offload.
Can I use QwQ 32B Preview commercially?
Yes. It is released under the Apache 2.0 license.
What's the context length of QwQ 32B Preview?
32,768 tokens per the model card; the settings above use a 16,384-token context.
How do I install QwQ 32B Preview with Ollama?
Run `ollama pull qwq:32b-preview-q4_K_M`, then `ollama run qwq:32b-preview-q4_K_M`.
Source: huggingface.co/Qwen/QwQ-32B-Preview
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.