Qwen · 32B parameters · Commercial OK

QwQ 32B Preview

Qwen team's reasoning-focused experimental release. Visible chain-of-thought in <think> tags. Precursor to Qwen 3's thinking mode.

License: Apache 2.0 · Released Nov 27, 2024 · Context: 32,768 tokens
Our verdict
By Fredoline Eruo · Last verified May 6, 2026
8.7/10
Positioning

QwQ 32B is the reasoning specialist of the Qwen 2.5 generation. Built for math, code planning, and multi-step problem solving. It always reasons (no toggle), so every output is preceded by visible chain-of-thought — useful when you want to see the model's work, expensive when you don't.
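
If you only want the final answer, the visible reasoning can be filtered out after the fact. A minimal sketch, assuming the reasoning arrives wrapped in literal <think>…</think> tags as described above (the prompt and output file are illustrative):
ollama run qwq:32b-preview-q4_K_M "Prove that sqrt(2) is irrational." > out.txt
sed '/<think>/,/<\/think>/d' out.txt      # final answer only
sed -n '/<think>/,/<\/think>/p' out.txt   # reasoning only, for auditing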

Strengths
  • State-of-the-art reasoning at 32B — surpasses Qwen 2.5 32B Instruct on GSM8K, MATH, HumanEval.
  • Visible chain-of-thought is genuinely useful for verifying the model's logic on hard problems.
  • Same VRAM footprint as Qwen 2.5 32B / Qwen 3 32B.
Limitations
  • Always reasons — every prompt eats 2–3× the tokens vs a non-reasoning model.
  • Wastes capacity on simple prompts — using it for "write me a quick email" is a clear mismatch.
  • Generalist quality is below Qwen 2.5 32B Instruct outside the reasoning sweet spot.
Real-world performance on RTX 4090
  • Q4_K_M (19.4 GB): 68–86 tok/s decode — but 2–3× more tokens emitted per answer
  • Q5_K_M (22.9 GB): 56–70 tok/s
  • Q8_0 (35 GB): partial offload only
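
To reproduce a decode figure like the ones above, Ollama's non-streaming /api/generate response includes eval_count (tokens generated) and eval_duration (nanoseconds); dividing one by the other gives tok/s. A rough sketch, assuming the model is already pulled and jq is installed:
curl -s http://localhost:11434/api/generate \
  -d '{"model": "qwq:32b-preview-q4_K_M", "prompt": "Factor 2024 into primes.", "stream": false}' \
  | jq '{tokens: .eval_count, seconds: (.eval_duration / 1e9), tok_per_s: (.eval_count / (.eval_duration / 1e9))}'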
Should you run this locally?

Yes, for math-heavy work, code planning, scientific problem solving, and agent loops where reasoning quality matters more than throughput. No, for general chat, drafting, or any high-throughput application — pick Qwen 3 32B with thinking mode toggled per turn instead.

How it compares
  • vs Qwen 3 32B → Qwen 3 32B with thinking is more flexible; QwQ 32B has slightly stronger reasoning but always pays the latency cost. Qwen 3 32B has largely subsumed QwQ for new deployments.
  • vs DeepSeek R1 Distill Qwen 32B → R1 Distill is meaningfully stronger on the hardest reasoning tasks; QwQ holds up on most everyday math/code.
  • vs Qwen 2.5 32B Instruct → QwQ is the reasoning specialist, Instruct is the generalist. Different jobs.
Run this yourself
ollama pull qwq:32b-preview-q4_K_M
ollama run qwq:32b-preview-q4_K_M
Settings: Q4_K_M GGUF, 16384 ctx, full GPU on RTX 4090
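
Ollama's default context window is well below 16384 tokens, so the ctx setting has to be applied explicitly. A minimal sketch using a Modelfile (the qwq-16k alias is just an illustrative name):
cat > Modelfile <<'EOF'
FROM qwq:32b-preview-q4_K_M
PARAMETER num_ctx 16384
EOF
ollama create qwq-16k -f Modelfile
ollama run qwq-16k
Inside an interactive ollama run session, /set parameter num_ctx 16384 should apply the same setting for that session only.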
Why this rating

8.7/10 — Alibaba's dedicated reasoning specialist in a 32B body. State-of-the-art for math and code reasoning at this size, but it pays for that with verbose chain-of-thought that inflates token count and latency 2–3× on every prompt. Loses points to general-purpose models when reasoning isn't required.

Strengths

  • Strong math and reasoning
  • Apache 2.0
  • Visible CoT

Weaknesses

  • Verbose output
  • Not great for chat

Quantization variants

Each quantization level trades some model quality for a smaller file and lower VRAM use. Q4_K_M is the most popular starting point.

Quantization    File size    VRAM required
Q4_K_M          19.0 GB      24 GB
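The 24 GB figure is roughly the weights file plus the KV cache. A back-of-envelope sketch, assuming Qwen2.5-32B-style attention (64 layers, 8 KV heads, head dim 128) and an fp16 cache; treat those numbers as assumptions rather than spec:
awk 'BEGIN {
  weights = 19.0                              # Q4_K_M file size, GB
  kv_per_tok = 2 * 64 * 8 * 128 * 2 / 1e9     # K+V x layers x KV heads x head dim x 2 bytes, GB
  ctx = 16384
  printf "weights %.1f GB + KV cache %.1f GB = %.1f GB before runtime overhead\n",
         weights, kv_per_tok * ctx, weights + kv_per_tok * ctx
}'
At 16K context that puts a Q4_K_M load near the ceiling of a 24 GB card, and it is why the 35 GB Q8_0 build above runs with partial offload only.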

Get the model

Ollama

One-line install

ollama run qwq:32b-preview-q4_K_M
Read our Ollama review →

HuggingFace

Original weights

huggingface.co/Qwen/QwQ-32B-Preview

Source repository with the original weights — quantize them yourself (e.g., to GGUF) before running locally.

Frequently asked

What's the minimum VRAM to run QwQ 32B Preview?

24 GB of VRAM is enough to run QwQ 32B Preview at the Q4_K_M quantization (file size 19.0 GB). Higher-quality quantizations need more.

Can I use QwQ 32B Preview commercially?

Yes — QwQ 32B Preview ships under the Apache 2.0 license, which permits commercial use. Always read the license text before deployment.

What's the context length of QwQ 32B Preview?

QwQ 32B Preview supports a context window of 32,768 tokens (32K).

How do I install QwQ 32B Preview with Ollama?

Run `ollama pull qwq:32b-preview-q4_K_M` to download the Q4_K_M build, then `ollama run qwq:32b-preview-q4_K_M` to start a chat session.

Source: huggingface.co/Qwen/QwQ-32B-Preview

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.