deepseek
32B parameters
Commercial OK

DeepSeek R1 Distill Qwen 32B

32B distill — fits on a single 24GB card with reasoning capability. Best price-per-thinking-token combo for prosumers.

License: MIT · Released Jan 20, 2025 · Context: 131,072 tokens
Our verdict
By Fredoline Eruo · Last verified May 6, 2026
8.8/10
Positioning

The "best reasoning model that runs full-GPU on a single 24 GB card." If you can't accept the 70B distill's offload speeds (22–28 tok/s) and want pure-VRAM throughput (70+ tok/s) with serious reasoning training, this is the pick.

Strengths
  • 19 GB at Q4_K_M — full GPU on 24 GB, no offload, 70+ tok/s.
  • R1-class reasoning training — closes most of the gap vs base Qwen 2.5 32B on math/code.
  • MIT license — permissive commercial use, no MAU caps or usage restrictions.
Limitations
  • Below the 70B Distill on absolute reasoning ceiling.
  • Verbose chain-of-thought — same token-cost concern as other reasoning models.
  • Generalist quality slightly lags base Qwen 2.5 32B for simple chat.
Real-world performance on RTX 4090
  • Q4_K_M (19.0 GB): 68–86 tok/s decode (measured including chain-of-thought output)
  • Q5_K_M (22.9 GB): 56–70 tok/s
  • Q8_0 (34.0 GB): partial offload, 18–24 tok/s
Should you run this locally?

Yes, for 24 GB GPU owners who want strong reasoning at full-GPU speed; this is the best speed-quality tradeoff in the reasoning space. No, for users who can tolerate the 70B distill's offload speeds: pick R1 Distill Llama 70B for its higher reasoning ceiling.

How it compares
  • vs DeepSeek R1 Distill Llama 70B → 70B is smarter; this 32B is much faster (full GPU). Pick by speed-vs-quality.
  • vs QwQ 32B → similar size, R1 Distill wins on hardest reasoning; QwQ has slightly cleaner everyday traces. R1 Distill is the stronger pick for math/code planning.
  • vs Qwen 3 32B with thinking mode → Qwen 3 32B is more flexible (thinking toggle); R1 Distill has more aggressive reasoning training. Coin flip.
Run this yourself
ollama pull deepseek-r1:32b
ollama run deepseek-r1:32b
Settings: Q4_K_M GGUF, 16384 ctx, full GPU on RTX 4090
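The CLI applies its own default context window, so to pin the 16384-token context from the settings above you can pass `num_ctx` through Ollama's HTTP generate API. A minimal sketch, assuming a local Ollama server on the default port (the prompt is illustrative):

```shell
# Build a request payload for Ollama's /api/generate endpoint.
# options.num_ctx pins the context window to the 16384 tokens
# recommended in the settings above.
PAYLOAD='{
  "model": "deepseek-r1:32b",
  "prompt": "How many primes are there below 30?",
  "stream": false,
  "options": { "num_ctx": 16384 }
}'
echo "$PAYLOAD"

# Send it once the Ollama server is running:
# curl http://localhost:11434/api/generate -d "$PAYLOAD"
```

The non-streaming response includes `eval_count` and `eval_duration` fields, which you can divide to check the tok/s figures quoted above on your own hardware.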
Why this rating

8.8/10 — the right reasoning model for users who want full-GPU inference on a 24 GB card. It distills R1 reasoning into the Qwen 2.5 32B body and fits in 19 GB at Q4, so no system-RAM partial offload is required. Loses fractional points to the 70B distill on absolute reasoning quality.

Overview

32B distill — fits on a single 24GB card with reasoning capability. Best price-per-thinking-token combo for prosumers.

Strengths

  • MIT
  • Single-24GB-card reasoner

Weaknesses

  • Verbose CoT inflates output cost

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization   File size   VRAM required
Q4_K_M         19.0 GB     24 GB
Q8_0           34.0 GB     40 GB
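As a sanity check before downloading, you can estimate whether a quantization will fit your card. The ~25% headroom multiplier below is a rule-of-thumb assumption (KV cache and activations grow with context length), not a measured figure:

```shell
# Crude VRAM estimate: GGUF file size plus ~25% headroom for the KV
# cache and activations. The 1.25 multiplier is an assumption; actual
# usage depends on context length and the inference runtime.
vram_estimate() {
  awk -v size_gb="$1" 'BEGIN { printf "%.1f GB\n", size_gb * 1.25 }'
}

vram_estimate 19.0   # Q4_K_M
vram_estimate 34.0   # Q8_0
```

If the estimate lands above your card's VRAM, expect partial offload to system RAM and the slower decode speeds shown in the performance section.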

Get the model

Ollama

One-line install

ollama run deepseek-r1:32b

HuggingFace

Original weights

huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B

Source repository with the original weights — quantize them yourself for local runtimes.

Hardware that runs this

Cards with enough VRAM for at least one quantization of DeepSeek R1 Distill Qwen 32B.

Compare alternatives

Models worth comparing

Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.

Frequently asked

What's the minimum VRAM to run DeepSeek R1 Distill Qwen 32B?

24GB of VRAM is enough to run DeepSeek R1 Distill Qwen 32B at the Q4_K_M quantization (file size 19.0 GB). Higher-quality quantizations need more.

Can I use DeepSeek R1 Distill Qwen 32B commercially?

Yes — DeepSeek R1 Distill Qwen 32B ships under the MIT license, which permits commercial use. Always read the license text before deployment.

What's the context length of DeepSeek R1 Distill Qwen 32B?

DeepSeek R1 Distill Qwen 32B supports a context window of 131,072 tokens (128K).

How do I install DeepSeek R1 Distill Qwen 32B with Ollama?

Run `ollama pull deepseek-r1:32b` to download, then `ollama run deepseek-r1:32b` to start a chat session. The default quantization is Q4_K_M.

Source: huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.