deepseek
32B parameters
Commercial OK

DeepSeek R1 Distill Qwen 32B

32B distill — fits on a single 24GB card with reasoning capability. Best price-per-thinking-token combo for prosumers.

License: MIT · Released Jan 20, 2025 · Context: 131,072 tokens
Our verdict
By Fredoline Eruo · Last verified May 6, 2026
8.8/10
Positioning

The "best reasoning model that runs full-GPU on a single 24 GB card." If you can't accept the 70B distill's offload speeds (22–28 tok/s) and want pure-VRAM throughput (70+ tok/s) with serious reasoning training, this is the pick.

Strengths
  • 19 GB at Q4_K_M — full GPU on 24 GB, no offload, 70+ tok/s.
  • R1-class reasoning training — closes most of the gap vs base Qwen 2.5 32B on math/code.
  • MIT license — permissive commercial use, no MAU caps or usage restrictions.
Limitations
  • Below the 70B Distill on absolute reasoning ceiling.
  • Verbose chain-of-thought — same token-cost concern as other reasoning models.
  • Generalist quality slightly lags base Qwen 2.5 32B for simple chat.
Real-world performance on RTX 4090
  • Q4_K_M (19.0 GB): 68–86 tok/s decode (measured including chain-of-thought output)
  • Q5_K_M (22.9 GB): 56–70 tok/s
  • Q8_0 (34.0 GB): partial offload, 18–24 tok/s
Should you run this locally?

Yes, for 24 GB GPU owners who want strong reasoning at full-GPU speed; this is the best speed-quality tradeoff in the reasoning space. No, for users who can tolerate the 70B distill's offload speeds: pick R1 Distill Llama 70B for its higher reasoning ceiling.

How it compares
  • vs DeepSeek R1 Distill Llama 70B → 70B is smarter; this 32B is much faster (full GPU). Pick by speed-vs-quality.
  • vs QwQ 32B → similar size, R1 Distill wins on hardest reasoning; QwQ has slightly cleaner everyday traces. R1 Distill is the stronger pick for math/code planning.
  • vs Qwen 3 32B with thinking mode → Qwen 3 32B is more flexible (thinking toggle); R1 Distill has more aggressive reasoning training. Coin flip.
Run this yourself
ollama pull deepseek-r1:32b
ollama run deepseek-r1:32b
Settings: Q4_K_M GGUF, 16384 ctx, full GPU on RTX 4090
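The CLI applies its own default context window, so to pin the 16384-token context from the settings above you can pass `num_ctx` through Ollama's HTTP generate API. A minimal sketch, assuming a local Ollama server on the default port (the prompt is illustrative):

```shell
# Build a request payload for Ollama's /api/generate endpoint.
# options.num_ctx pins the context window to the 16384 tokens
# recommended in the settings above.
PAYLOAD='{
  "model": "deepseek-r1:32b",
  "prompt": "How many primes are there below 30?",
  "stream": false,
  "options": { "num_ctx": 16384 }
}'
echo "$PAYLOAD"

# Send it once the Ollama server is running:
# curl http://localhost:11434/api/generate -d "$PAYLOAD"
```

The non-streaming response includes `eval_count` and `eval_duration` fields, which you can divide to check the tok/s figures quoted above on your own hardware.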
Why this rating

8.8/10 — the right reasoning model for users who want full-GPU inference on a 24 GB card. It distills R1 reasoning into the Qwen 2.5 32B body and fits in 19 GB at Q4, so no system-RAM partial offload is required. Loses fractional points to the 70B distill on absolute reasoning quality.

Overview

32B distill — fits on a single 24GB card with reasoning capability. Best price-per-thinking-token combo for prosumers.

Strengths

  • MIT
  • Single-24GB-card reasoner

Weaknesses

  • Verbose CoT inflates output cost

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization   File size   VRAM required
Q4_K_M         19.0 GB     24 GB
Q8_0           34.0 GB     40 GB
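As a sanity check before downloading, you can estimate whether a quantization will fit your card. The ~25% headroom multiplier below is a rule-of-thumb assumption (KV cache and activations grow with context length), not a measured figure:

```shell
# Crude VRAM estimate: GGUF file size plus ~25% headroom for the KV
# cache and activations. The 1.25 multiplier is an assumption; actual
# usage depends on context length and the inference runtime.
vram_estimate() {
  awk -v size_gb="$1" 'BEGIN { printf "%.1f GB\n", size_gb * 1.25 }'
}

vram_estimate 19.0   # Q4_K_M
vram_estimate 34.0   # Q8_0
```

If the estimate lands above your card's VRAM, expect partial offload to system RAM and the slower decode speeds shown in the performance section.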

Get the model

Ollama

One-line install

ollama run deepseek-r1:32b

HuggingFace

Original weights

huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B

Source repository with the original weights — quantize them yourself for local runtimes.

Hardware that runs this

Cards with enough VRAM for at least one quantization of DeepSeek R1 Distill Qwen 32B.

Compare alternatives

Models worth comparing

Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.

Frequently asked

What's the minimum VRAM to run DeepSeek R1 Distill Qwen 32B?

24GB of VRAM is enough to run DeepSeek R1 Distill Qwen 32B at the Q4_K_M quantization (file size 19.0 GB). Higher-quality quantizations need more.

Can I use DeepSeek R1 Distill Qwen 32B commercially?

Yes — DeepSeek R1 Distill Qwen 32B ships under the MIT license, which permits commercial use. Always read the license text before deployment.

What's the context length of DeepSeek R1 Distill Qwen 32B?

DeepSeek R1 Distill Qwen 32B supports a context window of 131,072 tokens (128K).

How do I install DeepSeek R1 Distill Qwen 32B with Ollama?

Run `ollama pull deepseek-r1:32b` to download, then `ollama run deepseek-r1:32b` to start a chat session. The default quantization is Q4_K_M.

Source: huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.