DeepSeek R1 Distill Llama 8B
Overview
R1 reasoning distilled into a Llama 3.1 8B Base model. A smaller R1 distill, useful when the 32B distill (DeepSeek-R1-Distill-Qwen-32B) is too heavy for your hardware. Its reasoning quality is meaningfully below the 32B distill's, but it still beats non-reasoning Llama 8B models on math and code.
Family & lineage
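How this model relates to others in its lineage. Parent: Llama 3.1 8B Base. Teacher: DeepSeek-R1, whose curated reasoning samples were used to fine-tune the base with supervised fine-tuning only (no RL stage). Family members share architecture and training-data roots; parent/child edges record direct distillation or fine-tune relationships.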
Strengths
- Reasoning model on 8B-class hardware
- MIT license on DeepSeek's release (the Llama 3.1 license of the base model also applies)
- Llama 3.1 base, so broad runtime support
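Because this is a reasoning model, the DeepSeek-R1 distills write a chain of thought between `<think>` and `</think>` tags before the final answer. Depending on the chat template in use, the opening tag may already be in the prompt, so the generated text can contain only the closing tag. Below is a minimal sketch for separating the two parts, assuming the raw completion is available as a string; `split_reasoning` is a hypothetical helper, not part of any runtime's API.

```python
# Closing tag that separates R1-style reasoning from the final answer.
THINK_CLOSE = "</think>"


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a raw R1-style completion (hypothetical helper)."""
    if THINK_CLOSE not in text:
        # No reasoning block found: treat the whole completion as the answer.
        return "", text.strip()
    reasoning, answer = text.split(THINK_CLOSE, 1)
    # Drop the opening tag if the model emitted it itself rather than the template.
    reasoning = reasoning.replace("<think>", "", 1)
    return reasoning.strip(), answer.strip()
```

For example, `split_reasoning("<think>\n2 + 2 is 4.\n</think>\n\nThe answer is 4.")` returns `("2 + 2 is 4.", "The answer is 4.")`. Splitting on the closing tag only means the helper works whether or not the opening tag appears in the output.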
Weaknesses
- Reasoning depth limited by base size
Quantization variants
Each quantization trades some model quality for a smaller file and lower VRAM use. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 4.7 GB | 6 GB |
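The VRAM figure is larger than the file size because the runtime also holds a KV cache that grows with context length. Below is a rough back-of-the-envelope sketch, assuming the Llama 3.1 8B architecture (32 layers, 8 grouped-query KV heads, head dimension 128), an FP16 KV cache, and a flat 0.5 GiB allowance for runtime buffers. The overhead value and the helper names are illustrative assumptions, not measurements.

```python
# Llama 3.1 8B architecture constants (assumed unchanged by distillation,
# which only fine-tuned the weights).
N_LAYERS = 32
N_KV_HEADS = 8   # grouped-query attention
HEAD_DIM = 128
GIB = 1024 ** 3


def kv_cache_bytes(context_tokens: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache K and V for `context_tokens` tokens (FP16 by default)."""
    # Factor 2 covers both K and V; at FP16 this is 131072 bytes (128 KiB) per token.
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * bytes_per_elem
    return context_tokens * per_token


def estimate_vram_gib(weights_gib: float, context_tokens: int,
                      overhead_gib: float = 0.5) -> float:
    """Weights + KV cache + an assumed flat allowance for runtime buffers."""
    return weights_gib + kv_cache_bytes(context_tokens) / GIB + overhead_gib
```

Treating the table's 4.7 GB as the weight footprint, `estimate_vram_gib(4.7, 8192)` gives about 6.2 GiB: an 8K-token context adds roughly 1 GiB of KV cache. That lands close to the 6 GB in the table; real usage depends on the runtime, the context length you configure, and whether the KV cache itself is quantized.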
Get the model
Hugging Face
Original weights
Source repository with the unquantized weights; you need to quantize them yourself to get a variant such as Q4_K_M.
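One common route from the original weights to a Q4_K_M file is llama.cpp: convert the Hugging Face checkpoint to an F16 GGUF, then quantize it. Below is a sketch that only builds the two command lines, assuming a local llama.cpp checkout built with CMake (so `llama-quantize` sits in `build/bin/`) and its Python requirements installed. The paths, output filenames, and the `quantize_commands` helper are hypothetical; check them against your llama.cpp version.

```python
import sys
from pathlib import Path


def quantize_commands(llama_cpp_dir: str, model_dir: str,
                      qtype: str = "Q4_K_M") -> list[list[str]]:
    """Build argv lists for: HF checkpoint -> F16 GGUF -> quantized GGUF (hypothetical helper)."""
    cpp = Path(llama_cpp_dir)
    f16 = Path(model_dir) / "model-f16.gguf"          # assumed intermediate filename
    out = Path(model_dir) / f"model-{qtype}.gguf"     # assumed output filename
    # Step 1: llama.cpp's converter script reads the safetensors checkpoint directory.
    convert = [sys.executable, str(cpp / "convert_hf_to_gguf.py"), model_dir,
               "--outfile", str(f16), "--outtype", "f16"]
    # Step 2: llama-quantize takes input GGUF, output GGUF, and the quantization type.
    quantize = [str(cpp / "build" / "bin" / "llama-quantize"), str(f16), str(out), qtype]
    return [convert, quantize]
```

To run the steps, download the repository first (for example with `huggingface_hub.snapshot_download(repo_id="deepseek-ai/DeepSeek-R1-Distill-Llama-8B", local_dir=...)`), then pass each list to `subprocess.run(cmd, check=True)`. Building the commands separately from executing them keeps the plan easy to inspect before committing the disk space and time the conversion needs.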
Hardware that runs this
Cards with enough VRAM for at least one quantization of DeepSeek R1 Distill Llama 8B.
Models worth comparing
Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.
Frequently asked
What's the minimum VRAM to run DeepSeek R1 Distill Llama 8B?
About 6 GB, for the Q4_K_M quantization listed above.
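Can I use DeepSeek R1 Distill Llama 8B commercially?
DeepSeek releases the weights under the MIT license and states that the R1 series supports commercial use. Because this model is derived from Llama 3.1 8B Base, the Llama 3.1 Community License terms also apply.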
What's the context length of DeepSeek R1 Distill Llama 8B?
Source: huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.