DeepSeek V4 Flash (284B MoE)
Overview
The cost-efficient sibling of V4-Pro. 284B total / 13B active MoE, same hybrid CSA+HCA attention, same 1M context. The MoE active-param ratio (4.5%) makes it surprisingly fast for its nameplate size — practical on dual A100 / single H200 / Mac Studio M3 Ultra 192 GB.
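For a rough sanity check of those headline numbers, here is a small Python sketch; the bits-per-weight figure is an assumption for a Q4_K_M-class quant, not a published spec, and both parameter counts are rounded.

```python
# Back-of-the-envelope check of the headline numbers.
total_params = 284e9   # total MoE parameters (rounded)
active_params = 13e9   # parameters active per token (rounded)
bits_per_weight = 4.6  # assumption: rough Q4_K_M average; the real mix varies per tensor

active_ratio = active_params / total_params
q4_weights_gb = total_params * bits_per_weight / 8 / 1e9

print(f"Active-param ratio: {active_ratio:.1%}")          # ~4.6% with these rounded counts
print(f"Approx. Q4 weight file: {q4_weights_gb:.0f} GB")  # ~163 GB, in line with the 162 GB listed below
```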
Strengths
- 13B active params — fast despite 284B nameplate
- 1M context window with same hybrid attention as V4-Pro
- MIT license, $0.14/$0.28 per 1M tokens via DeepSeek API
- A single Mac Studio M3 Ultra 192 GB runs it via MLX (see the sketch below)
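A minimal sketch of what running it via mlx-lm could look like, assuming an MLX-converted checkpoint exists; the repo id below is hypothetical, so substitute whatever conversion you actually use.

```python
# Minimal mlx-lm sketch; the repo id is a placeholder for an MLX-converted 4-bit checkpoint.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/DeepSeek-V4-Flash-4bit")  # hypothetical repo id
reply = generate(model, tokenizer, prompt="Summarize MoE routing in two sentences.", max_tokens=128)
print(reply)
```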
Weaknesses
- 162 GB Q4_K_M — workstation hardware required
- Quality below V4-Pro on hardest reasoning tasks
- Below Q4, quantization quality falls off faster than it does for comparable dense models
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 162.0 GB | 192 GB |
| Q5_K_M | 198.0 GB | 224 GB |
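If you want to script the decision, here is a small sketch that picks the largest quantization from the table above that fits a given VRAM budget; the figures are copied straight from the table.

```python
# Choose the largest listed quant that fits the available VRAM (figures from the table above).
QUANTS = [
    ("Q4_K_M", 162.0, 192),  # (name, file size in GB, VRAM required in GB)
    ("Q5_K_M", 198.0, 224),
]

def pick_quant(vram_gb: float):
    fitting = [q for q in QUANTS if q[2] <= vram_gb]
    return max(fitting, key=lambda q: q[1])[0] if fitting else None

print(pick_quant(192))  # Q4_K_M
print(pick_quant(224))  # Q5_K_M
print(pick_quant(96))   # None: below the smallest listed requirement
```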
Get the model
HuggingFace: original weights (source repository); direct quantization required.
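A hedged sketch of pulling the original weights with huggingface_hub before quantizing locally; the repo id mirrors the source link at the bottom of this page, the local directory is arbitrary, and the quantization step itself (for example, llama.cpp's converter plus llama-quantize) is left out.

```python
# Fetch the original checkpoint; quantization has to be done separately afterwards.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="deepseek-ai/DeepSeek-V4-Flash",  # matches the source link on this page
    local_dir="./DeepSeek-V4-Flash",          # arbitrary local target directory
)
print("Weights downloaded to", local_dir)
```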
Hardware that runs this
Cards with enough VRAM for at least one quantization of DeepSeek V4 Flash (284B MoE).
Models worth comparing
Models in the same parameter band, plus one tier above and one tier below, so you can decide what actually fits your hardware.
Frequently asked
What's the minimum VRAM to run DeepSeek V4 Flash (284B MoE)?
About 192 GB of VRAM or unified memory, which covers the Q4_K_M quantization (162 GB file).
Can I use DeepSeek V4 Flash (284B MoE) commercially?
Yes. It's released under the MIT license, which permits commercial use.
What's the context length of DeepSeek V4 Flash (284B MoE)?
1M tokens, using the same hybrid CSA+HCA attention as V4-Pro.
Source: huggingface.co/deepseek-ai/DeepSeek-V4-Flash
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.