DeepSeek V4 Flash (284B MoE)
Overview
The cost-efficient sibling of V4-Pro. 284B total / 13B active MoE, same hybrid CSA+HCA attention, same 1M context. The MoE active-param ratio (4.5%) makes it surprisingly fast for its nameplate size — practical on dual A100 / single H200 / Mac Studio M3 Ultra 192 GB.
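For a rough sanity check of those headline numbers, here is a small Python sketch; the bits-per-weight figure is an assumption for a Q4_K_M-class quant, not a published spec, and both parameter counts are rounded.

```python
# Back-of-the-envelope check of the headline numbers.
total_params = 284e9   # total MoE parameters (rounded)
active_params = 13e9   # parameters active per token (rounded)
bits_per_weight = 4.6  # assumption: rough Q4_K_M average; the real mix varies per tensor

active_ratio = active_params / total_params
q4_weights_gb = total_params * bits_per_weight / 8 / 1e9

print(f"Active-param ratio: {active_ratio:.1%}")          # ~4.6% with these rounded counts
print(f"Approx. Q4 weight file: {q4_weights_gb:.0f} GB")  # ~163 GB, in line with the 162 GB listed below
```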
Strengths
- 13B active params — fast despite 284B nameplate
- 1M context window with same hybrid attention as V4-Pro
- MIT license, $0.14/$0.28 per 1M tokens via DeepSeek API
- A single Mac Studio M3 Ultra 192 GB runs it via MLX (see the sketch below)
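A minimal sketch of what running it via mlx-lm could look like, assuming an MLX-converted checkpoint exists; the repo id below is hypothetical, so substitute whatever conversion you actually use.

```python
# Minimal mlx-lm sketch; the repo id is a placeholder for an MLX-converted 4-bit checkpoint.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/DeepSeek-V4-Flash-4bit")  # hypothetical repo id
reply = generate(model, tokenizer, prompt="Summarize MoE routing in two sentences.", max_tokens=128)
print(reply)
```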
Weaknesses
- 162 GB Q4_K_M — workstation hardware required
- Quality below V4-Pro on hardest reasoning tasks
- Below Q4, quantization quality falls off faster than it does for comparable dense models
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 162.0 GB | 192 GB |
| Q5_K_M | 198.0 GB | 224 GB |
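If you want to script the decision, here is a small sketch that picks the largest quantization from the table above that fits a given VRAM budget; the figures are copied straight from the table.

```python
# Choose the largest listed quant that fits the available VRAM (figures from the table above).
QUANTS = [
    ("Q4_K_M", 162.0, 192),  # (name, file size in GB, VRAM required in GB)
    ("Q5_K_M", 198.0, 224),
]

def pick_quant(vram_gb: float):
    fitting = [q for q in QUANTS if q[2] <= vram_gb]
    return max(fitting, key=lambda q: q[1])[0] if fitting else None

print(pick_quant(192))  # Q4_K_M
print(pick_quant(224))  # Q5_K_M
print(pick_quant(96))   # None: below the smallest listed requirement
```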
Get the model
HuggingFace: original weights (source repository); direct quantization required.
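A hedged sketch of pulling the original weights with huggingface_hub before quantizing locally; the repo id mirrors the source link at the bottom of this page, the local directory is arbitrary, and the quantization step itself (for example, llama.cpp's converter plus llama-quantize) is left out.

```python
# Fetch the original checkpoint; quantization has to be done separately afterwards.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="deepseek-ai/DeepSeek-V4-Flash",  # matches the source link on this page
    local_dir="./DeepSeek-V4-Flash",          # arbitrary local target directory
)
print("Weights downloaded to", local_dir)
```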
Hardware that runs this
Cards with enough VRAM for at least one quantization of DeepSeek V4 Flash (284B MoE).
Models worth comparing
Models in the same parameter band, plus one tier above and one tier below, so you can decide what actually fits your hardware.
Frequently asked
What's the minimum VRAM to run DeepSeek V4 Flash (284B MoE)?
About 192 GB of VRAM or unified memory, which covers the Q4_K_M quantization (162 GB file).
Can I use DeepSeek V4 Flash (284B MoE) commercially?
Yes. It's released under the MIT license, which permits commercial use.
What's the context length of DeepSeek V4 Flash (284B MoE)?
1M tokens, using the same hybrid CSA+HCA attention as V4-Pro.
Source: huggingface.co/deepseek-ai/DeepSeek-V4-Flash
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.