deepseek
671B parameters
Commercial OK

DeepSeek V3 (671B MoE)

DeepSeek's flagship MoE — 671B total / 37B active. Server-tier, but the smaller R1 distills make this lineage approachable.

License: DeepSeek License · Released Dec 26, 2024 · Context: 65,536 tokens
Our verdict
By Fredoline Eruo · Last verified May 6, 2026
9.0/10
Positioning

DeepSeek V3 is the open-weight frontier model of late 2024 into 2025 — 671B total parameters, with 37B active per token thanks to a clever MoE design. It's the model closed-AI competitors should be most worried about. The catch: running it locally requires either workstation hardware or cloud GPU rentals.

Strengths
  • Frontier-class quality — genuinely competes with GPT-4o on many benchmarks.
  • 37B active per token keeps compute reasonable despite the 671B nameplate.
  • Permissive DeepSeek License — commercial use allowed; among the cleanest terms at this capability tier.
Limitations
  • 671B total means disk + memory footprint is workstation-scale (~380 GB at Q4).
  • Routing isn't free — at low quants, MoE quality degrades faster than it does in dense models.
  • Tool use less polished than Llama family at the time of writing.
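The 37B-active figure comes from MoE routing: a small gate network scores every expert for each token, and only the top-k experts actually run. A toy sketch of top-k softmax gating follows — illustrative only, with made-up expert counts and logits; DeepSeek V3's real router uses far more experts plus its own load-balancing scheme:

```python
import math

def topk_gate(scores, k=2):
    """Pick the k highest-scoring experts for one token, softmax their weights."""
    topk = sorted(range(len(scores)), key=lambda i: scores[i])[-k:]
    mx = max(scores[i] for i in topk)           # subtract max for numerical stability
    exps = [math.exp(scores[i] - mx) for i in topk]
    total = sum(exps)
    return topk, [e / total for e in exps]

# Toy gate logits for 8 hypothetical experts scoring one token.
scores = [0.3, -1.2, 2.1, 0.0, 1.7, -0.5, 0.9, 0.2]
experts, weights = topk_gate(scores, k=2)
# Only 2 of the 8 experts run for this token; their outputs are mixed by `weights`.
```

Because only the selected experts' weights are read per token, compute and memory traffic scale with the active parameters, not the 671B total.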
Real-world performance on RTX 4090
  • Q4_K_M (~380 GB) — not realistically runnable on a single 4090
  • Practical local hardware: dual A100 80 GB (160 GB), Mac Studio M3 Ultra 192 GB, or H100 cluster
  • Single 4090 + 192 GB DDR5 with Q3: ~1–3 tok/s, not productive
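The ~1–3 tok/s figure falls out of memory bandwidth: decode on a memory-bound setup is capped at roughly bandwidth ÷ bytes touched per token, i.e. the 37B active parameters at the quant's bits-per-weight. A back-of-envelope sketch — the bandwidth figures are ballpark assumptions, and real throughput lands below this upper bound:

```python
def est_tok_per_s(active_params, bits_per_weight, bandwidth_gbs):
    """Upper-bound decode speed for a memory-bound MoE: one pass over active weights."""
    bytes_per_token = active_params * bits_per_weight / 8
    return bandwidth_gbs * 1e9 / bytes_per_token

ACTIVE = 37e9  # DeepSeek V3 active params per token
for name, bw in [("DDR5 dual-channel (~90 GB/s)", 90),
                 ("M3 Ultra (~800 GB/s)", 800),
                 ("A100 80 GB HBM (~2000 GB/s)", 2000)]:
    print(f"{name}: <= ~{est_tok_per_s(ACTIVE, 3.5, bw):.0f} tok/s at ~3.5 bpw")
```

On dual-channel DDR5 the ceiling is single-digit tok/s before any routing or offload overhead, which is why CPU-RAM setups land in the 1–3 tok/s range.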
Should you run this locally?

Yes, for workstation owners (A100/H100 multi-card, M3 Ultra), or via cloud GPU rental for short sessions. No, for consumer-card users — even with massive system RAM, the bandwidth ceiling makes this impractical on a 4090.

How it compares
  • vs Llama 4 Maverick → similar tier; V3 has the edge on math/code, Maverick wins on multimodality.
  • vs DeepSeek R1 → R1 is reasoning-trained; V3 is the better generalist. Pick by workload.
  • vs Qwen 3 235B-A22B → Qwen is the closer-sized peer; V3 wins on raw quality, Qwen wins on accessibility (smaller total params).
  • vs Mixtral 8x22B → V3 dramatically outclasses Mixtral 8x22B on quality at similar VRAM.
Run this yourself
# Workstation example (4× A100 80 GB)
ollama pull deepseek-v3:671b
ollama run deepseek-v3:671b
Settings: Q4_K_M GGUF, 16384 ctx, multi-GPU offload, A100 cluster or M3 Ultra
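The 16384-token context from the settings above can be baked into an Ollama Modelfile so you don't set it per session. A sketch, assuming the `deepseek-v3:671b` tag used elsewhere on this page:

```
FROM deepseek-v3:671b
PARAMETER num_ctx 16384
```

Build and run the derived tag with `ollama create deepseek-v3-16k -f Modelfile`, then `ollama run deepseek-v3-16k`.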
Why this rating

9.0/10 — the open-weight model that genuinely competes with closed frontier models on many benchmarks. The 671B / 37B-active MoE design is brilliant, but the practical reality is that running it locally requires workstation hardware. Loses fractional points only on accessibility.

Overview

DeepSeek's flagship MoE — 671B total / 37B active. Server-tier, but the smaller R1 distills make this lineage approachable.

Strengths

  • GPT-4-class quality
  • MoE efficiency
  • Open weights

Weaknesses

  • Server-only on consumer hardware
  • Permissive license but with terms

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization    File size    VRAM required
Q4_K_M          380.0 GB     420 GB
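The table's 380 GB figure is consistent with Q4_K_M's effective bits-per-weight: mixed 4-bit K-quants average out to roughly 4.5 bpw across the model. A back-of-envelope check — 4.5 bpw is an approximation, and the exact file size depends on the tensor mix:

```python
def gguf_size_gb(total_params, bits_per_weight):
    """Approximate GGUF file size: params x bits-per-weight, converted to GB."""
    return total_params * bits_per_weight / 8 / 1e9

print(f"Q4_K_M @ ~4.5 bpw: ~{gguf_size_gb(671e9, 4.5):.0f} GB")
```

The same formula explains why higher-quality quants (Q5, Q6, Q8) quickly outgrow even multi-GPU workstations at 671B total parameters.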

Get the model

Ollama

One-line install

ollama run deepseek-v3:671b

Read our Ollama review →

HuggingFace

Original weights

huggingface.co/deepseek-ai/DeepSeek-V3

Source repository — you'll need to quantize the weights yourself (e.g. to GGUF) before using them with local runtimes.

Hardware that runs this

Cards with enough VRAM for at least one quantization of DeepSeek V3 (671B MoE).

Compare alternatives

Models worth comparing

Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.

Step up
More capable — bigger memory footprint
No models with a verdict in the next tier up yet.

Frequently asked

What's the minimum VRAM to run DeepSeek V3 (671B MoE)?

420 GB of VRAM is enough to run DeepSeek V3 (671B MoE) at the Q4_K_M quantization (file size 380.0 GB). Higher-quality quantizations need more.

Can I use DeepSeek V3 (671B MoE) commercially?

Yes — DeepSeek V3 (671B MoE) ships under the DeepSeek License, which permits commercial use. Always read the license text before deployment.

What's the context length of DeepSeek V3 (671B MoE)?

DeepSeek V3 (671B MoE) supports a context window of 65,536 tokens (64K).

How do I install DeepSeek V3 (671B MoE) with Ollama?

Run `ollama pull deepseek-v3:671b` to download, then `ollama run deepseek-v3:671b` to start a chat session. The default quantization is Q4_K_M.

Source: huggingface.co/deepseek-ai/DeepSeek-V3

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.