
Mixtral 8x22B Instruct

The bigger Mixtral. 141B total / 39B active. Strong general model, workstation-tier deployment.

License: Apache 2.0 · Released Apr 17, 2024 · Context: 65,536 tokens
Our verdict
By Fredoline Eruo · Last verified May 6, 2026
7.5/10
Positioning

Mixtral 8x22B is the heavyweight Mixtral release — 141B total parameters, 39B active per token. Closer to a real flagship than 8x7B was, but the disk and memory footprint pushes it past consumer rigs into workstation territory. Largely superseded by Llama 4 Scout for the same hardware tier.

Strengths
  • Apache 2.0 license — license-clean alternative to Llama 4 in the workstation-class MoE space.
  • 39B active per token keeps tok/s competitive with dense ~40B models.
  • Strong multilingual — Mistral's European focus carries through.
Limitations
  • Workstation hardware required — 84 GB at Q4_K_M, partial-offload only on 24 GB cards.
  • Quality has been overtaken by Llama 4 Scout and DeepSeek V3 for similar memory.
  • Long context is weaker than the spec implies — recall degrades past 24K.
Real-world performance on RTX 4090
  • Q4_K_M (84 GB) — heavy offload: 7–12 tok/s, ~64 GB+ system RAM required (example command below)
  • Q5_K_M (97 GB) — workstation only
  • Q8_0 (141 GB) — multi-card workstation
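
Outside Ollama, a minimal llama.cpp invocation for the Q4_K_M heavy-offload setup above would look roughly like this; the GGUF file name is a placeholder, and the -ngl value controls how many layers are offloaded to the GPU, so lower it if the offloaded layers don't fit in 24 GB:

llama-cli -m ./mixtral-8x22b-instruct-v0.1-Q4_K_M.gguf -c 16384 -ngl 30 -p "Summarize the trade-offs of mixture-of-experts models."

Whatever isn't offloaded runs from system RAM, which is where the ~64 GB+ figure above comes from.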
Should you run this locally?

Yes, for workstation rigs where Apache-license MoE matters more than absolute capability — and for legacy Mixtral fine-tunes already in use. No, for new deployments — Llama 4 Scout or DeepSeek V3 are the better picks at similar hardware investment.

How it compares
  • vs Llama 4 Scout → similar memory footprint; Scout wins on multimodality + architecture sophistication. New work tilts toward Scout.
  • vs Mixtral 8x7B → 8x22B is a legitimate flagship where 8x7B was a tech demo. If MoE is the goal, 8x22B is the only Mixtral worth running today.
  • vs DeepSeek V3 → V3 has more total params but very strong active-param efficiency; V3 wins on quality, Mixtral 8x22B wins on license clarity.
Run this yourself
ollama pull mixtral:8x22b-instruct-v0.1-q4_K_M
ollama run mixtral:8x22b-instruct-v0.1-q4_K_M
Settings: Q4_K_M GGUF, 16,384-token context, ~30 GPU-offloaded layers (llama.cpp --n-gpu-layers / Ollama num_gpu), RTX 4090 + 96 GB DDR5
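
To bake those settings into Ollama itself rather than passing llama.cpp flags, one option is a small Modelfile; the model name used here is arbitrary, and num_gpu is Ollama's equivalent of --n-gpu-layers, so adjust it to your VRAM. Save this as Modelfile:

FROM mixtral:8x22b-instruct-v0.1-q4_K_M
PARAMETER num_ctx 16384
PARAMETER num_gpu 30

then build and run the customized model:

ollama create mixtral-8x22b-local -f Modelfile
ollama run mixtral-8x22b-local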
Why this rating

7.5/10 — the more credible MoE option in the Mixtral family, but at 141B total / 39B active it's workstation-only and now eclipsed by Llama 4 Scout (similar size, native multimodal) and DeepSeek V3 (fewer active params, better quality). Loses points for being out of the consumer-card zone.

Overview

The bigger Mixtral. 141B total / 39B active. Strong general model, workstation-tier deployment.

Strengths

  • Apache 2.0
  • Multilingual

Weaknesses

  • 96 GB+ VRAM to run fully on GPU
  • Outpaced by Qwen 3 235B-A22B

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point; the size arithmetic after the table shows roughly where that figure comes from.

Quantization    File size    VRAM required
Q4_K_M          84.0 GB      96 GB
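
As a rough sanity check on that file size (Q4_K_M averages about 4.8 bits per weight, an approximation):

141B weights × ~4.8 bits ÷ 8 bits per byte ≈ 85 GB

which lines up with the 84.0 GB figure above. The 96 GB VRAM requirement adds headroom for the KV cache and runtime buffers on top of the weights.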

Get the model

Ollama

One-line install

ollama run mixtral:8x22b

Read our Ollama review →

HuggingFace

Original weights

huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1

Source repository only; you'll need to convert and quantize the weights yourself (for example to GGUF) before running them locally.
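
A typical route from the source repository to a runnable GGUF goes through the llama.cpp conversion tools; the local directory and file names below are placeholders, and the intermediate f16 file weighs in at roughly 280 GB (141B weights × 2 bytes), so budget disk space accordingly:

huggingface-cli download mistralai/Mixtral-8x22B-Instruct-v0.1 --local-dir mixtral-8x22b
python convert_hf_to_gguf.py mixtral-8x22b --outfile mixtral-8x22b-f16.gguf
./llama-quantize mixtral-8x22b-f16.gguf mixtral-8x22b-Q4_K_M.gguf Q4_K_M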

Hardware that runs this

Cards with enough VRAM for at least one quantization of Mixtral 8x22B Instruct.

Compare alternatives

Models worth comparing

Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.

Step up
More capable — bigger memory footprint
No models with a verdict in the next tier up yet.

Frequently asked

What's the minimum VRAM to run Mixtral 8x22B Instruct?

96 GB of VRAM is enough to run Mixtral 8x22B Instruct at the Q4_K_M quantization (file size 84.0 GB). Higher-quality quantizations need more.

Can I use Mixtral 8x22B Instruct commercially?

Yes — Mixtral 8x22B Instruct ships under the Apache 2.0 license, which permits commercial use. Always read the license text before deployment.

What's the context length of Mixtral 8x22B Instruct?

Mixtral 8x22B Instruct supports a context window of 65,536 tokens (64K).

How do I install Mixtral 8x22B Instruct with Ollama?

Run `ollama pull mixtral:8x22b` to download, then `ollama run mixtral:8x22b` to start a chat session. The default quantization is Q4_K_M.

Source: huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.