RUNLOCALAI · v38
dbrx · 132B parameters · Commercial OK · Reviewed May 2026

DBRX Instruct

Databricks' Mixture-of-Experts model: 132B total parameters, 36B active per token. Designed for Mosaic ML pipelines, with strong tool-calling discipline. Multi-GPU only.

License: Databricks Open Model License · Released Mar 27, 2024 · Context: 32,768 tokens

Our verdict

OP · Fredoline Eruo · Verified May 8, 2026 · Unrated

Positioning

Databricks DBRX Instruct is the instruction-tuned variant of DBRX Base — a 132-billion-parameter Mixture-of-Experts model with 36B active parameters per token. It was released in March 2024 under the Databricks Open Model License, a permissive open-weight license that is broadly commercial-friendly but restricts use by large competing services. The model was Databricks' demonstration that fine-grained MoE (16 experts, 4 active per token) could deliver strong dense-equivalent capability at lower active inference cost. By 2026, DBRX has been surpassed by DeepSeek V3 and Qwen 3 235B on most benchmarks, but it remains relevant for Databricks customers and as a reference point.
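The fine-grained routing described above can be sketched in a few lines. This is a toy top-4-of-16 router, not DBRX's actual implementation — real routing happens inside every MoE layer over learned router logits — but it shows why only a fraction of the experts run per token:

```python
import math

NUM_EXPERTS = 16   # DBRX: 16 fine-grained experts per MoE layer
TOP_K = 4          # 4 experts active per token

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(router_logits):
    """Pick the top-k experts for one token and renormalize their weights."""
    probs = softmax(router_logits)
    top = sorted(range(NUM_EXPERTS), key=lambda i: probs[i], reverse=True)[:TOP_K]
    total = sum(probs[i] for i in top)
    return {i: probs[i] / total for i in top}  # expert index -> mixing weight

# One token's router logits (hypothetical values for illustration)
weights = route([0.3, 2.1, -0.5, 1.7, 0.0, 0.9, -1.2, 0.4,
                 1.1, -0.3, 0.2, 2.4, -0.8, 0.6, 1.5, 0.1])
assert len(weights) == TOP_K                      # only 4 of 16 experts run
assert abs(sum(weights.values()) - 1.0) < 1e-9    # weights renormalized
```

Each token's output is the weighted sum of just those four experts' outputs, which is where the 36B-active / 132B-total split comes from.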

Strengths

  • Permissive license for most commercial uses. DBRX Open Model License allows commercial deployment except for competing AI service offerings.
  • MoE active-parameter efficiency. 36B active vs 132B total — per-token compute cost is closer to a 36B dense model's, though all 132B parameters must still fit in memory.
  • Strong code generation. DBRX was specifically trained on high-quality code data and outperformed Llama 2 / earlier Mistral on code benchmarks.
  • Databricks ecosystem integration. Deeply tied into Databricks Mosaic / Unity Catalog — first-class MLflow + databricks-sdk support.
  • Tool-use capability for agentic workflows.

Limitations

  • Surpassed by 2026 frontier MoE models. DeepSeek V3 / Qwen 3 235B both deliver better quality at similar serving costs.
  • Compute requirements are still substantial. 132B FP16 needs ~270 GB; Q4 needs ~70 GB. Frontier hardware required.
  • MoE serving complexity. Production-grade MoE inference requires vLLM / SGLang / TensorRT-LLM with MoE routing.
  • English-focused. Multilingual coverage is weak compared to Aya / Qwen / Command R.
  • Long-context degrades quickly. 32K context with notable quality drop past 16K.
  • Ecosystem maturity outside Databricks platform is limited. Self-hosting DBRX outside Databricks requires more configuration than Llama / Qwen.
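The VRAM figures above follow directly from parameter count times bits per weight. A back-of-envelope estimator (assuming ~4.25 effective bits per weight for Q4 with group scales; KV cache and activations are extra):

```python
PARAMS = 132e9  # DBRX total parameters -- all 132B must be resident,
                # even though only 36B are active per token

def weight_gb(bits_per_weight):
    """Approximate weight memory in decimal GB; excludes KV cache and activations."""
    return PARAMS * bits_per_weight / 8 / 1e9

print(f"FP16: {weight_gb(16):.0f} GB")    # 264 GB -- matches the ~270 GB above
print(f"Q8:   {weight_gb(8.5):.0f} GB")   # ~140 GB with scales
print(f"Q4:   {weight_gb(4.25):.0f} GB")  # ~70 GB -- matches the Q4 figure above
```

The same arithmetic explains why MoE saves compute but not memory: the 36B active parameters set the FLOPs per token, but the full 132B set the VRAM floor.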

Real-world performance

  • vs Llama 3.1 70B: Llama 3.1 70B wins on most benchmarks despite having fewer parameters — reflecting the 18-month gap in training data and RLHF. DBRX wins on raw active-parameter inference cost.
  • vs DeepSeek V3 (671B MoE): V3 dramatically more capable. Pick V3 for new builds.
  • vs Qwen 3 235B-A22B: Qwen 3 stronger on most benchmarks at similar serving cost.
  • vs DBRX Base: Instruct is the chat-tuned variant. Pick Instruct for chat/agentic; Base for fine-tuning starting point.

Should you run this locally?

Yes if you're a Databricks customer with Mosaic / Unity Catalog deployment, you specifically need DBRX's permissive license terms, or you have an existing DBRX-tuned application. The Databricks platform integration is genuinely good.

No if you're standing up a new self-hosted MoE deployment in 2026 — pick DeepSeek V3 or Qwen 3 235B for better quality at similar cost. DBRX is now historical reference.

How it compares

  • vs DBRX Base: Same architecture, base vs instruct.
  • vs DeepSeek V3 (671B MoE): V3 is the architecturally current frontier; DBRX is the equivalent design from two years earlier.
  • vs Qwen 3 235B-A22B: Qwen 3 strictly better quality at similar serving cost.
  • vs Mixtral 8x22B: Different MoE expert count (8 large vs 16 small). Comparable era; different architecture choices.

Run this yourself

  • Databricks platform: Native deployment via Databricks Mosaic — the canonical path.
  • Self-hosted single-card: MI300X (192 GB) at 8-bit (FP16 weights alone need ~264 GB, so full precision won't fit on one card), Mac Studio M3 Ultra (192 GB) at Q5 with MLX.
  • Self-hosted datacenter: 4× H100 PCIe at FP8 with vLLM MoE routing.
  • Cloud rental: Runpod / Lambda H100 SXM cluster ~$25-40/hr per node.
  • Vendor: databricks/dbrx-instruct on Hugging Face.
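Whichever serving path you pick, prompts must be rendered in the model's chat format. DBRX Instruct's template is ChatML-style (`<|im_start|>` / `<|im_end|>`); in real code you should call the tokenizer's `apply_chat_template` rather than hand-rolling it, but a minimal sketch of the format looks like this:

```python
def chatml(messages):
    """Render a message list in ChatML, the format DBRX Instruct's chat
    template is based on. Illustrative only -- prefer
    tokenizer.apply_chat_template(messages, add_generation_prompt=True)."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    parts.append("<|im_start|>assistant\n")  # generation prompt for the reply
    return "\n".join(parts)

prompt = chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize MoE routing in one sentence."},
])
assert prompt.startswith("<|im_start|>system")
assert prompt.endswith("<|im_start|>assistant\n")
```

Getting this template wrong is a common cause of degraded output when self-hosting outside the Databricks platform.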


Family & lineage

How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.

Family siblings (dbrx)
  • DBRX Base · 132B · Datacenter
  • DBRX Instruct · 132B · You are here

Strengths

  • Databricks ecosystem alignment
  • Strong tool-calling
  • MoE efficiency

Weaknesses

  • Multi-GPU only
  • Older release — Llama 4 / DeepSeek V4 are sharper

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization · File size · VRAM required
AWQ-INT4 · 75.0 GB · 96 GB

Get the model

HuggingFace · original weights

huggingface.co/databricks/dbrx-instruct

Source repository — you quantize the weights yourself.

Hardware that runs this

Cards with enough VRAM for at least one quantization of DBRX Instruct.

  • NVIDIA GB200 NVL72 · 13,824 GB · nvidia
  • AMD Instinct MI355X · 288 GB · amd
  • AMD Instinct MI325X · 256 GB · amd
  • AMD Instinct MI300X · 192 GB · amd
  • NVIDIA B200 · 192 GB · nvidia
  • NVIDIA H100 NVL · 188 GB · nvidia
  • NVIDIA H200 · 141 GB · nvidia
  • Intel Gaudi 3 · 128 GB · intel

Frequently asked

What's the minimum VRAM to run DBRX Instruct?

96 GB of VRAM is enough to run DBRX Instruct at the AWQ-INT4 quantization (file size 75.0 GB). Higher-quality quantizations need more.

Can I use DBRX Instruct commercially?

Yes — DBRX Instruct ships under the Databricks Open Model License, which permits commercial use. Always read the license text before deployment.

What's the context length of DBRX Instruct?

DBRX Instruct supports a context window of 32,768 tokens (32K).
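Given the 32K window — and the quality drop past ~16K noted above — it's worth budgeting tokens before stuffing documents into a prompt. A crude chars-per-token heuristic (real counts need the model's tokenizer; ~4 chars/token is a rough rule of thumb for English):

```python
CONTEXT = 32768
QUALITY_KNEE = 16384  # the review notes degradation past ~16K

def rough_tokens(text):
    """Very rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits(doc, reserve_for_output=1024):
    """Return (fits in context, stays under the quality knee)."""
    used = rough_tokens(doc) + reserve_for_output
    return used <= CONTEXT, used <= QUALITY_KNEE

within_context, within_knee = fits("word " * 5000)  # ~25,000 chars
assert within_context and within_knee
```

Staying under the knee, not just under the hard limit, is the practical target for retrieval-augmented prompts on this model.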

Source: huggingface.co/databricks/dbrx-instruct

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.

Compare alternatives

Models worth comparing

Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.

Same tier
Models in the same parameter band as this one
  • DeepSeek V4 Pro (1.6T MoE) · deepseek · 1600B · unrated
  • Qwen 3.5 235B-A17B (MoE) · qwen · 397B · unrated
  • Qwen 3 235B-A22B · qwen · 235B · unrated
  • DeepSeek V4 Flash (284B MoE) · deepseek · 284B · unrated
Step up
More capable — bigger memory footprint
No verdicted models in the next tier up yet.
Step down
Smaller — faster, runs on weaker hardware
  • Llama 3.3 70B Instruct · llama · 70B · 9.1/10
  • DeepSeek R1 Distill Llama 70B · deepseek · 70B · 9.0/10