
dbrx · 132B parameters · Commercial OK · Reviewed May 2026

DBRX Base

DBRX base (non-instruct). 132B total / 36B active fine-grained MoE.

License: Databricks Open Model License · Released Mar 27, 2024 · Context: 32,768 tokens


How to run it

DBRX is Databricks' 132B-parameter MoE model, with ~36B parameters active per token via 4-of-16 expert routing. Run it at Q4_K_M via llama.cpp with -ngl 999 -fa -c 8192; the Q4_K_M file is ~75 GB on disk.

Minimum VRAM: 48 GB. That means an RTX A6000 (48 GB) at Q4_K_M with expert offload, or dual RTX 3090s in row-split mode (48 GB total). Recommended: an A100 80GB at AWQ-INT4. Throughput: ~15-25 tok/s on the A6000 at Q4_K_M with 8K context.

DBRX uses a fine-grained MoE with 16 experts, 4 active per token. That is more routing decisions per token than Mixtral-style MoE (8 experts, 2 active): higher routing overhead, but potentially better expert specialization.

DBRX Base is a base model, not instruction-tuned. Use it for fine-tuning or few-shot prompting, not direct chat; for instruction-tuned use, look at DBRX Instruct or fine-tune yourself. Ollama may not carry DBRX Base, so verify the tag before pulling. The architecture is a standard transformer with MoE FFN layers, supported in both llama.cpp and vLLM. A minimal invocation follows.
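
A minimal llama.cpp sketch using the flags named above. The GGUF filename is illustrative; substitute whatever quant you actually downloaded. The binary is llama-cli in recent builds (main in older ones):

  # -ngl 999 : offload all layers to VRAM
  # -fa      : enable flash attention
  # -c 8192  : 8K context, matching the guidance above
  # Base model: give it a completion prompt, not a chat message.
  ./llama-cli -m dbrx-base.Q4_K_M.gguf -ngl 999 -fa -c 8192 \
    -p "The capital of France is"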

Hardware guidance

Minimum: dual RTX 3090s, 48 GB total, at Q4_K_M (tight at 4K context). Recommended: A100 80GB at AWQ-INT4 for serving. Budget: RTX A6000 48GB at Q3_K_M with expert offload.

VRAM math: 132B total parameters, ~36B active (4 experts selected per token). Q4_K_M for the full 132B weighs ~70-80 GB. Expert offload cuts that to ~30-40 GB of VRAM by keeping the active experts in VRAM and the rest in system RAM. KV cache at 8K context adds ~10-15 GB. So 48 GB with expert offload is borderline, while an 80 GB A100 is comfortable with all experts resident.

Mac Studio M4 Max 64GB: Q4_K_M with expert offload at 3-6 tok/s. RTX 4090 24GB: Q3_K_M with aggressive expert offload. Cloud: a single A100 at $5-10/hr for AWQ. A dual-3090 command sketch follows.
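
A sketch for the dual-RTX-3090 minimum, splitting each layer's weights across both cards. The split flags exist in recent llama.cpp builds; check ./llama-cli --help on yours. The model filename is illustrative:

  # --split-mode row   : split each layer across both GPUs
  # --tensor-split 1,1 : balance the split evenly between the two 3090s
  # -c 4096            : 4K context, since 48 GB is tight here
  ./llama-cli -m dbrx-base.Q4_K_M.gguf -ngl 999 -fa -c 4096 \
    --split-mode row --tensor-split 1,1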

What breaks first

  1. Base model, not instruct. DBRX Base has no chat or instruction tuning. Raw completions continue the prompt's style rather than answering questions; fine-tuning or few-shot prompting is necessary.
  2. Fine-grained MoE routing overhead. 16 experts with top-4 routing per token means more routing decisions and heavier all-to-all communication. On PCIe cards this routing pattern causes more stalls than Mixtral-style MoE.
  3. AWQ calibration gap. DBRX AWQ quants calibrated on generic data may not preserve quality on domain-specific tasks. Test quant quality on your own data before deploying (see the perplexity sketch after this list).
  4. Databricks license. Verify the DBRX license for commercial use; it may differ from standard open-weight licenses. Check huggingface.co/databricks/dbrx-base for the terms.
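
One way to run that quant check, assuming llama.cpp's perplexity tool (llama-perplexity in recent builds) and a text file sampled from your own domain:

  # Lower perplexity is better. Run the same file through a
  # higher-precision quant (e.g. Q8_0) and compare the two scores;
  # a large gap means the quantization is hurting your domain.
  ./llama-perplexity -m dbrx-base.Q4_K_M.gguf -f domain-sample.txt -ngl 999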

Runtime recommendation

llama.cpp with -ngl 999 for local use. vLLM for multi-user serving on an A100: DBRX's fine-grained MoE benefits from vLLM's expert-parallel scheduling. Avoid Ollama for base models; it is built around instruct/chat workflows. For fine-tuning: Axolotl or Unsloth with QLoRA. A serving sketch follows.
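
A hedged serving sketch for a single A100 80GB using vLLM's OpenAI-compatible server. An AWQ checkpoint of dbrx-base is an assumption here; point the command at whichever quantized repo you verified:

  # Serves an OpenAI-compatible API (port 8000 by default).
  # --quantization awq      : load AWQ-INT4 weights (assumes an AWQ repo exists)
  # --max-model-len 8192    : cap context to keep the KV cache inside 80 GB
  vllm serve databricks/dbrx-base \
    --quantization awq \
    --max-model-len 8192 \
    --gpu-memory-utilization 0.95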

Common beginner mistakes

  • Mistake: expecting DBRX Base to chat. Fix: base models generate completions, not conversations. Use DBRX Instruct or fine-tune; for raw base-model use, rely on few-shot prompting with careful formatting (sketch below).
  • Mistake: assuming 132B total parameters needs 132 GB of VRAM. Fix: the MoE at Q4_K_M is ~75 GB on disk, and the active subset per token is only ~36B (~21 GB at Q4). Expert offload makes it run in 48 GB.
  • Mistake: using the standard Llama GGUF conversion. Fix: DBRX has its own architecture. Use the correct conversion script or pre-converted GGUFs from TheBloke or bartowski.
  • Mistake: ignoring the 16-expert routing overhead. Fix: DBRX's top-4-of-16 routing is more complex than Mixtral's top-2-of-8; expect higher per-token latency variance from more frequent expert switches.
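
What careful few-shot formatting looks like in practice, as a sketch: prime the base model with the pattern you want continued, then cap generation before it rambles past the answer.

  # -n 64 caps generation; base models happily continue past the answer.
  # $'...' lets bash embed the \n newlines the few-shot format needs.
  ./llama-cli -m dbrx-base.Q4_K_M.gguf -ngl 999 -c 4096 -n 64 \
    -p $'Q: What is the capital of France?\nA: Paris.\nQ: What is the capital of Japan?\nA:'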

Family & lineage

How this model relates to others in its lineage. Family members share architecture and training-data roots; parent/child edges record direct distillation or fine-tune relationships.

Family siblings (dbrx)
  • DBRX Base · 132B (you are here)
  • DBRX Instruct · 132B · Datacenter

Strengths

  • Fine-grained MoE
  • Databricks Mosaic recipe

Weaknesses

  • Not instruction-tuned; use dbrx-instruct for chat

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization    File size    VRAM required
Q4_K_M          75.0 GB      96 GB
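
One plausible reading of the 96 GB figure, as a back-of-envelope check: weights plus KV cache plus roughly 10% runtime overhead. The overhead factor is an assumption, not a measured number; real usage varies by backend and context length.

  WEIGHTS_GB=75   # Q4_K_M file size from the table above
  KV_GB=12        # midpoint of the 10-15 GB at 8K estimate above
  # (75 + 12) * 1.10 is roughly 95-96 GB
  echo "$(( (WEIGHTS_GB + KV_GB) * 110 / 100 )) GB"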

Get the model

HuggingFace

Original weights

huggingface.co/databricks/dbrx-base

Source repository — direct quantization required.
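
A minimal download sketch using huggingface-cli. If the repo is gated, accept the license on the model page first, then authenticate:

  huggingface-cli login
  huggingface-cli download databricks/dbrx-base --local-dir dbrx-base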

Hardware that runs this

Cards with enough VRAM for at least one quantization of DBRX Base.

  • NVIDIA GB200 NVL72 · 13,824 GB · nvidia
  • AMD Instinct MI355X · 288 GB · amd
  • AMD Instinct MI325X · 256 GB · amd
  • AMD Instinct MI300X · 192 GB · amd
  • NVIDIA B200 · 192 GB · nvidia
  • NVIDIA H100 NVL · 188 GB · nvidia
  • NVIDIA H200 · 141 GB · nvidia
  • Intel Gaudi 3 · 128 GB · intel

Frequently asked

What's the minimum VRAM to run DBRX Base?

96 GB of VRAM runs DBRX Base at the Q4_K_M quantization fully on-GPU (75.0 GB file plus KV cache and runtime overhead). With expert offload, 48 GB is workable; see Hardware guidance above. Higher-quality quantizations need more.

Can I use DBRX Base commercially?

Yes — DBRX Base ships under the Databricks Open Model License, which permits commercial use. Always read the license text before deployment.

What's the context length of DBRX Base?

DBRX Base supports a context window of 32,768 tokens (32K).

Source: huggingface.co/databricks/dbrx-base

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.


Models worth comparing

Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.

Same tier
Models in the same parameter band as this one
  • DeepSeek V4 Pro (1.6T MoE) · deepseek · 1600B · unrated
  • Qwen 3.5 235B-A17B (MoE) · qwen · 397B · unrated
  • Qwen 3 235B-A22B · qwen · 235B · unrated
  • DeepSeek V4 Flash (284B MoE) · deepseek · 284B · unrated

Step up
More capable, with a bigger memory footprint
No verdicted models in the next tier up yet.

Step down
Smaller: faster, runs on weaker hardware
  • Llama 3.3 70B Instruct · llama · 70B · 9.1/10
  • DeepSeek R1 Distill Llama 70B · deepseek · 70B · 9.0/10