by Databricks (Mosaic)
Databricks' MoE family: DBRX Base and DBRX Instruct, 132B total parameters with 36B active. Surpassed by the 2026 MoE leaders, but still relevant for Databricks platform integration.
Start with DBRX Instruct via vLLM on datacenter hardware: the 132B-total MoE (36B active, 16 experts with top-4 routing) needs at least 4× H100 SXM. DBRX is the only open-weight model built by Databricks, and it was fine-tuned specifically for SQL generation, data engineering, and structured data analysis; it outperforms Llama 3 70B on SQL benchmarks (Spider, BIRD) by 10-15 points. If you need SQL and data-analysis quality at consumer scale, skip DBRX entirely and use DeepSeek R1-Distill-Qwen-32B instead: similar code quality at roughly a quarter of the VRAM. DBRX is licensed under the Databricks Open Model License, which carries use-based restrictions; review it before any production deployment. In Databricks-native environments (Unity Catalog, Mosaic AI) DBRX has first-class integration, but for self-hosted deployments the infrastructure cost is high.
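As a concrete starting point, here is a minimal offline-inference sketch, assuming a recent vLLM build with DBRX support on a 4× H100 node and the Hugging Face repo id databricks/dbrx-instruct; the context length and sampling settings are illustrative defaults, not tuned values.

```python
# Minimal sketch: offline generation with vLLM across 4 GPUs.
# Assumes a recent vLLM with native DBRX support; older stacks may need
# trust_remote_code=True. Adjust max_model_len to your KV-cache budget.
from vllm import LLM, SamplingParams

llm = LLM(
    model="databricks/dbrx-instruct",
    tensor_parallel_size=4,   # shard the 132B weights across the 4x H100
    max_model_len=8192,       # DBRX supports longer contexts; shorter saves KV cache
    dtype="bfloat16",
)

params = SamplingParams(temperature=0.1, max_tokens=512)
# For production use, apply DBRX Instruct's chat template instead of a raw prompt.
prompt = "Write a SQL query returning the top 10 customers by total order value."
print(llm.generate([prompt], params)[0].outputs[0].text)
```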
For single-user local use: DBRX at Q4 needs about 130 GB of total VRAM, which is practical only on a Mac Studio M3 Ultra with 192 GB via llama.cpp with expert offloading (~6 tok/s), and even that is borderline usable. For multi-user serving: run vLLM 0.6.0+ with FP8 on 4× H100 SXM; expert parallelism across the 4 GPUs reaches ~3,000 tok/s at batch 32. For Databricks environments: use Mosaic AI Model Serving with the optimized DBRX endpoint; this is the intended deployment path and reaches ~8,000 tok/s at scale. For SQL and data pipelines: deploy DBRX behind a REST API with batching; the high per-request cost means you should batch SQL-generation queries and cache results aggressively. DBRX uses a fine-grained MoE with 16 experts and top-4 routing; keep the router weights at FP16 and never quantize them. ExLlamaV2 does not support DBRX MoE. For alternatives with similar data-engineering capability at lower cost, compare DeepSeek Coder V3.
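A minimal sketch of the batch-and-cache pattern for SQL pipelines, assuming the DBRX endpoint is exposed through an OpenAI-compatible API (as vLLM's server provides); the URL, model name, and cache policy here are illustrative assumptions, not a prescribed setup.

```python
# Batch-and-cache client for SQL generation against an OpenAI-compatible DBRX endpoint.
# The base_url and cache size are placeholders; adapt them to your deployment.
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

SYSTEM = "You translate natural-language questions into ANSI SQL. Return only the SQL."

@lru_cache(maxsize=4096)
def sql_for(question: str, schema: str) -> str:
    """Cache identical (question, schema) pairs so repeated pipeline runs cost nothing."""
    resp = client.chat.completions.create(
        model="databricks/dbrx-instruct",
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": f"Schema:\n{schema}\n\nQuestion: {question}"},
        ],
        temperature=0.0,
        max_tokens=400,
    )
    return resp.choices[0].message.content.strip()

def sql_batch(questions: list[str], schema: str, workers: int = 16) -> list[str]:
    """Fire requests concurrently so the server's continuous batching can group them."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda q: sql_for(q, schema), questions))
```

The cache keys on the exact (question, schema) pair, so the win comes from pipelines that regenerate the same queries on every run; a persistent cache (e.g. on disk or in a table) would carry that benefit across processes.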
Models in this family with our verdicts
Verify DBRX runs on your specific hardware before committing money.
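One quick way to do that check on an NVIDIA box is a short PyTorch script that reports free VRAM per GPU; compare the total against the footprint figures quoted above for your chosen quantization. This is a sketch under the assumption that PyTorch is installed and can see the GPUs.

```python
# Quick VRAM inventory before committing to a DBRX deployment.
# Compare the totals against the FP16 / FP8 / Q4 footprints discussed above,
# and leave headroom for KV cache and runtime overhead on top of the weights.
import torch

if not torch.cuda.is_available():
    raise SystemExit("No CUDA devices visible; use llama.cpp on Apple Silicon instead.")

total_free = 0
for i in range(torch.cuda.device_count()):
    free_bytes, total_bytes = torch.cuda.mem_get_info(i)
    total_free += free_bytes
    print(f"GPU {i}: {free_bytes / 1e9:.0f} GB free of {total_bytes / 1e9:.0f} GB")

print(f"Total free across all GPUs: {total_free / 1e9:.0f} GB")
```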