Open-weight models
66 models tracked. Hardware requirements, license, and quantization sizes for each.
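A quick rule of thumb for the quantization sizes listed below (an illustrative sketch, not from any model card): weight footprint is roughly parameter count times bits per weight, and for MoE models the total parameter count is what must fit in memory, while only the active count drives per-token compute.

    # Back-of-the-envelope VRAM estimate for quantized weights.
    # Assumptions (illustrative, not vendor specs): ~4.5 effective
    # bits for a typical Q4 quant, ~10% overhead for tensors kept
    # at higher precision; KV cache excluded.

    def weight_size_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
        raw_bytes = params_billion * 1e9 * bits_per_weight / 8
        return raw_bytes * 1.10 / 1e9

    print(f"{weight_size_gb(32):.1f} GB")   # dense 32B at Q4: ~19.8 GB, fits a 24GB card
    print(f"{weight_size_gb(235):.1f} GB")  # 235B MoE: ~145 GB; ALL experts stay resident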
qwen
Qwen 3 235B-A22B
Qwen 3 flagship MoE. 235B total / 22B active per token, with built-in 'thinking' and 'non-thinking' modes that trade speed for reasoning depth.
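Qwen 3's two modes are selected at prompt-templating time. A minimal sketch with Hugging Face transformers, assuming the chat template shipped with the checkpoint (enable_thinking is the documented switch for Qwen 3):

    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-235B-A22B")
    messages = [{"role": "user", "content": "Prove sqrt(2) is irrational."}]

    # Thinking mode emits a <think> block before the final answer;
    # set enable_thinking=False for fast non-reasoning replies.
    prompt = tok.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=True,
    )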
Qwen 3 30B-A3B
Mid-tier Qwen 3 MoE. 30B total / 3B active means 70B-class quality at 7B-class inference speed on a single 24GB card. The sweet spot of the lineup.
Qwen 2.5 Coder 32B Instruct
Coding-specialist Qwen 2.5. Beats GPT-4o on HumanEval and matches Sonnet on many code-edit benchmarks. The default local-coding model on 24GB cards.
Qwen 3 32B
Dense Qwen 3 32B. Best dense open-weight model in its size class at release; pairs nicely with a single RTX 5090 or 4090.
Qwen 3 8B
Qwen 3 at the 8B scale. Direct head-to-head against Llama 3.1 8B on most benchmarks; usually wins on coding and structured output.
Qwen 3 14B
14B Qwen 3. Fits on 12GB cards at Q4. Strong default for users with a single mid-range GPU.
Qwen 2.5 7B Instruct
The community-default small Qwen prior to Qwen 3. Still widely used because of mature ecosystem support.
Qwen 2.5 14B Instruct
14B Qwen 2.5. Sweet spot for 16GB VRAM. Many production deployments still run this version.
Qwen 2.5 32B Instruct
Dense 32B Qwen 2.5. Strong daily-driver on 24GB cards prior to Qwen 3 32B.
QwQ 32B Preview
Qwen team's reasoning-focused experimental release. Visible chain-of-thought in <think> tags. Precursor to Qwen 3's thinking mode.
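Since QwQ (and the DeepSeek R1 family below) emits its chain-of-thought inline, callers usually split it out before displaying or logging the answer. A small sketch, assuming the <think>...</think> convention described above:

    import re

    def split_reasoning(text: str) -> tuple[str, str]:
        # Returns (reasoning, answer); reasoning is empty if the
        # model skipped the think block.
        m = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
        if m is None:
            return "", text.strip()
        return m.group(1).strip(), text[m.end():].strip()

    reasoning, answer = split_reasoning("<think>2 + 2, no carry...</think>4")
    print(answer)  # -> 4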
Qwen 2.5 72B Instruct
The flagship of Qwen 2.5. Workstation-tier; needs 48GB+ VRAM for usable inference.
Qwen 3 4B
Compact Qwen 3 for edge and laptop deployment. Outperforms many 7B models from prior generations.
llama
Llama 3.1 8B Instruct
Meta's small flagship. Strong general reasoning, 128K context, broad multilingual. The default first try for most local-AI use cases on consumer hardware.
Llama 4 Scout
Meta's 2026 flagship MoE model. 109B total parameters with only 17B active per forward pass and a record 10-million-token context window, unprecedented for an open-weight release.
Llama 3.3 70B Instruct
Late-2024 refresh of the 70B Llama line. Roughly matches Llama 3.1 405B on most benchmarks at one-fifth the parameter count. The default high-end choice for local deployment.
Llama 3.2 3B Instruct
Lightweight 3B for edge and laptop deployment. Runs comfortably on 8GB VRAM at 30+ tok/s on Apple Silicon.
Llama 3.1 70B Instruct
The 70B sibling of Llama 3.1 8B. Strong generalist reasoning with 128K context, popular base for agentic fine-tunes (Hermes 3, Nemotron). Mostly superseded by the Llama 3.3 refresh.
Llama 3.1 Nemotron 70B Instruct
NVIDIA's HelpSteer2-tuned Llama 3.1 70B. Topped Arena Hard at release. The reference NVIDIA open-weight release before Nemotron 3.
Llama 3.2 11B Vision Instruct
First-party multimodal Llama. Accepts images alongside text for VQA, document understanding, and chart reading. Runs on 12GB+ VRAM.
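Most local runtimes (vLLM, llama.cpp server, and similar) expose vision models through the OpenAI-compatible chat format, where images travel as data URLs alongside the text. A sketch assuming a hypothetical local server on port 8000; the endpoint and model id are placeholders:

    import base64, json, urllib.request

    URL = "http://localhost:8000/v1/chat/completions"  # placeholder endpoint

    with open("chart.png", "rb") as f:
        b64 = base64.b64encode(f.read()).decode()

    body = {
        "model": "llama-3.2-11b-vision-instruct",  # placeholder model id
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "What trend does this chart show?"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }
    req = urllib.request.Request(
        URL, json.dumps(body).encode(), {"Content-Type": "application/json"})
    reply = json.load(urllib.request.urlopen(req))
    print(reply["choices"][0]["message"]["content"])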
Llama 4 Maverick
Meta's high-end Llama 4 sibling: a 128-expert MoE built for performance over efficiency. Multilingual strength is its standout. Effectively server-tier hardware only.
Llama 3.1 Nemotron Ultra 253B
NVIDIA's top open reasoning model in the Llama 3.1 lineage. Server-tier; trained for groundbreaking reasoning accuracy on agentic workloads.
Llama 3.1 Nemotron Nano 8B
Smallest of the Nemotron reasoning trio. NAS-optimized for inference efficiency on RTX hardware.
Llama 3.2 1B Instruct
True edge-tier Llama. Runs on a phone or Raspberry Pi. Useful for classification, simple summarization, and on-device agents.
Llama 3.2 90B Vision Instruct
The 90B vision Llama. Best-in-class first-party multimodal open weight at the time of release. Workstation-class only.
deepseek
DeepSeek R1 (671B reasoning)
Open reasoning model that closed the gap with frontier proprietary reasoners. Visible chain-of-thought, MIT license, and a family of distills onto smaller open bases.
DeepSeek R1 Distill Llama 70B
Reasoning distillation onto Llama 3.3 70B. Best-in-class open-weight reasoner you can actually fit on a workstation.
DeepSeek R1 Distill Qwen 32B
32B distill — fits on a single 24GB card with reasoning capability. Best price-per-thinking-token combo for prosumers.
DeepSeek V3 (671B MoE)
DeepSeek's flagship MoE — 671B total / 37B active. Server-tier, but the smaller R1 distills make this lineage approachable.
DeepSeek R1 Distill Qwen 7B
Smallest practical R1 distill. Reasoning on a 6GB GPU.
DeepSeek R1 Distill Qwen 14B
14B reasoning distill. Fits on 12GB cards.
DeepSeek Coder V2 Lite (16B)
MoE coding specialist — 16B total / 2.4B active. Fast on 12GB cards.
gemma
Gemma 4 31B Dense
Google's flagship dense Gemma 4. Beats some 400B-class proprietary models on benchmarks. Targets the 24GB single-GPU sweet spot.
Gemma 4 26B MoE
MoE variant of Gemma 4. Faster per-token than the 31B dense at similar quality on most tasks.
Gemma 3 27B
Pre-Gemma-4 flagship. Multimodal (4B+ variants), 128K context, 140 languages. Strong daily driver on 24GB cards.
Gemma 4 E4B (Effective 4B)
Edge-class Gemma 4. The 'Effective 4B' branding signals it punches above its parameter count via training-data quality.
Gemma 3 12B
12B Gemma 3. Fits on 12GB consumer cards. Multimodal.
Gemma 3 4B
4B Gemma 3 for edge. Multimodal.
Gemma 2 9B Instruct
Mid-size Gemma 2. Strong chat quality with a different training mix from the Llama family.
Gemma 4 E2B (Effective 2B)
Smallest Gemma 4. Designed for phones and Raspberry-Pi-class hardware.
Gemma 3 1B
Smallest text-only Gemma 3 for phones and IoT.
MedGemma 27B
Medical-specialist Gemma fine-tune. Trained on de-identified medical literature and imaging. Research use under HAI-DEF terms.
CodeGemma 7B
Coding-specialist Gemma. Decent FIM completion. Now mostly of historical interest, with Qwen 2.5 Coder dominating the niche.
other
GLM-5
Zhipu's GLM-5 currently leads the Open LLM Leaderboard 2026. Strong reasoning and bilingual EN/ZH capability.
Nemotron 3 Nano (30B-A3B)
NVIDIA's hybrid Mamba-2 + Transformer MoE for on-device agents. 30B total / 3B active. 1M-token context window with switchable reasoning ON/OFF modes.
Kimi K2.6
Moonshot's long-context, agent-oriented MoE. Optimized for stability under tool use and multi-step coding/planning workflows.
Nemotron 3 Super (120B-A12B)
Workstation-tier Nemotron 3. 120B total / 12B active. 5× higher throughput than the prior Super, 1M context, designed for multi-agent applications.
OLMo 2 32B
Fully-open OLMo 2. AI2 publishes the full training data, code, and weights — the most reproducible 32B model.
mistral
Mistral Small 3 24B
Re-release of Mistral Small under Apache 2.0. Competitive with Llama 3.3 70B at one-third the size for many tasks.
Mistral Nemo 12B Instruct
Joint Mistral/NVIDIA release with native 128K context and a new Tekken tokenizer. Strong multilingual; popular fine-tune base.
Pixtral 12B
Mistral's multimodal entry. 12B parameters, vision + text, Apache 2.0. Good document and chart understanding.
Codestral 22B
Mistral's coding specialist. Strong fill-in-the-middle for IDE autocompletion. Personal/research use only.
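Fill-in-the-middle completes the span between a prefix and a suffix instead of generating left to right, which is what IDE autocompletion needs. A sketch against Mistral's hosted FIM endpoint, assuming the v1 mistralai SDK; self-hosted runtimes take equivalent prefix/suffix parameters:

    import os
    from mistralai import Mistral  # assumes the v1 mistralai SDK

    client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

    # The model fills in what belongs between prompt (prefix) and suffix.
    resp = client.fim.complete(
        model="codestral-latest",
        prompt="def fibonacci(n: int) -> int:\n",
        suffix="\n\nprint(fibonacci(10))",
    )
    print(resp.choices[0].message.content)  # generated function body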
Mistral Large 2 (123B)
Mistral's flagship dense model. Open weights but restricted commercial license — research and non-commercial only.
Mistral 7B Instruct v0.3
The reference 7B from Mistral. Apache 2.0 with native function calling. Mature ecosystem.
phi
Phi-4 14B
Microsoft's Phi-4 14B trained on synthetic textbook-quality data. Punches above weight on reasoning and math; MIT licensed.
Phi-4 Reasoning 14B
Reasoning-focused fine-tune of Phi-4. Visible chain-of-thought, competitive with much larger models on math and STEM benchmarks.
Phi-3.5 Mini Instruct
Compact 3.8B Phi for edge deployment. 128K context. Strong reasoning per parameter.
Phi-3.5 Vision
Multimodal Phi 3.5. Document and chart understanding at edge size. MIT licensed.
mixtral
Mixtral 8x7B Instruct
The MoE model that introduced the 8-expert pattern to the open-weight world. 47B params total, 13B active. Still a viable workhorse on 36GB+ VRAM setups.
Mixtral 8x22B Instruct
The bigger Mixtral. 141B total / 39B active. Strong general model, workstation-tier deployment.