Open-weight models
66 models tracked. Hardware requirements, license, and quantization sizes for each.
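A quick rule of thumb for the quantization sizes listed below (an illustrative sketch, not from any model card): weight footprint is roughly parameter count times bits per weight, and for MoE models the total parameter count is what must fit in memory, while only the active count drives per-token compute.

    # Back-of-the-envelope VRAM estimate for quantized weights.
    # Assumptions (illustrative, not vendor specs): ~4.5 effective
    # bits for a typical Q4 quant, ~10% overhead for tensors kept
    # at higher precision; KV cache excluded.

    def weight_size_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
        raw_bytes = params_billion * 1e9 * bits_per_weight / 8
        return raw_bytes * 1.10 / 1e9

    print(f"{weight_size_gb(32):.1f} GB")   # dense 32B at Q4: ~19.8 GB, fits a 24GB card
    print(f"{weight_size_gb(235):.1f} GB")  # 235B MoE: ~145 GB; ALL experts stay resident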
qwen
Qwen 3 235B-A22B
Qwen 3 flagship MoE. 235B total / 22B active per token, with built-in 'thinking' and 'non-thinking' modes that trade speed for reasoning depth.
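Qwen 3's two modes are selected at prompt-templating time. A minimal sketch with Hugging Face transformers, assuming the chat template shipped with the checkpoint (enable_thinking is the documented switch for Qwen 3):

    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-235B-A22B")
    messages = [{"role": "user", "content": "Prove sqrt(2) is irrational."}]

    # Thinking mode emits a <think> block before the final answer;
    # set enable_thinking=False for fast non-reasoning replies.
    prompt = tok.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=True,
    )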
Qwen 3 30B-A3B
Mid-tier Qwen 3 MoE. 30B total / 3B active means 70B-class quality at 7B-class inference speed on a single 24GB card. The sweet spot of the lineup.
Qwen 2.5 Coder 32B Instruct
Coding-specialist Qwen 2.5. Beats GPT-4o on HumanEval and matches Sonnet on many code-edit benchmarks. The default local-coding model on 24GB cards.
Qwen 3 32B
Dense Qwen 3 32B. Best dense open-weight model in its size class at release; pairs nicely with a single RTX 5090 or 4090.
Qwen 3 8B
Qwen 3 at the 8B scale. Direct head-to-head against Llama 3.1 8B on most benchmarks; usually wins on coding and structured output.
Qwen 3 14B
14B Qwen 3. Fits on 12GB cards at Q4. Strong default for users with a single mid-range GPU.
Qwen 2.5 7B Instruct
The community-default small Qwen prior to Qwen 3. Still widely used because of mature ecosystem support.
Qwen 2.5 14B Instruct
14B Qwen 2.5. Sweet spot for 16GB VRAM. Many production deployments still run this version.
Qwen 2.5 32B Instruct
Dense 32B Qwen 2.5. Strong daily-driver on 24GB cards prior to Qwen 3 32B.
QwQ 32B Preview
Qwen team's reasoning-focused experimental release. Visible chain-of-thought in <think> tags. Precursor to Qwen 3's thinking mode.
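Since QwQ (and the DeepSeek R1 family below) emits its chain-of-thought inline, callers usually split it out before displaying or logging the answer. A small sketch, assuming the <think>...</think> convention described above:

    import re

    def split_reasoning(text: str) -> tuple[str, str]:
        # Returns (reasoning, answer); reasoning is empty if the
        # model skipped the think block.
        m = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
        if m is None:
            return "", text.strip()
        return m.group(1).strip(), text[m.end():].strip()

    reasoning, answer = split_reasoning("<think>2 + 2, no carry...</think>4")
    print(answer)  # -> 4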
Qwen 2.5 72B Instruct
The flagship of Qwen 2.5. Workstation-tier; needs 48GB+ VRAM for usable inference.
Qwen 3 4B
Compact Qwen 3 for edge and laptop deployment. Outperforms many 7B models from prior generations.
llama
Llama 3.1 8B Instruct
Meta's small flagship. Strong general reasoning, 128K context, broad multilingual. The default first try for most local-AI use cases on consumer hardware.
Llama 4 Scout
Meta's 2026 flagship MoE model. 109B total parameters with only 17B active per forward pass and a record 10-million-token context window, unprecedented for an open-weight release.
Llama 3.3 70B Instruct
Late-2024 refresh of the 70B Llama line. Roughly matches Llama 3.1 405B on most benchmarks at one-fifth the parameter count. The default high-end choice for local deployment.
Llama 3.2 3B Instruct
Lightweight 3B for edge and laptop deployment. Runs comfortably on 8GB VRAM at 30+ tok/s on Apple Silicon.
Llama 3.1 70B Instruct
The 70B sibling of Llama 3.1 8B. Strong generalist reasoning with 128K context, popular base for agentic fine-tunes (Hermes 3, Nemotron). Mostly superseded by the Llama 3.3 refresh.
Llama 3.1 Nemotron 70B Instruct
NVIDIA's HelpSteer2-tuned Llama 3.1 70B. Topped Arena Hard at release. The reference NVIDIA open-weight release before Nemotron 3.
Llama 3.2 11B Vision Instruct
First-party multimodal Llama. Accepts images alongside text for VQA, document understanding, and chart reading. Runs on 12GB+ VRAM.
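Most local runtimes (vLLM, llama.cpp server, and similar) expose vision models through the OpenAI-compatible chat format, where images travel as data URLs alongside the text. A sketch assuming a hypothetical local server on port 8000; the endpoint and model id are placeholders:

    import base64, json, urllib.request

    URL = "http://localhost:8000/v1/chat/completions"  # placeholder endpoint

    with open("chart.png", "rb") as f:
        b64 = base64.b64encode(f.read()).decode()

    body = {
        "model": "llama-3.2-11b-vision-instruct",  # placeholder model id
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "What trend does this chart show?"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }
    req = urllib.request.Request(
        URL, json.dumps(body).encode(), {"Content-Type": "application/json"})
    reply = json.load(urllib.request.urlopen(req))
    print(reply["choices"][0]["message"]["content"])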
Llama 4 Maverick
Meta's high-end Llama 4 sibling: a 128-expert MoE built for performance over efficiency. Multilingual strength is its standout. Effectively server-tier hardware only.
Llama 3.1 Nemotron Ultra 253B
NVIDIA's top open reasoning model in the Llama 3.1 lineage. Server-tier; trained for groundbreaking reasoning accuracy on agentic workloads.
Llama 3.1 Nemotron Nano 8B
Smallest of the Nemotron reasoning trio. NAS-optimized for inference efficiency on RTX hardware.
Llama 3.2 1B Instruct
True edge-tier Llama. Runs on a phone or Raspberry Pi. Useful for classification, simple summarization, and on-device agents.
Llama 3.2 90B Vision Instruct
The 90B vision Llama. Best-in-class first-party multimodal open weight at the time of release. Workstation-class only.
deepseek
DeepSeek R1 (671B reasoning)
Open reasoning model that closed the gap with frontier proprietary reasoners. Visible chain-of-thought, MIT license, and a family of distills onto smaller open bases.
DeepSeek R1 Distill Llama 70B
Reasoning distillation onto Llama 3.3 70B. Best-in-class open-weight reasoner you can actually fit on a workstation.
DeepSeek R1 Distill Qwen 32B
32B distill — fits on a single 24GB card with reasoning capability. Best price-per-thinking-token combo for prosumers.
DeepSeek V3 (671B MoE)
DeepSeek's flagship MoE — 671B total / 37B active. Server-tier, but the smaller R1 distills make this lineage approachable.
DeepSeek R1 Distill Qwen 7B
Smallest practical R1 distill. Reasoning on a 6GB GPU.
DeepSeek R1 Distill Qwen 14B
14B reasoning distill. Fits on 12GB cards.
DeepSeek Coder V2 Lite (16B)
MoE coding specialist — 16B total / 2.4B active. Fast on 12GB cards.
gemma
Gemma 4 31B Dense
Google's flagship dense Gemma 4. Beats some 400B-class proprietary models on benchmarks. Targets the 24GB single-GPU sweet spot.
Gemma 4 26B MoE
MoE variant of Gemma 4. Faster per-token than the 31B dense at similar quality on most tasks.
Gemma 3 27B
Pre-Gemma-4 flagship. Multimodal (4B+ variants), 128K context, 140 languages. Strong daily driver on 24GB cards.
Gemma 4 E4B (Effective 4B)
Edge-class Gemma 4. The 'Effective 4B' branding signals it punches above its parameter count via training-data quality.
Gemma 3 12B
12B Gemma 3. Fits on 12GB consumer cards. Multimodal.
Gemma 3 4B
4B Gemma 3 for edge. Multimodal.
Gemma 2 9B Instruct
Mid-size Gemma 2. Strong chat quality with a different training mix from the Llama family.
Gemma 4 E2B (Effective 2B)
Smallest Gemma 4. Designed for phones and Raspberry-Pi-class hardware.
Gemma 3 1B
Smallest text-only Gemma 3 for phones and IoT.
MedGemma 27B
Medical-specialist Gemma fine-tune. Trained on de-identified medical literature and imaging. Research use under HAI-DEF terms.
CodeGemma 7B
Coding-specialist Gemma. Decent FIM completion. Now mostly of historical interest, with Qwen 2.5 Coder dominating the niche.
other
GLM-5
Zhipu's GLM-5 currently leads the Open LLM Leaderboard 2026. Strong reasoning and bilingual EN/ZH capability.
Nemotron 3 Nano (30B-A3B)
NVIDIA's hybrid Mamba-2 + Transformer MoE for on-device agents. 30B total / 3B active. 1M-token context window with switchable reasoning ON/OFF modes.
Kimi K2.6
Moonshot's long-context, agent-oriented MoE. Optimized for stability under tool use and multi-step coding/planning workflows.
Nemotron 3 Super (120B-A12B)
Workstation-tier Nemotron 3. 120B total / 12B active. 5× higher throughput than the prior Super, 1M context, designed for multi-agent applications.
OLMo 2 32B
Fully-open OLMo 2. AI2 publishes the full training data, code, and weights — the most reproducible 32B model.
mistral
Mistral Small 3 24B
Re-release of Mistral Small under Apache 2.0. Competitive with Llama 3.3 70B at one-third the size for many tasks.
Mistral Nemo 12B Instruct
Joint Mistral/NVIDIA release with native 128K context and a new Tekken tokenizer. Strong multilingual; popular fine-tune base.
Pixtral 12B
Mistral's multimodal entry. 12B parameters, vision + text, Apache 2.0. Good document and chart understanding.
Codestral 22B
Mistral's coding specialist. Strong fill-in-the-middle for IDE autocompletion. Personal/research use only.
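Fill-in-the-middle completes the span between a prefix and a suffix instead of generating left to right, which is what IDE autocompletion needs. A sketch against Mistral's hosted FIM endpoint, assuming the v1 mistralai SDK; self-hosted runtimes take equivalent prefix/suffix parameters:

    import os
    from mistralai import Mistral  # assumes the v1 mistralai SDK

    client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

    # The model fills in what belongs between prompt (prefix) and suffix.
    resp = client.fim.complete(
        model="codestral-latest",
        prompt="def fibonacci(n: int) -> int:\n",
        suffix="\n\nprint(fibonacci(10))",
    )
    print(resp.choices[0].message.content)  # generated function body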
Mistral Large 2 (123B)
Mistral's flagship dense model. Open weights but restricted commercial license — research and non-commercial only.
Mistral 7B Instruct v0.3
The reference 7B from Mistral. Apache 2.0 with native function calling. Mature ecosystem.
phi
Phi-4 14B
Microsoft's Phi-4 14B trained on synthetic textbook-quality data. Punches above weight on reasoning and math; MIT licensed.
Phi-4 Reasoning 14B
Reasoning-focused fine-tune of Phi-4. Visible chain-of-thought, competitive with much larger models on math and STEM benchmarks.
Phi-3.5 Mini Instruct
Compact 3.8B Phi for edge deployment. 128K context. Strong reasoning per parameter.
Phi-3.5 Vision
Multimodal Phi 3.5. Document and chart understanding at edge size. MIT licensed.
mixtral
Mixtral 8x7B Instruct
The MoE model that introduced the 8-expert pattern to the open-weight world. 47B params total, 13B active. Still a viable workhorse on 36GB+ VRAM setups.
Mixtral 8x22B Instruct
The bigger Mixtral. 141B total / 39B active. Strong general model, workstation-tier deployment.