The frontier of open-weight model releases
Open-weight model releases tracked by RunLocalAI — recent additions, rising families, distillation chains, and the multimodal and reasoning waves. Each card links into the catalog with authority badges (L1.25 enriched · benchmark-backed · verdict) so you can scan editorial coverage at a glance.
Recent releases (12 newest)
Catalog entries with the most recent release dates. Use the authority badges to spot which have full editorial coverage (L1.25 enriched + benchmark) and which are catalog-only.
Qwen 3.5 235B-A17B
frontier-tier permissively-licensed serving on cluster hardware
Mistral Medium 3.5 (675B MoE)
Mistral Medium 3 24B (dense)
research / non-commercial workstation deployments
DeepSeek V4 Pro
frontier-tier reasoning research
DeepSeek V4 Flash
workstation-cluster V4-class without frontier hardware
OLMo 2 32B
academic / regulatory-sensitive 32B research
Phi-4 Reasoning Mini 4B
edge-tier reasoning
Llama 4 Maverick
frontier-tier reasoning + multimodal
Llama 4 Scout
production multimodal serving — image + text at workstation-cluster scale
Gemma 4 31B
workstation-tier multilingual + vision
Gemma 4 26B MoE
workstation MoE — first of its kind in the Gemma family
Gemma 4 E4B (Effective 4B)
New reasoning models
Models with explicit thinking-block emission — DeepSeek R1 family, QwQ, Kimi, Magistral, Qwen 3 reasoning-mode. See /stacks/local-reasoning-model for the canonical deployment recipe.
Kimi K2.6
frontier-tier reasoning research
Magistral 32B
research / non-commercial reasoning at 32B scale
Kimi K1.5
deep math + reasoning research
Qwen 3 Coder 32B
coding-specialized agent workloads
DeepSeek R1 Distill Qwen 3 32B
workstation reasoning with Qwen 3 base improvements
Qwen 3 72B
frontier-tier general reasoning at workstation scale
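The cards above share one runtime detail: the model interleaves a visible thinking block with the final answer, and a local deployment usually wants to separate the two before display or logging. A minimal sketch, assuming the DeepSeek-R1-style `<think>…</think>` convention (the delimiter varies by family, so treat the tag as configurable):

```python
import re

def split_reasoning(output: str, tag: str = "think") -> tuple[str, str]:
    """Split a reasoning model's raw output into (thinking, answer).

    Assumes a DeepSeek-R1-style convention where the chain of thought
    is wrapped in <think>...</think>; other families use different
    delimiters, hence the configurable tag name.
    """
    pattern = rf"<{tag}>(.*?)</{tag}>"
    match = re.search(pattern, output, flags=re.DOTALL)
    if match is None:
        # No thinking block emitted (or a non-reasoning model).
        return "", output.strip()
    thinking = match.group(1).strip()
    answer = output[match.end():].strip()  # everything after the block
    return thinking, answer

raw = "<think>2 + 2: add the units.</think>The answer is 4."
thinking, answer = split_reasoning(raw)
```

In practice the split is also where you would decide whether thinking tokens count against context reuse or get stripped before the next turn — the canonical recipe linked above covers that trade-off.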
New coding models
Coding-specialized fine-tunes. The Qwen Coder lineage is the current open-weight benchmark leader; DeepSeek Coder V3, Codestral, Devstral, and OpenCoder are the credible alternatives. See /stacks/local-coding-agent for the canonical deployment recipe.
DeepSeek Coder V3
workstation coding alternative to Qwen 2.5 Coder
Devstral Small 2 24B
Apache 2.0 coding alternative to Qwen 2.5 Coder
Yi Coder 9B
8GB-VRAM coding
Qwen 2.5 Coder 32B Instruct
single-user autonomous coding agents on RTX 4090 / 5090 / dual-A100 hardware
Qwen 2.5 Coder 14B Instruct
16GB-VRAM coding
OpenCoder 8B
academic / reproducibility-sensitive coding research
New multimodal models
Vision-language models. The 2025-2026 wave: Llama 4 Scout / Maverick, Qwen 2.5-VL, Pixtral, Janus-Pro, Phi-4 Multimodal. See /stacks/local-vision-model for the canonical deployment recipe.
Llama 4 Maverick
frontier-tier reasoning + multimodal
Gemma 4 31B
workstation-tier multilingual + vision
Gemma 4 26B MoE
workstation MoE — first of its kind in the Gemma family
Gemma 4 E4B (Effective 4B)
Gemma 4 E2B (Effective 2B)
Phi-4 Multimodal
16GB-consumer multimodal Q&A
New MoE models
Mixture-of-Experts releases. Active-parameter efficiency shapes the deployment economics — see /systems/distributed-inference for the architectural depth.
Qwen 3.5 235B-A17B
frontier-tier permissively-licensed serving on cluster hardware
Mistral Medium 3.5 (675B MoE)
DeepSeek V4 Pro
frontier-tier reasoning research
DeepSeek V4 Flash
workstation-cluster V4-class without frontier hardware
DeepSeek V4
frontier-tier reasoning on multi-machine clusters
GLM-5 Pro
Chinese-language enterprise serving
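The deployment economics mentioned in the section intro come down to one split: total parameters set the weight memory you must hold, while active parameters set per-token compute. A back-of-envelope sketch with illustrative numbers (real runtimes add KV cache, activations, and routing overhead on top):

```python
def moe_footprint(total_params_b: float, active_params_b: float,
                  bytes_per_param: float = 0.5) -> dict:
    """Back-of-envelope MoE sizing.

    total_params_b / active_params_b are in billions of parameters;
    bytes_per_param is roughly 0.5 for 4-bit quantization, 2.0 for
    FP16. Illustrative arithmetic only, not a measured benchmark.
    """
    weight_gb = total_params_b * bytes_per_param   # all experts resident
    flops_per_token = 2 * active_params_b * 1e9    # ~2 FLOPs/param/token
    return {
        "weight_gb": weight_gb,
        "gflops_per_token": flops_per_token / 1e9,
        "active_ratio": active_params_b / total_params_b,
    }

# A 235B-A17B release at 4-bit: all 235B parameters must sit in
# memory (~118 GB of weights), but per-token compute is comparable
# to a dense 17B model.
sizing = moe_footprint(235, 17)
```

This is why a 235B-A17B card lands on "cluster hardware" for memory while delivering workstation-class per-token latency once the weights fit — the /systems/distributed-inference page linked above develops the full picture.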
New edge / phone-tier models
Sub-4B models for phone / Pi / embedded deployment. Phi-4 Mini, Gemma 3 1B, MiniCPM 3 4B, SmolLM 3, Hermes 3 3B, Dolphin 3 3B, RWKV 7 Goose 1.5B.
Phi-4 Reasoning Mini 4B
edge-tier reasoning
Phi-4 Mini 4B
edge / embedded reasoning
SmolLM 3 3B
edge-tier reasoning
Gemma 3 4B
edge-tier chat — Apple Silicon laptop friendly
Gemma 3 1B
edge / embedded chat
RWKV 7 'Goose' 1.5B
long-context edge inference where memory matters more than quality
Enrichment gaps — OPERATOR queue
High-relevance catalog entries (7B-100B) that lack L1.25 enrichment, verdict, AND benchmark. These render noindex today — the next sprint's editorial queue. Surfacing them here keeps the gap visible.
Qwen 3 30B-A3B
workstation MoE inference — efficient consumer-tier alternative to dense 32B
Gemma 4 31B
workstation-tier multilingual + vision
Gemma 4 26B MoE
workstation MoE — first of its kind in the Gemma family
Nemotron 3 Nano (30B-A3B)
DeepSeek R1 Distill Qwen 7B
consumer-tier reasoning at the 8GB tier
DeepSeek R1 Distill Qwen 14B
16GB-VRAM reasoning
Llama 3.1 Nemotron 70B Instruct
Llama 3.2 11B Vision Instruct
Going deeper
- Ecosystem maps — structured-landscape views (memory frameworks, inference runtimes, MCP, coding agents).
- Execution stacks — recipes that combine models with runtimes + hardware.
- Frontier index — broader ecosystem-momentum view across coding agents, inference runtimes, memory systems, MCP.
- Benchmarks — measured tokens-per-second + topology fields across hardware/model/runtime triples.