What's trending in local AI right now
The repos adding stars fastest in the last 30 days, across all ecosystem categories. Star velocity alone is noise — what matters is which repos are catching attention because they represent real architectural shifts. Each entry below carries the editorial context that separates a hype-cycle spike from a genuine ecosystem signal.
Crossed 350k GitHub stars in early April 2026 — fastest community-growth curve ever recorded for an open-source agent product. Founder joined OpenAI; project transitioned to foundation governance. The defining signal that open-source autonomous coding agents have caught up to (and on some workloads, surpassed) closed-source flagships.
Architecture: Anthropic-style reasoning loop (thinking → planning → execution decomposition) instead of ReAct-style tool-call loops. MCP-first dispatcher treats built-ins as MCP servers internally. The architectural break that explains the velocity advantage on complex tasks.
DeepSeek's reasoning model — explicit chain-of-thought emission as the architectural primitive. The R1 Distill Qwen 32B variant captures ~80% of the full R1's reasoning quality at 5% of the VRAM and fits on a single RTX 4090 in AWQ-INT4. The reason local reasoning workloads became viable at all in 2025-2026.
Crossed 113k stars by May 2026 — fastest-growing managed-crawler MCP. Handles JS rendering, anti-bot evasion, and large-site map+scrape jobs at scale. The pragmatic upgrade from mcp-server-fetch when an agent needs to crawl thousands of pages or work against JS-heavy SPAs.
Architecture: Cloud-managed rendering pipeline + thin OSS MCP client. The architectural tradeoff: outsource rendering complexity for crawl-volume scenarios.
The longest-track-record open-source autonomous coding agent. Planning Mode in v1.6+ closed the 'agent loops without making progress' gap that plagued earlier products. Currently the most stable production deployment in the category — pick OpenHands when reliability matters more than velocity (compare to OpenClaw).
Architecture: ReAct-style loops, MCP-first tool dispatcher, sandbox executor with Docker / chroot / native modes. The /stacks/local-coding-agent canonical recipe is built around OpenHands.
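For orientation, a minimal sketch of the loop shape. This is illustrative only, not OpenHands' internals; `call_llm` and the tools here are hypothetical stubs.

```python
# Illustrative ReAct-style loop: reason, act, observe, repeat.
# NOT OpenHands' actual code; call_llm and the tools are hypothetical stubs.

def call_llm(history: list[dict]) -> dict:
    """Stand-in for the model call. Returns {'thought', 'tool', 'args'}."""
    if len(history) == 1:
        return {"thought": "Inspect the repo first.",
                "tool": "run_shell", "args": {"cmd": "git status"}}
    return {"thought": "Done: repo inspected.", "tool": None, "args": {}}

TOOLS = {
    # In OpenHands these would run inside the sandbox executor (Docker/chroot/native).
    "run_shell": lambda args: f"(sandbox) ran: {args['cmd']}",
    "edit_file": lambda args: f"(sandbox) edited: {args.get('path')}",
}

def react_loop(task: str, max_steps: int = 20) -> str:
    history: list[dict] = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        step = call_llm(history)                         # reason
        if step["tool"] is None:                         # model signals completion
            return step["thought"]
        observation = TOOLS[step["tool"]](step["args"])  # act
        history.append({"role": "assistant", "content": step["thought"]})
        history.append({"role": "tool", "content": observation})  # observe
    return "step budget exhausted"

if __name__ == "__main__":
    print(react_loop("inspect the repo"))
```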
Anthropic's reference MCP server collection. Filesystem, Git, Memory, Postgres, Brave Search, Fetch, Sequential Thinking. The canonical implementations every third-party MCP server gets compared to. Anthropic-maintained; 97 million installs across the ecosystem (April 2026).
Architecture: Each server is a separate stdio process; the protocol is JSON-RPC 2.0 with explicit lifecycle + capability negotiation.
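A minimal sketch of that lifecycle: the abridged initialize handshake a client sends before any tool call (field values follow a published MCP spec revision; check the current one).

```python
import json

# Abridged MCP lifecycle over stdio: newline-delimited JSON-RPC 2.0 messages.
initialize = {
    "jsonrpc": "2.0", "id": 1, "method": "initialize",
    "params": {
        "protocolVersion": "2024-11-05",   # spec revision being requested
        "capabilities": {},                # client-side capability negotiation
        "clientInfo": {"name": "example-client", "version": "0.1"},
    },
}
# The server replies with its own capabilities (tools, resources, prompts...),
# after which the client acknowledges:
initialized = {"jsonrpc": "2.0", "method": "notifications/initialized"}
# Only then may the client call e.g. tools/list or tools/call.
print(json.dumps(initialize))
```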
The Qwen team's reasoning-toggle generation. Native <think> reasoning blocks; toggle per-query. Qwen 3 32B AWQ-INT4 at 36.5 tok/s on RTX 4090 makes serious reasoning workloads viable on consumer hardware. The architectural shift: reasoning quality is now a configurable parameter, not a model choice.
Architecture: Toggle-style reasoning means you don't pay the reasoning-token tax on simple queries. The right pick when workload mix is mostly chat with occasional reasoning needs.
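A minimal sketch of the toggle as surfaced through the Hugging Face chat template (the `enable_thinking` flag is documented by the Qwen team; the model ID is an example).

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-32B")  # example model ID
messages = [{"role": "user", "content": "What is 17 * 24?"}]

# enable_thinking=True emits a <think>...</think> block before the answer;
# False skips it, avoiding the reasoning-token tax on simple queries.
prompt = tok.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
print(prompt)
```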
Default self-hosted ChatGPT-style frontend in 2026. Provider-agnostic by design (Ollama, vLLM, LM Studio, Anthropic, OpenAI in one model switcher). The architectural shift since 2025: the Pipelines feature as a multi-modal extension surface. Team-friendly auth, RBAC, and an admin dashboard make it the replacement of choice for paid ChatGPT subscriptions at 5-50-user team scale.
Microsoft's MCP server that drives a real browser via Playwright (Chromium / Firefox / WebKit). Ships ~22 tools that operate against the page's accessibility tree rather than pixel coordinates — the architectural break that made web automation reliable for agent workflows. The default web automation MCP for any agent that needs DOM and follows real navigation.
Architecture: Accessibility-tree targeting is 5-10x more reliable than coordinate clicking. Headed + headless modes; per-session browser process management.
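One way to inspect that tool surface: launch the server over stdio with the official `mcp` Python SDK (assumes Node.js is installed; package names as published).

```python
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Launch the Playwright MCP server as a stdio subprocess.
    params = StdioServerParameters(command="npx", args=["@playwright/mcp@latest"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()            # MCP lifecycle handshake
            tools = await session.list_tools()    # the ~22 page-level tools
            for tool in tools.tools:
                print(tool.name)

asyncio.run(main())
```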
The 2026 breakthrough release for consumer-cluster inference. Thunderbolt 5 + macOS 26.2 RDMA cut inter-Mac latency by ~99% on M4 Pro+ hardware. DeepSeek V3 671B running at 5.37 tok/s on 8x M4 Pro Mac Minis is now a credible personal-cluster benchmark, not a tech demo. The architectural shift this represents: consumer hardware can now run frontier-class models locally.
Architecture: Pipeline parallel via MLX over Thunderbolt 5 RDMA. Auto-discovery of nearby Apple Silicon devices. The first credible WAN-or-LAN-cluster inference solution where consumer Mac hardware genuinely competes with datacenter SKUs on tokens-per-watt.
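A hedged sketch of talking to an exo cluster through its ChatGPT-compatible endpoint; the port and model alias below are assumptions, so verify against your install.

```python
from openai import OpenAI

# exo exposes a ChatGPT-compatible endpoint on the cluster head node.
# Port and model alias are assumptions -- check your exo version's docs.
client = OpenAI(base_url="http://localhost:52415/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="deepseek-v3",  # assumed alias for the cluster-loaded model
    messages=[{"role": "user", "content": "hello from the Mac cluster"}],
)
print(resp.choices[0].message.content)
```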
VS Code-native autonomous agent. Picked up Roo Code's user base after that project shut down in May 2026. Stronger autonomous-task capability than Continue; lighter than OpenHands. The current frontrunner in the IDE-resident agent category.
Architecture: Lives inside VS Code rather than in a separate window — different UX paradigm from OpenHands. MCP support; talks to any OpenAI-compatible runtime.
The drop-in agent-memory default. Mem0g (graph variant) took the multi-hop agent-memory benchmark lead at 68.4% LLM Score in April 2026 — the flat-vector vs graph-memory architectural debate now has clear empirical evidence on the graph side for multi-hop tasks.
Architecture: Vector retrieval + implicit consolidation; Mem0g variant adds graph reasoning. 20-line config integration; the friction-minimum memory framework.
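What the friction-minimum looks like: a minimal sketch per Mem0's quickstart (config defaults and return shapes vary across versions; verify against the release you install).

```python
from mem0 import Memory

# Defaults wire up a local vector store plus an LLM for consolidation.
m = Memory()

# Store a fact, then retrieve it later by semantic search.
m.add("Bob prefers PostgreSQL over MySQL for new services", user_id="bob")
hits = m.search("which database does Bob prefer?", user_id="bob")
print(hits)  # recent versions return {"results": [{"memory": ..., ...}]}
```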
Block's MCP-first agent. Differentiates by treating MCP as the primary extension surface (not as one of many). Strong support for both stdio and remote MCP; the right pick when MCP-heaviness is core to your workflow rather than incidental.
Architecture: Designed as an extension platform — built-in tools are minimal; everything substantial wires in via MCP. Lighter-weight than OpenHands; stronger MCP integration than OpenClaw.
Production-default inference engine. v0.17.1 (March 2026) shipped Model Runner V2 with up to 56% higher throughput on GB200. PagedAttention turned KV-cache efficiency into a 5-24x throughput delta over baselines; the project's discipline through 2024-2026 turned that single innovation into a complete production stack.
Architecture: PagedAttention + continuous batching + prefix caching + chunked prefill. The OpenAI-compatible API on top makes it a drop-in for any team running an OpenAI bill they'd rather not pay.
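What the drop-in looks like in practice: a minimal sketch pointing the standard OpenAI client at a local vLLM server (the model ID is an example; port 8000 is vLLM's default).

```python
from openai import OpenAI

# Against a local vLLM server, e.g. started with:
#   vllm serve Qwen/Qwen3-32B-AWQ
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="Qwen/Qwen3-32B-AWQ",  # must match the served model name
    messages=[{"role": "user", "content": "Summarize PagedAttention in one line."}],
)
print(resp.choices[0].message.content)
```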
GitHub's first-party MCP server. Replaced the original Anthropic reference port in late 2025. Broader API coverage, OAuth-ready transport, release cadence tied to GitHub itself rather than community velocity. The signal: MCP is now the canonical machine-readable interface to GitHub for agent workflows.
Architecture: Direct GraphQL/REST passthrough; OAuth-aware. Issues, PRs, code search, Actions, discussions all surface as MCP tools.
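Wiring it into a Python MCP client looks roughly like this; the Docker image name follows the server's README, and the `GITHUB_PAT` env var is a placeholder, so verify both before relying on them.

```python
import asyncio
import os
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Launch GitHub's MCP server in Docker (image name per its README; verify).
params = StdioServerParameters(
    command="docker",
    args=["run", "-i", "--rm",
          "-e", "GITHUB_PERSONAL_ACCESS_TOKEN",
          "ghcr.io/github/github-mcp-server"],
    env={"GITHUB_PERSONAL_ACCESS_TOKEN": os.environ["GITHUB_PAT"]},  # placeholder var
)

async def main() -> None:
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([t.name for t in tools.tools])  # issues, PRs, code search, ...

asyncio.run(main())
```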
OS-style explicit agent memory. v0.7 (April 2026) shipped a genuinely usable explicit-memory hierarchy — the architectural opposite of Mem0's implicit consolidation. Pick Letta when deterministic memory state matters and the agent needs to reason about its own memory.
Architecture: Working memory + archival memory + explicit paging. The agent itself decides when to archive, compress, evict — different abstraction from vector-similarity-only memory frameworks.
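To make the contrast concrete, a conceptual sketch of an OS-style explicit hierarchy. This is illustrative only, not Letta's API; every name here is hypothetical.

```python
# Conceptual sketch of explicit, agent-visible memory paging.
# NOT Letta's actual API; all names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class ExplicitMemory:
    working: list[str] = field(default_factory=list)   # small, always in context
    archival: list[str] = field(default_factory=list)  # large, paged in on demand
    working_limit: int = 8

    def remember(self, fact: str) -> None:
        self.working.append(fact)
        if len(self.working) > self.working_limit:
            # Deterministic, inspectable eviction: oldest working memory
            # pages out to the archive instead of silently vanishing.
            self.archival.append(self.working.pop(0))

    def page_in(self, query: str) -> list[str]:
        # Substring match as a stand-in for real retrieval.
        return [f for f in self.archival if query.lower() in f.lower()]
```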
Reached 1.0 in early 2026 with stable Neo4j integration and a polished agent-memory API. The OSS counterpart to Zep is now production-ready for teams that want full local control over graph memory without the hosted-service dependency.
Architecture: Temporal knowledge graph over Neo4j. Multi-hop reasoning over consolidated agent memory; the right pick when 'what did Bob decide three sessions ago, and why?' is the shape of question you need answered.
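A hedged sketch of that shape of query against Graphiti; class and method names follow its docs, but signatures may differ across versions, so verify before relying on this.

```python
import asyncio
from datetime import datetime, timezone
from graphiti_core import Graphiti  # names per Graphiti's docs; verify your version

async def main() -> None:
    # Local Neo4j connection details are placeholders.
    g = Graphiti("bolt://localhost:7687", "neo4j", "password")
    await g.add_episode(
        name="planning-session",
        episode_body="Bob decided to migrate billing to Postgres because of JSONB.",
        source_description="meeting notes",
        reference_time=datetime.now(timezone.utc),  # temporal edge anchoring
    )
    # Multi-hop question answered over the temporal graph:
    results = await g.search("what did Bob decide about billing, and why?")
    print(results)

asyncio.run(main())
```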
RAG-first workspace tool. Workspace = collection-isolation primitive; native ingestion pipeline; LanceDB embedded by default. The right pick when document-first workflows matter more than chat-first ones (where Open WebUI wins). MCP integration in 2025-2026 turned it from RAG frontend into agent front door.
The credible architectural alternative to vLLM. RadixAttention's tree-structured KV cache is a real advantage on shared-prefix traffic; the SGL DSL's structured-generation primitives turn 5-10x token efficiency into a defensible feature for any workload that already enforces output structure client-side.
Architecture: Tree-structured KV cache (vs vLLM's flat blocks) + structured-generation DSL. Cross-replica prefix-cache sync makes the architectural advantage compound at multi-node scale.
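A minimal sketch of the SGL DSL (API names per SGLang's docs; a running SGLang server is assumed at the endpoint below).

```python
import sglang as sgl

# A running SGLang server is assumed, e.g. at the default local endpoint.
sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

@sgl.function
def qa(s, question):
    # The shared system prefix lands in the radix (tree) KV cache once,
    # so repeated calls skip recomputing it.
    s += sgl.system("You are a terse assistant.")
    s += sgl.user(question)
    s += sgl.assistant(sgl.gen("answer", max_tokens=64))

state = qa.run(question="Why does a tree-structured KV cache help shared prefixes?")
print(state["answer"])
```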
Production single-node vector DB. Best ops surface in the category; PQ quantization makes it the right pick for storage-constrained deployments. Standard upgrade path from LanceDB when single-workspace scale crosses ~500K vectors.
The embedded-first vector store. Single-folder Arrow files; no server process to firewall. Default for offline / single-process deployments; scales further than Chroma before needing a server. The right vector backend for the /stacks/offline-rag-workstation recipe.
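The embedded model in a nutshell, as a minimal sketch (the path and vectors are examples).

```python
import lancedb

# Embedded and serverless: the "database" is a folder of Arrow/Lance files,
# so there is no server process to run or firewall.
db = lancedb.connect("./vectors")  # example path
table = db.create_table(
    "docs",
    data=[
        {"vector": [0.1, 0.2, 0.3, 0.4], "text": "PagedAttention notes"},
        {"vector": [0.9, 0.8, 0.7, 0.6], "text": "RadixAttention notes"},
    ],
)
hits = table.search([0.1, 0.2, 0.3, 0.4]).limit(1).to_list()
print(hits[0]["text"])
```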
OSS-first LLM tracing + evaluation. OpenInference standard for traces; runs locally with one pip install. The OSS pick for teams that want LangSmith-shaped functionality without vendor lock-in. Memory-system observability is where Phoenix earns its place — without auditing, memory becomes confidently wrong.
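Getting the local trace UI up is one call; a minimal sketch below (the OpenInference instrumentors are separate installs, and their package names vary by integration, so verify).

```python
import phoenix as px

# Launch the local Phoenix UI; traces render here as they arrive.
session = px.launch_app()
print(session.url)

# OpenInference instrumentors (installed separately, e.g. for the OpenAI or
# LangChain clients) emit OTel spans that Phoenix picks up -- package names
# vary by integration; check the Phoenix docs for yours.
```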
The LLM runner for Apple's Metal-native ML framework. Now competitive with llama.cpp Metal on M-series silicon, with stronger long-context performance. The 2026 unlock here was Thunderbolt 5 + macOS 26.2 RDMA, which made multi-Mac clusters credible — see Exo.
Architecture: Pure Metal kernels; unified-memory-aware. The MLX quant format is separate from GGUF, which is the main compatibility gap.
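A minimal sketch of the mlx-lm path (the model ID is an example from the mlx-community hub org, which hosts pre-converted MLX-format weights).

```python
from mlx_lm import load, generate

# MLX-quantized weights live in their own format (not GGUF); the
# mlx-community org hosts pre-converted models. Model ID is an example.
model, tokenizer = load("mlx-community/Qwen3-32B-4bit")
text = generate(model, tokenizer, prompt="Explain unified memory in one line.")
print(text)
```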
Hosted temporal-knowledge-graph memory product. Strongest API in the category for hosted scenarios; the OSS core lives but the canonical experience is the cloud product. Pick Zep when cross-machine continuity + multi-hop reasoning matter more than full local control.
AMD's CUDA equivalent. ROCm 6.2+ matured through 2025; the gap with CUDA is narrowing on the headline LLaMA / Mistral / Qwen architectures. RX 7900 XTX on ROCm runs Llama 3.1 8B Q4_K_M at ~86 tok/s — within 17% of RTX 4090. The trajectory matters: AMD viability for local AI improved more in 2025-2026 than in any prior 18-month period.
Architecture: Kernel coverage trails CUDA; some attention variants regress. Verify your model's specific architecture has a working ROCm path before committing.
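A quick way to verify the path exists before committing, on a ROCm build of PyTorch:

```python
import torch

# ROCm builds of PyTorch reuse the CUDA API surface: cuda.is_available()
# is True on a working install, and torch.version.hip is set
# (torch.version.cuda is None on those builds).
print(torch.cuda.is_available())       # True on a working ROCm install
print(torch.version.hip)               # e.g. "6.2...." on ROCm builds
print(torch.cuda.get_device_name(0))   # e.g. "AMD Radeon RX 7900 XTX"
```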
Going deeper by category
- Coding agents frontier — OpenHands / OpenClaw / Goose / Aider / Cline / Continue.
- Inference frontier — vLLM / SGLang / Ollama / MLX-LM / Exo / Petals.
- Memory frontier — Letta / Mem0 / Zep / Graphiti.
- MCP frontier — Anthropic reference servers + vendor-maintained + community.
- All zones — frontier index with the full counter dashboard.