Frontier · Trending this week

What's trending in local AI right now

The repos adding stars fastest in the last 30 days, across all ecosystem categories. Star velocity alone is noise — what matters is which repos are catching attention because they represent real architectural shifts. Each entry below carries an editorial note on why it matters: the signal that separates a hype-cycle spike from genuine ecosystem relevance.

Exploding · Coding agent · 350k stars (+45k/30d)
OpenClaw

Crossed 350k GitHub stars in early April 2026 — fastest community-growth curve ever recorded for an open-source agent product. Founder joined OpenAI; project transitioned to foundation governance. The defining signal that open-source autonomous coding agents have caught up to (and on some workloads, surpassed) closed-source flagships.

Architecture: Anthropic-style reasoning loop (thinking → planning → execution decomposition) instead of ReAct-style tool-call loops. MCP-first dispatcher treats built-ins as MCP servers internally. The architectural break that explains the velocity advantage on complex tasks.
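
The contrast between the two loop styles can be sketched in a few lines of stdlib Python. Everything here (the tool names, the stub planner) is illustrative, not OpenClaw's actual code:

```python
# Illustrative contrast between a ReAct-style loop and a plan-then-execute
# loop. The "model" decisions are stubbed; a real agent would call an LLM.

def react_loop(task, tools, pick_next_action):
    """ReAct style: think -> act -> observe, one tool call at a time."""
    transcript = []
    while True:
        action = pick_next_action(task, transcript)  # model decides next step
        if action is None:                           # model says we're done
            return transcript
        name, args = action
        transcript.append((name, tools[name](*args)))

def plan_then_execute(task, tools, make_plan):
    """Decomposition style: plan the whole task up front (thinking +
    planning), then run the execution phase step by step."""
    plan = make_plan(task)                # thinking + planning phase
    results = []
    for name, args in plan:               # execution phase
        results.append((name, tools[name](*args)))
    return results

# Tiny demo with fake tools and a canned plan.
tools = {"read": lambda p: f"contents of {p}", "grep": lambda s: f"3 hits for {s}"}
plan = lambda task: [("grep", ("TODO",)), ("read", ("main.py",))]
out = plan_then_execute("fix TODOs", tools, plan)
```

The decomposition style front-loads the reasoning, which is where the velocity advantage on complex tasks comes from: fewer round-trips per step once the plan exists.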

Exploding · Frontier model · 95k stars (+12k/30d)
DeepSeek R1

DeepSeek's reasoning model — explicit chain-of-thought emission as the architectural primitive. The R1 Distill Qwen 32B variant captures ~80% of the full R1's reasoning quality at 5% of the VRAM and fits on an RTX 4090 in AWQ-INT4. The reason local reasoning workloads became viable at all in 2025-2026.

Exploding · MCP server · 113k stars (+9k/30d)
Firecrawl

Crossed 113k stars by May 2026 — fastest-growing managed-crawler MCP. Handles JS rendering, anti-bot evasion, and large-site map+scrape jobs at scale. The pragmatic upgrade from mcp-server-fetch when an agent needs to crawl thousands of pages or work against JS-heavy SPAs.

Architecture: Cloud-managed rendering pipeline + thin OSS MCP client. The architectural tradeoff: outsource rendering complexity for crawl-volume scenarios.

Rising · Coding agent · 145k stars (+8k/30d)
OpenHands

The longest-track-record open-source autonomous coding agent. Planning Mode in v1.6+ closed the 'agent loops without making progress' gap that plagued earlier products. Currently the most stable production deployment in the category — pick OpenHands when reliability matters more than velocity (compare to OpenClaw).

Architecture: ReAct-style loops, MCP-first tool dispatcher, sandbox executor with Docker / chroot / native modes. The /stacks/local-coding-agent canonical recipe is built around OpenHands.

Exploding · MCP server · 65k stars (+7k/30d)
modelcontextprotocol/servers

Anthropic's reference MCP server collection. Filesystem, Git, Memory, Postgres, Brave Search, Fetch, Sequential Thinking. The canonical implementations every third-party MCP server gets compared to. Anthropic-maintained; 97 million installs across the ecosystem (April 2026).

Architecture: Each server is a separate stdio process; the protocol is JSON-RPC 2.0 with explicit lifecycle + capability negotiation.
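
The lifecycle starts with an initialize exchange. A minimal sketch of the client's first message — JSON-RPC 2.0, newline-delimited over stdio; the client name, version, and the pinned protocol revision are placeholder assumptions:

```python
import json

def initialize_request(request_id=1):
    """First message an MCP client sends to a server over stdio:
    JSON-RPC 2.0 with protocol-version and capability negotiation."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-06-18",  # spec revision; negotiate with the server
            "capabilities": {},               # capabilities this client offers
            "clientInfo": {"name": "example-client", "version": "0.1.0"},
        },
    }

# stdio transport: one JSON object per line
wire = json.dumps(initialize_request()) + "\n"
```

The server replies with its own capabilities, the client acknowledges with an `initialized` notification, and only then do tool calls flow.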

Exploding · Frontier model · 28k stars (+6k/30d)
Qwen 3

The Qwen team's reasoning-toggle generation. Native <think> reasoning blocks, toggleable per query. Qwen 3 32B AWQ-INT4 at 36.5 tok/s on an RTX 4090 makes serious reasoning workloads viable on consumer hardware. The architectural shift: reasoning quality is now a configurable parameter, not a model choice.

Architecture: Toggle-style reasoning means you don't pay the reasoning-token tax on simple queries. The right pick when the workload mix is mostly chat with occasional reasoning needs.
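
A sketch of what per-query toggling looks like from the client side. The `enable_thinking` field name follows Qwen's chat-template parameter; the payload shape and the routing heuristic are illustrative assumptions, not a specific runtime's API:

```python
def build_request(messages, enable_thinking):
    """Per-query reasoning toggle: same model, one flag decides whether
    the response opens with a <think> block. Field name follows Qwen's
    chat-template parameter; treat the payload shape as illustrative."""
    return {
        "model": "qwen3-32b-awq",
        "messages": messages,
        "chat_template_kwargs": {"enable_thinking": enable_thinking},
    }

def route(query):
    """Toy heuristic: only pay the reasoning-token tax on queries that
    look like multi-step problems."""
    hard = any(w in query.lower() for w in ("prove", "debug", "plan", "why"))
    return build_request([{"role": "user", "content": query}], enable_thinking=hard)

req = route("Why does this deadlock happen?")
```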

Exploding · Frontend · 82k stars (+5k/30d)
Open WebUI

Default self-hosted ChatGPT-style frontend in 2026. Provider-agnostic by design (Ollama, vLLM, LM Studio, Anthropic, OpenAI in one model switcher). The architectural shift from 2025: the Pipelines feature as a multi-modal extension surface. Team-friendly auth, RBAC, and an admin dashboard make it the replacement of choice for paid ChatGPT subscriptions at 5-50 user team scale.

Exploding · MCP server · 32k stars (+5k/30d)
Playwright MCP

Microsoft's MCP server that drives a real browser via Playwright (Chromium / Firefox / WebKit). Ships ~22 tools that operate against the page's accessibility tree rather than pixel coordinates — the architectural break that made web automation reliable for agent workflows. The default web-automation MCP for any agent that needs DOM access and real navigation.

Architecture: Accessibility-tree targeting is 5-10x more reliable than coordinate clicking. Headed and headless modes; per-session browser process management.
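
Why role-plus-name targeting survives layout changes where coordinates don't: a toy resolver over a dict-shaped accessibility tree (not Playwright's actual API; the tree shape and `ref` field are invented for illustration):

```python
def find_node(tree, role, name):
    """Resolve an element by accessibility role + name, depth-first.
    Unlike pixel coordinates, this keeps working when the layout shifts."""
    if tree.get("role") == role and tree.get("name") == name:
        return tree
    for child in tree.get("children", []):
        hit = find_node(child, role, name)
        if hit is not None:
            return hit
    return None

page = {
    "role": "document", "name": "Checkout",
    "children": [
        {"role": "button", "name": "Cancel"},
        {"role": "button", "name": "Place order", "ref": "node-17"},
    ],
}
target = find_node(page, "button", "Place order")
```

A coordinate click breaks the moment a banner pushes the button down 40 pixels; the role/name pair does not.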

Exploding · Distributed inference · 30k stars (+5k/30d)
Exo

The 2026 breakthrough release for consumer-cluster inference. Thunderbolt 5 + macOS 26.2 RDMA cut inter-Mac latency by ~99% on M4 Pro+ hardware. DeepSeek V3 671B running at 5.37 tok/s on 8x M4 Pro Mac Minis is now a credible personal-cluster benchmark, not a tech demo. The architectural shift this represents: consumer hardware can now run frontier-class models locally.

Architecture: Pipeline parallel via MLX over Thunderbolt 5 RDMA. Auto-discovery of nearby Apple Silicon devices. The first credible WAN-or-LAN-cluster inference solution where consumer Mac hardware genuinely competes with datacenter SKUs on tokens-per-watt.
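
Pipeline parallelism at its simplest is a contiguous split of transformer layers across nodes; a sketch of even partitioning (not Exo's actual scheduler, which weights shards by each device's memory):

```python
def partition_layers(n_layers, nodes):
    """Assign contiguous layer ranges to nodes, as evenly as possible.
    Each node runs its shard and streams activations to the next node."""
    base, extra = divmod(n_layers, len(nodes))
    shards, start = {}, 0
    for i, node in enumerate(nodes):
        count = base + (1 if i < extra else 0)  # spread the remainder
        shards[node] = range(start, start + count)
        start += count
    return shards

# e.g. a 61-layer model across 8 Mac Minis
shards = partition_layers(61, [f"mini-{i}" for i in range(8)])
```

The RDMA win matters because every token crosses each shard boundary once; cutting inter-node latency compounds across all boundaries per generated token.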

Rising · Coding agent · 38k stars (+5k/30d)
Cline

VS Code-native autonomous agent. Picked up Roo Code's user base after that project shut down in May 2026. Stronger autonomous-task capability than Continue; lighter than OpenHands. The current frontrunner in the IDE-resident agent category.

Architecture: Lives inside VS Code rather than in a separate window — different UX paradigm from OpenHands. MCP support; talks to any OpenAI-compatible runtime.

Exploding · Memory system · 28k stars (+4k/30d)
Mem0

The drop-in agent-memory default. Mem0g (graph variant) took the multi-hop agent-memory benchmark lead at 68.4% LLM Score in April 2026 — the flat-vector vs graph-memory architectural debate now has clear empirical evidence on the graph side for multi-hop tasks.

Architecture: Vector retrieval + implicit consolidation; Mem0g variant adds graph reasoning. 20-line config integration; the friction-minimum memory framework.
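
Vector retrieval plus implicit consolidation, reduced to a stdlib sketch: a bag-of-words store that merges near-duplicate memories on write instead of appending blindly. Illustrative only — Mem0's real API is an add/search client over embedding models, not this toy cosine:

```python
from collections import Counter
import math

def _vec(text):
    return Counter(text.lower().split())

def _cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class MemoryStore:
    """Toy agent memory: add() consolidates near-duplicates; search()
    ranks memories by cosine similarity to the query."""
    def __init__(self, merge_threshold=0.9):
        self.items, self.threshold = [], merge_threshold

    def add(self, text):
        v = _vec(text)
        for i, (_, ov) in enumerate(self.items):
            if _cosine(v, ov) >= self.threshold:  # implicit consolidation
                self.items[i] = (text, v)         # newer version wins
                return
        self.items.append((text, v))

    def search(self, query, k=1):
        v = _vec(query)
        ranked = sorted(self.items, key=lambda it: _cosine(v, it[1]), reverse=True)
        return [t for t, _ in ranked[:k]]

m = MemoryStore()
m.add("bob prefers dark mode")
m.add("bob prefers dark mode")      # consolidated, not duplicated
m.add("deploys happen on fridays")
```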

Rising · Coding agent · 28k stars (+4k/30d)
Goose

Block's MCP-first agent. Differentiates by treating MCP as the primary extension surface, not as one of many. Strong support for both stdio and remote MCP; the right pick when heavy MCP use is core to your workflow rather than incidental.

Architecture: Designed as an extension platform — built-in tools are minimal; everything substantial wires in via MCP. Lighter-weight than OpenHands; stronger MCP integration than OpenClaw.

Exploding · Inference runtime · 52k stars (+3k/30d)
vLLM

Production-default inference engine. v0.17.1 (March 2026) shipped Model Runner V2 with up to 56% higher throughput on GB200. PagedAttention turned KV-cache efficiency into a 5-24x throughput delta over baselines; the project's discipline through 2024-2026 turned that single innovation into a complete production stack.

Architecture: PagedAttention + continuous batching + prefix caching + chunked prefill. The OpenAI-compatible API on top makes it a drop-in for any team running an OpenAI bill they'd rather not pay.
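
The core of PagedAttention is treating the KV cache like paged virtual memory: fixed-size blocks allocated on demand, so no sequence reserves its maximum length up front. A toy allocator showing the bookkeeping (block and pool sizes are arbitrary):

```python
class PagedKVCache:
    """Toy KV-cache pager: sequences grow a block at a time from a shared
    pool, instead of pre-reserving max_seq_len worth of cache."""
    def __init__(self, num_blocks, block_size):
        self.free = list(range(num_blocks))
        self.block_size = block_size
        self.tables = {}   # seq_id -> list of block ids (the block table)
        self.lengths = {}  # seq_id -> tokens written so far

    def append_token(self, seq_id):
        """Reserve space for one more token; grab a new block only when
        the current one is full."""
        n = self.lengths.get(seq_id, 0)
        if n % self.block_size == 0:  # current block full, or first token
            if not self.free:
                raise MemoryError("KV pool exhausted; preempt a sequence")
            self.tables.setdefault(seq_id, []).append(self.free.pop())
        self.lengths[seq_id] = n + 1

    def release(self, seq_id):
        """Sequence finished: return its blocks to the shared pool."""
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

cache = PagedKVCache(num_blocks=8, block_size=16)
for _ in range(40):  # a 40-token sequence needs ceil(40/16) = 3 blocks
    cache.append_token("seq-a")
```

Because blocks return to the pool the moment a sequence finishes, many more concurrent sequences fit in the same VRAM — the source of the throughput delta.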

Rising · MCP server · 14k stars (+3k/30d)
GitHub MCP Server

GitHub's first-party MCP server. Replaced the original Anthropic reference port in late 2025. Broader API coverage, OAuth-ready transport, release cadence tied to GitHub itself rather than community velocity. The signal: MCP is now the canonical machine-readable interface to GitHub for agent workflows.

Architecture: Direct GraphQL/REST passthrough; OAuth-aware. Issues, PRs, code search, Actions, discussions all surface as MCP tools.

Rising · Memory system · 17k stars (+2k/30d)
Letta

OS-style explicit agent memory. v0.7 (April 2026) shipped a genuinely usable explicit-memory hierarchy — the architectural opposite of Mem0's implicit consolidation. Pick Letta when deterministic memory state matters and the agent needs to reason about its own memory.

Architecture: Working memory + archival memory + explicit paging. The agent itself decides when to archive, compress, evict — different abstraction from vector-similarity-only memory frameworks.
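
The hierarchy can be sketched as a bounded working set plus an unbounded archive, with eviction and recall as explicit operations the agent invokes. Method names and capacity are illustrative, not Letta's API:

```python
class AgentMemory:
    """Toy OS-style memory: a size-capped working set the model sees every
    turn, plus an archive it must explicitly page from. Eviction is a
    deliberate action, not a background similarity heuristic."""
    def __init__(self, working_capacity):
        self.working, self.archive = [], []
        self.capacity = working_capacity

    def remember(self, fact):
        if len(self.working) >= self.capacity:
            raise RuntimeError("working memory full: agent must archive first")
        self.working.append(fact)

    def archive_fact(self, fact):
        """Explicit eviction: move a fact out of the prompt-visible set."""
        self.working.remove(fact)
        self.archive.append(fact)

    def recall(self, keyword):
        """Explicit paging: search the archive, bring a match back in."""
        for fact in self.archive:
            if keyword in fact:
                self.archive.remove(fact)
                self.remember(fact)
                return fact
        return None

mem = AgentMemory(working_capacity=2)
mem.remember("user is named Bob")
mem.remember("project uses Rust")
mem.archive_fact("project uses Rust")   # evict deliberately to make room
mem.remember("deadline is Friday")
mem.archive_fact("user is named Bob")   # evict again before paging back in
back = mem.recall("Rust")
```

The deterministic part is visible in the sketch: memory state only changes when the agent calls an operation, so the same call sequence always yields the same state.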

Exploding · Graph memory · 9k stars (+2k/30d)
Graphiti

Reached 1.0 in early 2026 with stable Neo4j integration and a polished agent-memory API. The OSS counterpart to Zep is now production-ready for teams that want full local control over graph memory without the hosted-service dependency.

Architecture: Temporal knowledge graph over Neo4j. Multi-hop reasoning over consolidated agent memory; the right pick when 'what did Bob decide three sessions ago and why' is the shape of question.
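
"What did Bob decide three sessions ago and why" is a multi-hop traversal. A toy edge-list version of the query shape — Graphiti runs this over a temporal graph in Neo4j, with validity intervals on edges rather than this simple session tag:

```python
# Toy multi-hop lookup over (subject, relation, object, session) triples.
edges = [
    ("bob", "decided", "use-postgres", 1),
    ("use-postgres", "because", "team-knows-sql", 1),
    ("bob", "decided", "add-caching", 3),
    ("add-caching", "because", "p99-latency", 3),
]

def decisions_with_reasons(who, session):
    """Hop 1: what did `who` decide in that session. Hop 2: why."""
    out = []
    for s, r, o, t in edges:
        if s == who and r == "decided" and t == session:
            reasons = [o2 for s2, r2, o2, _ in edges
                       if s2 == o and r2 == "because"]
            out.append((o, reasons))
    return out

result = decisions_with_reasons("bob", 3)
```

Flat vector search can surface the decision or the reason; joining them with temporal scoping is what the graph structure buys.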

Rising · Frontend · 38k stars (+2k/30d)
AnythingLLM

RAG-first workspace tool. Workspaces are the collection-isolation primitive; native ingestion pipeline; LanceDB embedded by default. The right pick when document-first workflows matter more than chat-first ones (Open WebUI's territory). MCP integration in 2025-2026 turned it from RAG frontend into agent front door.

Rising · Inference runtime · 14k stars (+2k/30d)
SGLang

The credible architectural alternative to vLLM. RadixAttention's tree-structured KV cache is a real advantage on shared-prefix traffic; the SGL DSL's structured-generation primitives turn 5-10x token efficiency into a defensible feature for any workload that already enforces output structure client-side.

Architecture: Tree-structured KV cache (vs vLLM's flat blocks) + structured-generation DSL. Cross-replica prefix-cache sync makes the architectural advantage compound at multi-node scale.
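
Shared-prefix reuse can be sketched as a token-level trie: a new request walks the tree and only needs fresh prefill for the suffix it hasn't seen. Illustrative only — RadixAttention does this over GPU cache blocks, not Python dicts:

```python
class RadixCache:
    """Toy radix cache over token sequences: match() returns how many
    leading tokens are already cached; insert() records a sequence."""
    def __init__(self):
        self.root = {}

    def match(self, tokens):
        node, hit = self.root, 0
        for t in tokens:
            if t not in node:
                break
            node, hit = node[t], hit + 1
        return hit

    def insert(self, tokens):
        node = self.root
        for t in tokens:
            node = node.setdefault(t, {})

system_prompt = list(range(100))               # 100 shared "tokens"
cache = RadixCache()
cache.insert(system_prompt + [7, 8, 9])        # first request: full prefill
reused = cache.match(system_prompt + [1, 2])   # second request reuses 100
```

On traffic where every request shares a long system prompt, the reused fraction dominates, which is exactly the workload where SGLang's advantage over flat block allocation shows up.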

Rising · Vector database · 24k stars (+2k/30d)
Qdrant

Production single-node vector DB. Best ops surface in the category; PQ quantization makes it the right pick for storage-constrained deployments. Standard upgrade path from LanceDB when single-workspace scale crosses ~500K vectors.

Rising · Vector database · 10k stars (+1k/30d)
LanceDB

The embedded-first vector store. Single-folder Arrow files; no server process to firewall. Default for offline / single-process deployments; scales further than Chroma before needing a server. The right vector backend for the /stacks/offline-rag-workstation recipe.

Rising · Evaluation tool · 8k stars (+700/30d)
Phoenix (Arize AI)

OSS-first LLM tracing + evaluation. OpenInference standard for traces; runs locally with one pip install. The OSS pick for teams that want LangSmith-shaped functionality without vendor lock-in. Memory-system observability is where Phoenix earns its place — without auditing, memory becomes confidently wrong.

Rising · Apple Silicon · 5k stars (+600/30d)
MLX-LM

The LLM runner for MLX, Apple's Metal-native ML framework. Now competitive with llama.cpp's Metal backend on M-series silicon, with stronger long-context performance. The 2026 unlock was Thunderbolt 5 + macOS 26.2 RDMA, which made multi-Mac clusters credible — see Exo.

Architecture: Pure Metal kernels; unified-memory-aware. The MLX quant format is separate from GGUF, which is the main compatibility gap.

Rising · Memory system · 4k stars (+380/30d)
Zep

Hosted temporal-knowledge-graph memory product. Strongest API in the category for hosted scenarios; the OSS core lives on, but the canonical experience is the cloud product. Pick Zep when cross-machine continuity and multi-hop reasoning matter more than full local control.

Rising · ROCm tooling · 6k stars (+350/30d)
ROCm

AMD's answer to CUDA. ROCm 6.2+ matured through 2025; the gap with CUDA is narrowing on the headline Llama / Mistral / Qwen architectures. An RX 7900 XTX on ROCm runs Llama 3.1 8B Q4_K_M at ~86 tok/s — within 17% of an RTX 4090. The trajectory matters: AMD viability for local AI improved more in 2025-2026 than in any prior 18-month period.

Architecture: Kernel coverage trails CUDA; some attention variants regress. Verify your model's specific architecture has a working ROCm path before committing.

Going deeper by category