The local AI agent ecosystem
Six zones covering the surfaces a developer touches when building or deploying agents that run partly or wholly on local hardware. Catalog entries are linked from each card; deeper architecture references are in /systems.
Coding agents
Tools that take a problem statement and produce code changes — branches, edits, PRs. The 2026 lineup splits into closed leaders (Claude Code, Cursor, GitHub Copilot) and open challengers (OpenHands, Aider, Cline, Goose). Local-LLM support varies sharply.
Claude Code
Anthropic's terminal-native coding agent. Tops SWE-bench Verified at 87.6% and SWE-bench Pro at 64.3% in 2026. Deep MCP integration, agentic file editing, and a $20/mo Pro tier.
Cursor
Anysphere's AI-native IDE. Forks VS Code with Cursor Tab inline completion, agentic chat, and background agents. Best 'flow' for inline completion in 2026.
OpenHands
AI-driven development agent that completes engineering tasks end-to-end — branches, code, PRs. v1.6 added a Planning Mode that drafts a plan before executing. Local-LLM-friendly.
Aider
Terminal-based AI pair programmer. Run it in your project directory, describe a change, and it edits files and creates meaningful git commits. Works with any LLM, from local Ollama models to hosted Anthropic APIs.
Cline
VS Code extension agent — ~4M installs in 2026. Plan/Act mode, autonomous file edits with diff approval, terminal access. The leading open-source IDE agent.
Continue
Open-source VS Code and JetBrains assistant. Configurable autocomplete + chat + agent modes. Strong with local Ollama backends.
Goose
Open-source extensible AI agent now governed by the Agentic AI Foundation (AAIF) at the Linux Foundation. Started inside Block (formerly Square). 25+ provider support, including Ollama.
Roo Code (sunsetting May 15, 2026)
Open-source AI dev-team extension for VS Code (1.55M installs, 23.8k GitHub stars). **Discontinued: all Roo Code products — Extension, Cloud, and Router — shut down on May 15, 2026.**
Personal AI agents
The non-coding side: assistants that connect models to messaging surfaces, productivity apps, and long-running task workflows. OpenClaw is the runaway 2026 release here.
OpenClaw
Personal AI agent with a local-first gateway architecture. Connects your local LLMs (Ollama, llama.cpp) to the messaging surfaces you already use — WhatsApp, Telegram, Slack, and Discord.
Claude Desktop
Anthropic's official desktop app for Claude. Native MCP server support means you can plug in local file access, GitHub, and custom tools. Distinct from the Claude Code CLI.
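As a sketch of what "plugging in local file access" looks like: Claude Desktop discovers MCP servers from a `claude_desktop_config.json` file. The entry below wires up the reference filesystem server; the directory path is a placeholder you would replace with your own.

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-filesystem",
        "/path/to/your/projects"
      ]
    }
  }
}
```

On restart, the app launches each configured server as a subprocess and exposes its tools in the chat UI.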
AnythingLLM
Document-oriented LLM frontend with workspaces. Connects to Ollama, LM Studio, OpenAI, Anthropic, etc. Strong document RAG.
Memory frameworks
Agents that remember across sessions need a memory layer. The 2026 split is between drop-in APIs (Mem0), OS-style explicit management (Letta), and graph-based reasoning (Mem0g, Zep / Graphiti).
Mem0 (agent memory API)
Drop-in memory layer for LLM agents. Vector + graph memory variants (Mem0g) — the graph variant builds a directed labeled knowledge graph alongside the vector store, with conflict resolution when new facts contradict stored ones.
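To make the vector-plus-graph split concrete, here is a conceptual sketch — not Mem0's actual API — of a memory layer that keeps a flat fact store alongside labeled `(subject, relation, object)` edges, with replace-on-conflict semantics for contradictory edges. All names (`HybridMemory`, `add`, `neighbors`) are invented for illustration.

```python
class HybridMemory:
    """Toy hybrid memory: flat fact list + labeled knowledge graph."""

    def __init__(self):
        self.facts = []   # the "vector store" side, here just raw text
        self.graph = []   # (subject, relation, object) triples

    def add(self, fact, triple=None):
        self.facts.append(fact)
        if triple:
            s, r, _ = triple
            # Conflict resolution: a new value for the same
            # (subject, relation) pair replaces the old edge
            # rather than accumulating contradictions.
            self.graph = [(a, b, c) for a, b, c in self.graph
                          if not (a == s and b == r)]
            self.graph.append(triple)

    def neighbors(self, subject):
        """All outgoing (relation, object) edges for a subject."""
        return [(r, o) for s, r, o in self.graph if s == subject]


m = HybridMemory()
m.add("Alice works at Acme", ("Alice", "works_at", "Acme"))
m.add("Alice moved to Beta", ("Alice", "works_at", "Beta"))
print(m.neighbors("Alice"))  # old edge replaced: [('works_at', 'Beta')]
```

A production system would embed the facts for similarity search and extract triples with an LLM; the replace-on-conflict step is the part that keeps the graph internally consistent.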
Letta (memory framework)
Agent memory framework that models memory like an operating system. Main context = RAM, archival storage = disk; the agent itself decides when to page. Originally MemGPT, now Letta.
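The RAM/disk analogy above can be sketched in a few lines. This is a conceptual illustration of OS-style paging for agent memory, not Letta's real API: a bounded "main context" that evicts its oldest facts to an archival store, with recall falling back to the archive when the context misses.

```python
from collections import deque


class PagedMemory:
    """Toy OS-style agent memory: bounded context + archival overflow."""

    def __init__(self, context_limit=3):
        self.context = deque()  # main context ("RAM"), size-limited
        self.archive = []       # archival storage ("disk"), unbounded
        self.limit = context_limit

    def remember(self, fact):
        self.context.append(fact)
        while len(self.context) > self.limit:
            # Page the oldest fact out of the context window.
            self.archive.append(self.context.popleft())

    def recall(self, keyword):
        # Check the in-context facts first, then fall back to disk.
        hits = [f for f in self.context if keyword in f]
        return hits or [f for f in self.archive if keyword in f]


mem = PagedMemory(context_limit=2)
for fact in ["user likes Rust", "user lives in Oslo", "user owns a GPU"]:
    mem.remember(fact)
print(mem.recall("Rust"))  # paged out to archive, still recallable
```

The key difference in Letta's design is *who* decides: the agent itself issues the paging operations via tool calls, rather than a fixed eviction policy like the FIFO one sketched here.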
MCP protocol layer
The open standard that ties LLM clients to external tools. 500+ public servers. Dive into the protocol details before deploying — see our MCP system guide for architecture, lifecycle, and security.
Local inference runtimes
The runtime that hosts the model weights. llama.cpp / Ollama for accessibility, vLLM for throughput, MLX on Apple, ExLlamaV2 for ExLlama-quant speed. The choice you make here constrains which agents and memory frameworks pair cleanly.
Ollama
The default first-pull tool for local AI. One-line model installs (`ollama run llama3.1`), an OpenAI-compatible HTTP API, good defaults out of the box. Built on llama.cpp.
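Because the API is OpenAI-compatible, any OpenAI-style client works against Ollama by pointing it at the local endpoint. A minimal sketch of the request shape, assuming the default port (11434) and a pulled `llama3.1` model; actually sending it requires a running Ollama daemon, so the network call is left commented out.

```python
import json

# Ollama's OpenAI-compatible chat endpoint (default port 11434).
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

payload = {
    "model": "llama3.1",  # any model pulled via `ollama pull`
    "messages": [
        {"role": "user", "content": "Say hello in one word."}
    ],
    "stream": False,
}

body = json.dumps(payload)
# With a running daemon, POST `body` to OLLAMA_URL with
# Content-Type: application/json and read back an
# OpenAI-style chat completion response.
print(body)
```

The same shape works with official OpenAI SDKs by overriding the client's base URL to `http://localhost:11434/v1`.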
llama.cpp
The bedrock of local LLM inference. Most other tools wrap or embed it. Maximum control, maximum platform support, sharpest learning curve.
vLLM
High-throughput serving engine. PagedAttention, continuous batching, prefix caching. Production default for self-hosted LLM APIs at scale.
LM Studio
Polished desktop GUI for local LLMs. Built-in HuggingFace search, OpenAI-compatible local server, side-by-side conversations.
Distributed + P2P inference
The newest zone. Hyperspace pioneered consumer-device P2P inference; vLLM remains the production multi-node standard. Watch this category — it's where the next competitive moat is likely to form.
How this map updates
This page reads its zones live from the catalog. When a new tool ships and lands in our scripts/seed/agents.ts or scripts/seed/tools.ts file, it shows up here automatically. The editorial framing — zone titles, blurbs, "what changed" — is hand-written and refreshed on the first business day of each month. If the ecosystem shifts mid-cycle (a major release, a deprecation, a new zone emerging), we update sooner.
Going deeper
- What MCP is really solving — protocol-engineering depth on the integration layer that ties this whole map together.
- Will-it-run methodology — the math behind our hardware compatibility predictions, including confidence grading.
- Benchmark dataset — measured tokens-per-second across the runtimes mapped above.