Three questions, three answers.
Local AI gets confusing fast. This page exists so you don't have to read 50 spec sheets before you can act. Pick the question that matches what you're actually asking, get a real answer.
Pick your hardware and a model, get a verdict in seconds. Honest about what fits, what's tight, and what doesn't.
Every hardware unit in the catalog, ranked by RunLocalAI Score. Measured tok/s where we have it, extrapolated honestly where we don't.
Tell us your budget, OS, target models, and what you care about. We rank GPUs by fit with reasoning — no fake tok/s numbers.
Three setups we'd actually recommend.
Not exhaustive. Opinionated. Each path is a real hardware + software bundle we publish — not a marketing wrapper. Pick one, follow the linked stack page, you're running local AI in under an hour.
The cheapest sane way to run useful local AI in 2026. A used 3060 12GB pairs with Ollama + Open WebUI for chat, code completion, and a small RAG setup. You won't run 70B, but Phi-4 14B at Q4 holds its own for daily work.
Follow the 16GB VRAM Local AI stack →
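A minimal sketch of what daily use on this path looks like, assuming Ollama is serving on its default port (11434) and you have already pulled a Phi-4 build; the model tag and prompt are placeholders, not part of the stack guide.

```python
# Ask a locally served model one question through Ollama's HTTP API.
# Assumes `ollama serve` is running and the model has been pulled,
# e.g. `ollama pull phi4` (14B, Q4 by default). Swap in your own tag.
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "phi4",      # placeholder tag
        "messages": [
            {"role": "user", "content": "Summarize this README in three bullets."}
        ],
        "stream": False,      # one complete JSON reply instead of a token stream
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```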
The sweet spot for most readers. Snappy 32B chat, a coding agent that doesn't feel laggy, and headroom for vision models. Apple Silicon is a parallel option if portability matters more than ecosystem breadth — see the Apple Silicon stack for that route (M3 Max + 36GB is ~$2k, a different price band).
Follow the 16GB VRAM Local AI stack →
Highest tok/s-per-dollar configuration we recommend. Two used 3090s pool to 48GB VRAM, run 70B at Q4 with room for a 32K KV cache, and stay within reach of a single 850W PSU. The bandwidth scaling is excellent for batched inference.
Follow the Dual-3090 workstation →
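Rough sizing behind the 48GB claim, as a back-of-the-envelope sketch: it assumes a Llama-style 70B (80 layers, grouped-query attention with 8 KV heads, head dim 128) and an 8-bit KV cache, so treat the result as ballpark rather than a guarantee.

```python
# Approximate VRAM needed for a 70B model at Q4 plus a 32K-token KV cache.
# Architecture numbers assume a Llama-3-style 70B; adjust for your model.
GiB = 1024**3

params = 70e9
bits_per_weight = 4.7          # Q4_K_M-class quants land around here once overhead is included
weights_gib = params * bits_per_weight / 8 / GiB

layers, kv_heads, head_dim = 80, 8, 128
context = 32_768
kv_bytes_per_elem = 1          # 8-bit KV cache; use 2 for fp16
kv_gib = 2 * layers * kv_heads * head_dim * context * kv_bytes_per_elem / GiB  # K and V

total_gib = weights_gib + kv_gib
print(f"weights ~{weights_gib:.1f} GiB + KV ~{kv_gib:.1f} GiB = ~{total_gib:.1f} GiB")
# -> roughly 38 + 5 GiB, leaving a few GiB of the pooled 48 for activations and runtime overhead
```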
Want the math?
Upfront + electricity vs cloud API. Every assumption visible.
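If you want the shape of that math before opening the calculator, here is a sketch; every number below is a placeholder, not the calculator's assumptions.

```python
# Local-vs-cloud break-even in its simplest form. All figures are
# illustrative placeholders; plug in your own hardware price, power draw,
# electricity rate, token volume, and API pricing.
hardware_usd = 1500              # upfront cost of the box
power_watts = 350                # average draw while inferencing
hours_per_day = 4
electricity_usd_per_kwh = 0.15

tokens_per_month = 30e6          # how much you actually generate
cloud_usd_per_mtok = 3.00        # blended API price per million tokens

local_monthly = power_watts / 1000 * hours_per_day * 30 * electricity_usd_per_kwh
cloud_monthly = tokens_per_month / 1e6 * cloud_usd_per_mtok

months_to_break_even = hardware_usd / (cloud_monthly - local_monthly)
print(f"local ~${local_monthly:.0f}/mo vs cloud ~${cloud_monthly:.0f}/mo, "
      f"break-even after ~{months_to_break_even:.1f} months")
```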
Annual landscape report — hardware, models, runtimes, agentic.
AutoGen, CrewAI, LangGraph, Hermes models — the agent ecosystem.
How we estimate tok/s and what counts as "measured." See the sketch below for the first-order version.
Head-to-head matchups for common buyer choices.
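On the tok/s estimates mentioned above: the usual first-order version is a bandwidth bound, since single-stream decoding has to read the whole quantized model from memory for every generated token. The sketch below uses illustrative numbers; the efficiency factor and figures are examples, not the methodology page's constants.

```python
# First-order ceiling on single-stream decode speed: memory bandwidth
# divided by the bytes read per token (roughly the quantized model size),
# scaled by an efficiency factor. Numbers below are illustrative only.
def estimated_tok_s(bandwidth_gb_s: float, model_gb: float, efficiency: float = 0.6) -> float:
    return bandwidth_gb_s / model_gb * efficiency

# Example: RTX 3090 (~936 GB/s) running a 14B model at Q4 (~8.5 GB of weights)
print(f"~{estimated_tok_s(936, 8.5):.0f} tok/s")   # ballpark; ignores prompt processing
```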