Three questions, three answers.
Local AI gets confusing fast. This page exists so you don't have to read 50 spec sheets before you can act. Pick the question that matches what you're actually asking, get a real answer.
Pick your hardware and a model, get a verdict in seconds. Honest about what fits, what's tight, and what doesn't.
Every hardware unit in the catalog, ranked by RunLocalAI Score. Measured tok/s where we have it, extrapolated honestly where we don't.
Tell us your budget, OS, target models, and what you care about. We rank GPUs by fit with reasoning — no fake tok/s numbers.
Three setups we'd actually recommend.
Not exhaustive. Opinionated. Each path is a real hardware + software bundle we publish — not a marketing wrapper. Pick one, follow the linked stack page, you're running local AI in under an hour.
The cheapest sane way to run useful local AI in 2026. A used 3060 12GB pairs with Ollama + Open WebUI for chat, code completion, and a small RAG setup. You won't run 70B, but Phi-4 14B at Q4 holds its own for daily work.
Follow the 16GB VRAM Local AI stack →
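A minimal sketch of what daily use on this path looks like, assuming Ollama is serving on its default port (11434) and you have already pulled a Phi-4 build; the model tag and prompt are placeholders, not part of the stack guide.

```python
# Ask a locally served model one question through Ollama's HTTP API.
# Assumes `ollama serve` is running and the model has been pulled,
# e.g. `ollama pull phi4` (14B, Q4 by default). Swap in your own tag.
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "phi4",      # placeholder tag
        "messages": [
            {"role": "user", "content": "Summarize this README in three bullets."}
        ],
        "stream": False,      # one complete JSON reply instead of a token stream
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```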
The sweet spot for most readers. Snappy 32B chat, a coding agent that doesn't feel laggy, and headroom for vision models. Apple Silicon is a parallel option if portability matters more than ecosystem breadth — see the Apple Silicon stack for that route (M3 Max + 36GB is ~$2k, a different price band).
Follow the 16GB VRAM Local AI stack →
Highest tok/s-per-dollar configuration we recommend. Two used 3090s pool to 48GB VRAM, run 70B at Q4 with room for a 32K KV cache, and stay within reach of a single 850W PSU. The bandwidth scaling is excellent for batched inference.
Follow the Dual-3090 workstation →
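Rough sizing behind the 48GB claim, as a back-of-the-envelope sketch: it assumes a Llama-style 70B (80 layers, grouped-query attention with 8 KV heads, head dim 128) and an 8-bit KV cache, so treat the result as ballpark rather than a guarantee.

```python
# Approximate VRAM needed for a 70B model at Q4 plus a 32K-token KV cache.
# Architecture numbers assume a Llama-3-style 70B; adjust for your model.
GiB = 1024**3

params = 70e9
bits_per_weight = 4.7          # Q4_K_M-class quants land around here once overhead is included
weights_gib = params * bits_per_weight / 8 / GiB

layers, kv_heads, head_dim = 80, 8, 128
context = 32_768
kv_bytes_per_elem = 1          # 8-bit KV cache; use 2 for fp16
kv_gib = 2 * layers * kv_heads * head_dim * context * kv_bytes_per_elem / GiB  # K and V

total_gib = weights_gib + kv_gib
print(f"weights ~{weights_gib:.1f} GiB + KV ~{kv_gib:.1f} GiB = ~{total_gib:.1f} GiB")
# -> roughly 38 + 5 GiB, leaving a few GiB of the pooled 48 for activations and runtime overhead
```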
Want the math?
Upfront + electricity vs cloud API. Every assumption visible.
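If you want the shape of that math before opening the calculator, here is a sketch; every number below is a placeholder, not the calculator's assumptions.

```python
# Local-vs-cloud break-even in its simplest form. All figures are
# illustrative placeholders; plug in your own hardware price, power draw,
# electricity rate, token volume, and API pricing.
hardware_usd = 1500              # upfront cost of the box
power_watts = 350                # average draw while inferencing
hours_per_day = 4
electricity_usd_per_kwh = 0.15

tokens_per_month = 30e6          # how much you actually generate
cloud_usd_per_mtok = 3.00        # blended API price per million tokens

local_monthly = power_watts / 1000 * hours_per_day * 30 * electricity_usd_per_kwh
cloud_monthly = tokens_per_month / 1e6 * cloud_usd_per_mtok

months_to_break_even = hardware_usd / (cloud_monthly - local_monthly)
print(f"local ~${local_monthly:.0f}/mo vs cloud ~${cloud_monthly:.0f}/mo, "
      f"break-even after ~{months_to_break_even:.1f} months")
```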
Annual landscape report — hardware, models, runtimes, agentic.
AutoGen, CrewAI, LangGraph, Hermes models — the agent ecosystem.
How we estimate tok/s and what counts as "measured." See the sketch below for the first-order version.
Head-to-head matchups for common buyer choices.
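On the tok/s estimates mentioned above: the usual first-order version is a bandwidth bound, since single-stream decoding has to read the whole quantized model from memory for every generated token. The sketch below uses illustrative numbers; the efficiency factor and figures are examples, not the methodology page's constants.

```python
# First-order ceiling on single-stream decode speed: memory bandwidth
# divided by the bytes read per token (roughly the quantized model size),
# scaled by an efficiency factor. Numbers below are illustrative only.
def estimated_tok_s(bandwidth_gb_s: float, model_gb: float, efficiency: float = 0.6) -> float:
    return bandwidth_gb_s / model_gb * efficiency

# Example: RTX 3090 (~936 GB/s) running a 14B model at Q4 (~8.5 GB of weights)
print(f"~{estimated_tok_s(936, 8.5):.0f} tok/s")   # ballpark; ignores prompt processing
```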