RUNLOCALAI · v38

Operator-grade instrument for local-AI hardware intelligence. Hand-written verdicts. Real benchmarks. Reproducible commands.


Custom build engine

Describe your build — any GPUs, CPU, RAM, OS, runtime, use case. We'll compute effective VRAM honestly, recommend a runtime, and tell you which models fit comfortably, which are borderline, and which aren't practical.

Total VRAM ≠ pooled VRAM. We never sum VRAM unless the silicon truly pools (Apple unified memory). We always explain why effective is lower than total.
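
To make the distinction concrete, here is a minimal Python sketch of the idea. The per-card overhead figure is an illustrative assumption, not the engine's actual accounting:

    # Sketch: effective vs. total VRAM. The 1 GB/card overhead is an
    # illustrative assumption, not this site's real model.
    def effective_vram_gb(cards_gb, pooled=False, overhead_gb=1.0):
        if pooled:                        # Apple unified memory truly pools
            return sum(cards_gb)
        if len(cards_gb) == 1:            # single card: capacity minus overhead
            return cards_gb[0] - overhead_gb
        # Discrete multi-GPU under a sharding runtime (vLLM tensor parallel,
        # llama.cpp --tensor-split): every card pays its own overhead, so
        # effective capacity lands below the naive sum.
        return sum(g - overhead_gb for g in cards_gb)

    print(effective_vram_gb([24, 24]))    # dual 3090: ~46 GB usable, not 48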

Start from a curated preset

Each preset reflects a build a real operator would actually own. Click to load — then tweak any field below. Don’t know which GPU yet? Run the GPU chooser first to narrow the field by budget, OS, and workload.

Budget 16 GB build
beginner

A used RTX 4060 Ti 16 GB or RTX 4070 Ti Super, modest CPU, 32 GB system RAM, Linux. The cheapest realistic local-AI starter that runs 13B-class models.

Used dual 3090 build
homelab

The canonical 70B-on-a-budget setup. Two used RTX 3090 cards (24 GB each), modern high-end CPU, 128 GB RAM, Ubuntu 24.04, vLLM or ExLlamaV2.

Single RTX 4090 workstation
workstation

The workstation default for serious solo-user local AI. 24 GB VRAM, 64 GB system RAM, modern high-end CPU, NVMe storage. Runs 32B-class models.

Single RTX 5090 workstation
workstation

32 GB VRAM, 1.79 TB/s memory bandwidth, native FP4 acceleration. The 2026 next-gen consumer flagship. Comfortably runs 32B-class FP16 and pu…

Apple M4 Max laptop build
apple

MacBook Pro M4 Max with 64-128 GB unified memory, MLX-LM as the engine. Battery-aware single-machine inference for 32B-class models with no …
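
For orientation, loading a 32B-class model through MLX-LM's Python API looks roughly like this; the model ID is an example pick, not a verdict:

    # Rough MLX-LM usage on Apple silicon; the model repo is an example.
    from mlx_lm import load, generate

    model, tokenizer = load("mlx-community/Qwen2.5-32B-Instruct-4bit")
    print(generate(model, tokenizer,
                   prompt="Summarize KV-cache growth in one sentence.",
                   max_tokens=64))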

Mac Studio M3 Ultra (192 GB)
apple

The only realistic single-machine path to 70B FP16 outside a datacenter. 192 GB unified memory, near-silent operation, MLX-LM as the canonical engine.

Linux AMD RX 7900 XTX
amd

Ubuntu 24.04 + ROCm 6.x + RX 7900 XTX (24 GB). The cheapest 24 GB VRAM AMD path; pairs with llama.cpp HIPBLAS for the most reliable AMD inference.

Windows beginner build
beginner

RTX 4070 Ti Super + Windows 11 + LM Studio. The smoothest possible introduction to local AI on Windows — no compilation, no driver wrestling.

Local coding agent build
production

RTX 4090 + 64 GB RAM + Ubuntu 24.04 + vLLM serving Qwen 2.5 Coder 32B AWQ-INT4 at 32K context. The reference autonomous-coding-agent setup.
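
A sketch of that configuration through vLLM's offline Python API; the served variant uses the same options via the vllm serve CLI, and the memory-utilization value here is an assumption rather than a tested setting:

    # Sketch of the coding-agent preset; gpu_memory_utilization is a guess.
    from vllm import LLM, SamplingParams

    llm = LLM(model="Qwen/Qwen2.5-Coder-32B-Instruct-AWQ",
              quantization="awq",
              max_model_len=32768,           # the preset's 32K context
              gpu_memory_utilization=0.92)   # leave headroom on 24 GB
    out = llm.generate(["def quicksort(xs):"], SamplingParams(max_tokens=128))
    print(out[0].outputs[0].text)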

Offline RAG workstation
workstation

Single-user document-search and Q&A on a 4090. Qwen 2.5 14B + nomic-embed-text + Qdrant in Docker. Fits documents of arbitrary size; long-co…
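
The vector-store half of that stack, sketched with qdrant-client; the collection name is arbitrary, and the 768-dim vector size matches nomic-embed-text's default output (verify against your embedder):

    # Sketch of the Qdrant side; zero vectors stand in for real embeddings.
    from qdrant_client import QdrantClient
    from qdrant_client.models import Distance, VectorParams, PointStruct

    client = QdrantClient(url="http://localhost:6333")   # Qdrant in Docker
    client.create_collection(
        collection_name="docs",
        vectors_config=VectorParams(size=768, distance=Distance.COSINE))
    client.upsert("docs", points=[
        PointStruct(id=1, vector=[0.0] * 768, payload={"text": "chunk one"})])
    hits = client.search("docs", query_vector=[0.0] * 768, limit=3)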

Mixed 4090 + 3090 experiment
experiment

An asymmetric build for VRAM-rich experimentation. llama.cpp layer-split with --tensor-split distributes by VRAM ratio. Not a clean production setup.
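
The same split expressed through the llama-cpp-python bindings rather than the llama-server CLI; the model path is a placeholder, and since both cards carry 24 GB the ratio is even:

    # Sketch: proportional layer split across a 4090 + 3090 (24 GB each).
    from llama_cpp import Llama

    llm = Llama(model_path="model.Q4_K_M.gguf",   # placeholder path
                n_gpu_layers=-1,                  # offload every layer
                tensor_split=[1.0, 1.0])          # per-device proportions
    print(llm("Hello", max_tokens=16)["choices"][0]["text"])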

Homelab quad 3090 serving
homelab

Four used RTX 3090s on a server motherboard, Ubuntu 24.04 + vLLM. 96 GB aggregate VRAM with tensor-parallel for 70B AWQ + concurrent users.
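
The tensor-parallel side in vLLM's Python API, sketched with a 70B-class AWQ checkpoint as an example model, not a verdict:

    # Sketch: shard one model across four 24 GB cards.
    from vllm import LLM

    llm = LLM(model="Qwen/Qwen2.5-72B-Instruct-AWQ",  # example 70B-class pick
              quantization="awq",
              tensor_parallel_size=4)                 # one shard per 3090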

Describe your build

Add GPUs, set CPU/RAM/OS, and optionally pick a runtime + use case. The URL updates as you change fields, so you can share a build by copying it.

No GPU slots yet. Pick a GPU below, or add multiple slots for a mixed-GPU build; leave empty for CPU-only inference.