Describe your build — any GPUs, CPU, RAM, OS, runtime, use case. We'll compute effective VRAM honestly, recommend a runtime, and tell you which models fit comfortably, which are borderline, and which aren't practical.
Total VRAM ≠ pooled VRAM. We never sum VRAM unless the silicon truly pools it (Apple unified memory). We always explain why effective VRAM is lower than the total.
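For the curious, here is a minimal sketch of that rule in Python. The flat per-card overhead and the unified-memory OS reserve are illustrative assumptions, not the calculator's exact formula.

```python
# Illustrative sketch only: the overhead figures are assumptions,
# not the calculator's actual model.
def effective_vram_gb(cards_gb, unified_memory=False,
                      per_card_overhead_gb=1.0, os_reserve_gb=8.0):
    """Usable VRAM for one model, rather than the naive sum of the cards."""
    if not cards_gb:
        return 0.0
    if unified_memory:
        # Apple unified memory genuinely pools, but the OS keeps a share.
        return max(sum(cards_gb) - os_reserve_gb, 0.0)
    # Discrete multi-GPU: tensor/layer splits are bounded by the smallest
    # card, and every card loses some VRAM to driver context and KV cache.
    usable = [g - per_card_overhead_gb for g in cards_gb]
    return max(min(usable), 0.0) * len(usable)

print(effective_vram_gb([24, 24]))                    # 2x RTX 3090 -> 46.0, not "48 GB"
print(effective_vram_gb([192], unified_memory=True))  # Mac Studio -> 184.0
```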
Each preset reflects a build a real operator would actually own. Click to load — then tweak any field below. Don’t know which GPU yet? Run the GPU chooser first to narrow the field by budget, OS, and workload.
A used RTX 4060 Ti 16 GB or RTX 4070 Ti Super, modest CPU, 32 GB system RAM, Linux. The cheapest realistic local-AI starter that runs 13B-class models.
The canonical 70B-on-a-budget setup. Two used RTX 3090 cards (24 GB each), modern high-end CPU, 128 GB RAM, Ubuntu 24.04, vLLM or ExLlamaV2.
The workstation default for serious solo-user local AI. 24 GB VRAM, 64 GB system RAM, modern high-end CPU, NVMe storage. Runs 32B-class models.
32 GB VRAM, 1.79 TB/s memory bandwidth, native FP4 acceleration. The 2026 next-gen consumer flagship. Comfortably runs 32B-class models at FP16.
MacBook Pro M4 Max with 64-128 GB unified memory, MLX-LM as the engine. Battery-aware single-machine inference for 32B-class models.
The only realistic single-machine path to 70B FP16 outside a datacenter. 192 GB unified memory, near-silent operation, MLX-LM as the canonical engine.
Ubuntu 24.04 + ROCm 6.x + RX 7900 XTX (24 GB). The cheapest 24 GB VRAM AMD path; pairs with llama.cpp HIPBLAS for the most reliable AMD inference.
RTX 4070 Ti Super + Windows 11 + LM Studio. The smoothest possible introduction to local AI on Windows — no compilation, no driver wrestling.
RTX 4090 + 64 GB RAM + Ubuntu 24.04 + vLLM serving Qwen 2.5 Coder 32B AWQ-INT4 at 32K context. The reference autonomous-coding-agent setup.
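A minimal sketch of that serving setup using vLLM's offline Python API; the Hugging Face repo ID and the memory-utilization figure are assumptions, since the preset only pins the model family, quantization, and context length.

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-Coder-32B-Instruct-AWQ",  # assumed repo ID
    quantization="awq",
    max_model_len=32768,            # the preset's 32K context
    gpu_memory_utilization=0.92,    # assumption: leave headroom on the 24 GB 4090
)

outputs = llm.generate(
    ["Write a Python function that reverses a singly linked list."],
    SamplingParams(temperature=0.2, max_tokens=512),
)
print(outputs[0].outputs[0].text)
```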
Single-user document-search and Q&A on a 4090. Qwen 2.5 14B + nomic-embed-text + Qdrant in Docker. Fits documents of arbitrary size.
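A minimal sketch of the retrieval path, assuming Qdrant is reachable on its default Docker port and that nomic-embed-text (768-dim) is served through Ollama; the serving choice, collection name, and sample text are placeholders.

```python
import ollama
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

def embed(text: str) -> list[float]:
    # Assumes `ollama pull nomic-embed-text` has already been run.
    return ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]

client = QdrantClient(url="http://localhost:6333")  # Qdrant in Docker, default port
client.recreate_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE),
)
client.upsert("docs", points=[
    PointStruct(id=1,
                vector=embed("Effective VRAM is lower than total VRAM."),
                payload={"text": "Effective VRAM is lower than total VRAM."}),
])
hits = client.search("docs", query_vector=embed("why is effective VRAM lower?"), limit=3)
print(hits[0].payload["text"])
```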
An asymmetric build for VRAM-rich experimentation. llama.cpp layer-split with --tensor-split distributes layers in proportion to each card's VRAM. Not a clean production setup.
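A minimal sketch of the same split via llama-cpp-python, whose tensor_split argument mirrors the --tensor-split flag; the model path and the 24/12 ratio are placeholders, not part of the preset.

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-3.1-70b-instruct.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,           # offload all layers to the GPUs
    tensor_split=[24, 12],     # split in proportion to each card's VRAM
    n_ctx=8192,
)
print(llm("Explain why the smallest card limits an asymmetric split:",
          max_tokens=128)["choices"][0]["text"])
```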
Four used RTX 3090s on a server motherboard, Ubuntu 24.04 + vLLM. 96 GB aggregate VRAM with tensor parallelism for 70B AWQ plus concurrent users.
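A minimal sketch of the tensor-parallel serving side, again through vLLM's Python API; the 70B AWQ repo ID and context length are assumptions.

```python
from vllm import LLM

llm = LLM(
    model="hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",  # assumed repo ID
    quantization="awq",
    tensor_parallel_size=4,    # shard the weights across the four RTX 3090s
    max_model_len=16384,       # assumption; tune to fit the KV cache
)
print(llm.generate(["Hello"])[0].outputs[0].text)
```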
Add GPUs, set CPU/RAM/OS, optionally pick a runtime + use case. URL updates as you change fields — share a build by copying the URL.
No GPU slots — pick one below or add multiple slots for mixed-GPU builds. Leave empty for CPU-only inference.