RUNLOCALAI

Operator-grade instrument for local-AI hardware intelligence. Hand-written verdicts. Real benchmarks. Reproducible commands.

OP · Fredoline Eruo


Stack Builder

Eight inputs — use case, budget, scale, privacy posture — and we compose the full rig: GPU + runtime + 1-3 model picks + first-run workflow + cost rollup + ready-to-paste install script. Three tiers side-by-side so the upgrade path stays visible.

Every recommendation references rule-based scoring; measured tok/s carries a confidence chip when surfaced. We don't invent numbers — when the data isn't there we say so.

Tell us about your build

URL updates as you change fields — share or bookmark a result. Showing the balanced default — change any field to refine.

Side-by-side: budget vs balanced vs stretch

one step down · your inputs · one step up

Budget — one step down on budget. What you give up; what you keep.
  • GPU: NVIDIA GeForce RTX 5090 Mobile · 24 GB
  • Runtime: Ollama
  • Top model: Qwen 3 14B · Q4_K_M
  • 3-yr TCO: —

Balanced — ~$899. Your inputs, our recommendation. Read the full card below.
  • GPU: NVIDIA GeForce RTX 3090 · 24 GB
  • Runtime: Ollama
  • Top model: Qwen 3 14B · Q4_K_M
  • 3-yr TCO: $1,046

Stretch — ~$2,500. One step up on budget. What you'd gain; what it costs.
  • GPU: NVIDIA L4 · 24 GB
  • Runtime: Ollama
  • Top model: Qwen 3 14B · Q4_K_M
  • 3-yr TCO: $2,530

Your recommended stack

full breakdown — read top to bottom
Balanced — recommended
NVIDIA · 24 GB VRAM · ~$899

NVIDIA GeForce RTX 3090 + Ollama + Qwen 3 14B

§ Hardware
Fits Qwen 2.5 Coder 32B Q4 with 32K context.
Expected throughput: 30-60 tok/s on 32B Q4 single-stream; 80-130 tok/s on 13B Q4.
Estimated (rule-based scoring) · Full hardware page →
§ Runtime
Ollama

Default pick for most operators: one binary, automatic GPU detection, OpenAI-compatible HTTP API at `:11434`. Sufficient for solo + small-team workloads.

  • Install: curl -fsSL https://ollama.com/install.sh | sh
  • Pull a model: ollama pull <model>:<tag>
  • HTTP API: http://localhost:11434
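The API can be smoke-tested without any client library. A minimal sketch, assuming the model tag `qwen3-14b` has already been pulled and the server is on its default port; the payload targets Ollama's OpenAI-compatible chat endpoint:

```shell
# Build a chat request for Ollama's OpenAI-compatible endpoint.
# "qwen3-14b" is the tag used elsewhere on this page - pull it first.
payload='{"model":"qwen3-14b","messages":[{"role":"user","content":"ping"}]}'
echo "$payload"

# With the server running, send it like so:
# curl -s http://localhost:11434/v1/chat/completions \
#   -H "Content-Type: application/json" -d "$payload"
```

The same server also answers Ollama's native endpoints (e.g. `/api/tags`, used in the install script below), so agents can speak either protocol.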
§ Model picks (2)
  • Qwen 3 14B
    14B params
    Q4_K_M
    ~8.6 GB

    Strongest general-purpose model at 14B in 2026. Multilingual tokenizer (1.7× more efficient on Turkish/Asian languages than Llama). Reasoning mode available.

    Community-reported · 30-45 tok/s on 16 GB VRAM
  • Phi-4 14B
    14B params
    Q4_K_M
    ~8.5 GB

    Microsoft's reasoning-focused 14B trained on heavy synthetic data. Beats Llama 3.1 8B on math/code benchmarks. Weaker creative writing.

    Editorial · 30-45 tok/s on 16 GB VRAM
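The ~8.5-8.6 GB file sizes above follow from simple bits-per-weight arithmetic. A rough sketch, assuming Q4_K_M averages about 4.85 bits per weight (an approximation; exact size varies by tensor layout):

```shell
# Estimate quantized file size: params × bits-per-weight ÷ 8.
# 4.85 bpw for Q4_K_M is an approximation, not a measured value.
size=$(awk 'BEGIN { printf "%.1f", 14e9 * 4.85 / 8 / 1e9 }')
echo "Qwen 3 14B @ Q4_K_M ~ ${size} GB"
```

Leave a few GB of headroom on top of the file size for KV cache and runtime overhead before calling a model a fit.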
§ First-run workflow
✓ Curated stack match
Your inputs match the editorial stack Build a local coding-agent stack (May 2026). That page has the field-tested version of this recipe with concrete commands and a why-not-the-alternative for each pick.
  1. Install Ollama on Linux.
  2. Pull the primary model: ollama pull qwen3-14b
  3. Verify it runs: ollama run qwen3-14b — type a test prompt.
  4. Connect a coding agent: install Cline (VS Code extension) or Aider (CLI) and point it at http://localhost:11434 (Ollama) or :8000 (vLLM).
  5. Verify the full loop end-to-end before adding observability, monitoring, or any second model.
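For step 4, one way the Aider hookup often looks as a config fragment; `OLLAMA_API_BASE` and the `ollama_chat/` model prefix are how Aider's documentation describes Ollama support at the time of writing, so verify against the docs for your version:

```shell
# Point Aider at the local Ollama server (config fragment - not run here).
# Assumption: your Aider version reads OLLAMA_API_BASE; check its docs.
export OLLAMA_API_BASE=http://localhost:11434
# aider --model ollama_chat/qwen3-14b
```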
§ Total cost of ownership (3-year)
  • Upfront: $899 (hardware)
  • Monthly electricity: $4 (at 0.84 kWh/day)
  • 3-year total: $1,046 (upfront + electricity)
  • Cloud equivalent: $0 (same token volume)
  • Break-even: — (local beats cloud)
  • Tok/s assumed: 0.0 (unknown)
Assumptions: 4 hr/day active, 60% utilization, 3-year amortization, $0.30/M token cloud equivalent. Tune in cost calculator →
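The card's totals reproduce with one line of arithmetic. A sketch, assuming an electricity rate of about $0.16/kWh; the rate itself isn't shown on the card, and $0.16 is simply the value that recovers the $4/month figure:

```shell
# 3-year TCO = upfront + 36 × monthly electricity.
# 0.84 kWh/day and $899 come from the card; $0.16/kWh is assumed.
summary=$(awk 'BEGIN {
  monthly = 0.84 * 30.4 * 0.16        # kWh/day × days/month × $/kWh
  total   = 899 + monthly * 36        # upfront + 36 months of power
  printf "monthly ~$%.0f, 3-year ~$%.0f", monthly, total
}')
echo "$summary"
```

Swap in your own electricity rate to see how sensitive the 3-year total is; at these wattages it moves the result by only tens of dollars.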

Install script

copy and paste — this gets you to first token
#!/usr/bin/env bash
# RunLocalAI stack installer — generated by /stack-builder
# Use case: coding
# Hardware: NVIDIA GeForce RTX 3090
# Runtime: Ollama

set -e

# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# 2. Pull the primary model
ollama pull qwen3-14b

# 3. Sanity check
ollama run qwen3-14b "Hello — respond in one short sentence to confirm you're running."

# 4. Verify the HTTP API
curl http://localhost:11434/api/tags
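Before running the installer, a quick pre-flight confirms the driver actually sees the card. A sketch, assuming an NVIDIA host where `nvidia-smi` ships with the driver; on machines without it, the script reports that and moves on:

```shell
#!/usr/bin/env bash
# Pre-flight: is the GPU visible to the driver?
if command -v nvidia-smi >/dev/null 2>&1; then
  # One CSV line per GPU: name, total VRAM in MiB
  nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
else
  echo "nvidia-smi not found - install the NVIDIA driver first"
fi
```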

Where to go from here

GPU chooser →

Just the hardware-pick question, with side-by-side compare, price/perf scatter, and score breakdown per dimension.

Custom build engine →

Reverse direction: I have this hardware — what fits? Use this to validate the recommendation against your actual rig.

Quant Advisor →

Drill into the model picks: Q4_K_M vs Q5_K_M vs Q8 on your specific VRAM, with quality curve + VRAM fit visualization.

TCO calculator →

Tune every assumption: utilization, electricity rate, cloud equivalent rate, amortization horizon.

Curated stacks →

18 hand-curated stack recipes for specific outcomes (coding agent, offline RAG, dual-3090, Mac cluster, iPhone, etc.)