RUNLOCALAI · v38

Operator-grade instrument for local-AI hardware intelligence. Hand-written verdicts. Real benchmarks. Reproducible commands.

OP · Fredoline Eruo
DISCLOSURE

Some links on this site are affiliate links (Amazon Associates and other first-class retailers). When you buy through them, we earn a small commission at no extra cost to you. Affiliate links do not influence our verdicts — there are cards we rate highly that we don't have affiliate relationships with, and cards that sell well that we refuse to recommend. Read more →


Choose my GPU

Answer nine questions. We rank the GPUs in our catalog by fit for local AI on your stack — top picks, alternates, and what to avoid. Hand-written rationale per card, honest caveats, and a one-click handoff into the custom build engine.

We don’t fake tok/s numbers. Every recommendation cites a model class and a workload-realistic throughput range. Cards over your budget appear last with explicit framing. Rankings come from rule-based scoring, not measured benchmarks.

Tell us about your build

URL updates as you change fields. Showing the balanced default — change any field to refine.

Price vs performance (budget-neutral)
11 cards · 1 skipped (no price)
[Scatter chart: effective price (log scale) vs budget-neutral performance, with your budget marked]
  • NVIDIA H100 PCIe — $25,000 · Avoid
  • NVIDIA RTX 6000 Ada Generation — $6,499 · Avoid
  • NVIDIA L40S — $8,500 · Avoid
  • NVIDIA GeForce RTX 3090 — $899 · Top pick
  • NVIDIA GeForce RTX 3090 Ti — $1,199 · Top pick
  • NVIDIA GeForce RTX 5070 Ti — $849 · Top pick
  • NVIDIA GeForce RTX 4080 — $1,099 · Top pick
  • NVIDIA GeForce RTX 4070 Ti Super — $829 · Top pick
  • NVIDIA GeForce RTX 4080 Super — $1,099 · Top pick
  • NVIDIA GeForce RTX 5060 Ti 16GB — $459 · Top pick
  • NVIDIA GeForce RTX 5080 — $1,199 · Top pick
Top pick (9)
Avoid (3)
Top picks
9 cards matching your stack tightly
Top pick
NVIDIA · 24 GB · ~$899 · Estimated (used-market price)
Operator-grade

NVIDIA GeForce RTX 3090

Top pick for your setup. With your $1,500 budget on Linux for coding agents, the NVIDIA GeForce RTX 3090 ranks here because 24 GB hits the workable band for coding agents — fits at sensible quants without becoming the bottleneck.
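As a sanity check on the 24 GB claim, here is a rough VRAM-sizing sketch; the ~4.85 bits/weight figure for Q4_K_M and the flat 3 GB allowance for KV cache and runtime buffers are our assumptions, not site data:

```python
def quant_vram_gb(params_b: float, bits_per_weight: float, overhead_gb: float = 3.0) -> float:
    """Rough VRAM footprint: weights at the given quant, plus a flat
    allowance for KV cache and runtime buffers (assumed, not measured)."""
    return params_b * bits_per_weight / 8 + overhead_gb

# 32B coder model at Q4_K_M (~4.85 bits/weight effective in llama.cpp)
print(round(quant_vram_gb(32, 4.85), 1))   # ~22.4 GB -> fits in 24 GB, tightly
print(round(quant_vram_gb(32, 16.0), 1))   # ~67.0 GB -> FP16 does not fit
```

At 32K context the KV cache can eat more than the flat allowance, which is why the card lands in the "workable" band rather than a roomy one.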

Sustained 450W+ — minimum 1000W Gold PSU + good airflow. Your power tolerance is moderate (350W ceiling), which this card will exceed under load.
Realistic model class
Qwen 2.5 Coder 32B Q4 + 32K context
Expected throughput
30-60 tok/s on 32B Q4 single-stream; 80-130 tok/s on 13B Q4.
Evidence
live data · editorial + reproduced community
Editorial: 1 benchmark · Reproduced community: 0 · Stale (>18mo): 0 rows
Cohort confidence: Low (1 cohort)
Needs measurement
This recommendation is rule-based, not evidence-backed yet.
  • Only 1 benchmark — below the 5-row threshold for cohort signal.
Help us measure NVIDIA GeForce RTX 3090 →
Measured throughput
top 1 of 1 on file · most recent first
  • ed · llama 3.1 8b instruct · Q4_K_M · 105.0 tok/s · 2026-05
Featured in stacks
  • Dual RTX 3090 workstation stack — 70B-class on $1,800 of used GPUs — Workstation · GPUs (2× 24GB used, the cheapest path to 48 GB total)
  • Quad RTX 3090 workstation stack — the prosumer 100B-class ceiling — Homelab · GPUs (4× 24GB used; the prosumer-ceiling stack)
Show 1 benchmark feeding this card▸
  • ed · #340 · llama-3.1-8b-instruct · Q4_K_M · 105.0 tok/s · 2026-05-13
How we scored this card▸

Each dimension is a 0-100 score. The card's position in the ranking is the weighted sum — but we surface tiers, not raw numbers. Bars are sorted by weight (most-influential first).

  • VRAM × workload · weight 22% · 70 (Good)
  • Budget fit · weight 18% · 95 (Excellent)
  • OS compatibility · weight 16% · 100 (Excellent)
  • Skill match · weight 10% · 95 (Excellent)
  • Power headroom · weight 8% · 80 (Strong)
  • Multi-GPU path · weight 8% · 80 (Strong)
  • Thermal / noise · weight 6% · 95 (Excellent)
  • Gaming alignment · weight 6% · 95 (Excellent)
  • Perf-per-watt · weight 6% · 85 (Strong)

Tier mapping: top ≥ 75 composite · alternate 60-74 · acceptable 40-59 · avoid < 40 or over-budget / incompatible.
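A minimal sketch of the weighted composite and tier mapping described above (dimension keys are our shorthand; the site's exact formula may differ):

```python
# Weights from the scoring bars above; they sum to 1.00.
WEIGHTS = {
    "vram_workload": 0.22, "budget_fit": 0.18, "os_compat": 0.16,
    "skill_match": 0.10, "power_headroom": 0.08, "multi_gpu_path": 0.08,
    "thermal_noise": 0.06, "gaming": 0.06, "perf_per_watt": 0.06,
}

def composite(scores: dict) -> float:
    """Weighted sum of 0-100 dimension scores."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

def tier(score: float, over_budget: bool = False, incompatible: bool = False) -> str:
    """Map a composite score to the published tiers."""
    if over_budget or incompatible or score < 40:
        return "avoid"
    if score >= 75:
        return "top"
    if score >= 60:
        return "alternate"
    return "acceptable"

rtx_3090 = {  # the bar values shown for this card
    "vram_workload": 70, "budget_fit": 95, "os_compat": 100,
    "skill_match": 95, "power_headroom": 80, "multi_gpu_path": 80,
    "thermal_noise": 95, "gaming": 95, "perf_per_watt": 85,
}
print(round(composite(rtx_3090), 1), tier(composite(rtx_3090)))  # 87.3 top
```

A composite of ~87 clears the ≥75 threshold, consistent with the card's "top pick" placement.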

Caveats
  • Used-market only — fan/thermal-pad inspection required; new MSRP from launch is no longer the relevant price.
Try in custom builder → · See model-fit table · Recommended runtime: ollama
Estimated (rule-based scoring) · Help us measure this — submit a benchmark for NVIDIA GeForce RTX 3090
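The throughput bands quoted on these cards can be sanity-checked with the standard first-order estimate for single-stream decode: generation is memory-bound, so tokens per second is roughly memory bandwidth divided by weight size. A sketch; the ~936 GB/s RTX 3090 bandwidth figure and the Q4 weight sizes are our assumptions:

```python
def decode_tok_s_estimate(bandwidth_gb_s: float, weights_gb: float) -> float:
    """First-order single-stream decode rate: each generated token
    streams the full weight set from VRAM once."""
    return bandwidth_gb_s / weights_gb

# RTX 3090: ~936 GB/s GDDR6X bandwidth (spec-sheet figure, our assumption)
print(round(decode_tok_s_estimate(936, 19.4)))  # ~48 tok/s for a 32B Q4 model
print(round(decode_tok_s_estimate(936, 7.9)))   # ~118 tok/s for a 13B Q4 model
```

Both estimates land inside the quoted 30-60 and 80-130 tok/s bands, which is the kind of cross-check the "workload-realistic range" claim invites.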
Top pick
NVIDIA · 24 GB · ~$1,199 · Estimated (used-market price)
Operator-grade

NVIDIA GeForce RTX 3090 Ti

Top pick for your setup. With your $1,500 budget on Linux for coding agents, the NVIDIA GeForce RTX 3090 Ti ranks here because 24 GB hits the workable band for coding agents — fits at sensible quants without becoming the bottleneck.

Sustained 450W+ — minimum 1000W Gold PSU + good airflow. Your power tolerance is moderate (350W ceiling), which this card will exceed under load.
Realistic model class
Qwen 2.5 Coder 32B Q4 + 32K context
Expected throughput
30-60 tok/s on 32B Q4 single-stream; 80-130 tok/s on 13B Q4.
Evidence
live data · editorial + reproduced community
Editorial: 0 benchmarks · Reproduced community: 0 · Stale (>18mo): 0 rows
Cohort confidence: — (none)
Needs measurement
This recommendation is rule-based, not evidence-backed yet.
  • No benchmarks on file for this hardware.
Help us measure NVIDIA GeForce RTX 3090 Ti →
How we scored this card▸

Each dimension is a 0-100 score. The card's position in the ranking is the weighted sum — but we surface tiers, not raw numbers. Bars are sorted by weight (most-influential first).

  • VRAM × workload · weight 22% · 70 (Good)
  • Budget fit · weight 18% · 95 (Excellent)
  • OS compatibility · weight 16% · 100 (Excellent)
  • Skill match · weight 10% · 95 (Excellent)
  • Power headroom · weight 8% · 25 (Weak)
  • Multi-GPU path · weight 8% · 80 (Strong)
  • Thermal / noise · weight 6% · 95 (Excellent)
  • Gaming alignment · weight 6% · 95 (Excellent)
  • Perf-per-watt · weight 6% · 65 (Good)

Tier mapping: top ≥ 75 composite · alternate 60-74 · acceptable 40-59 · avoid < 40 or over-budget / incompatible.

Caveats
  • Sustained ~450W — plan for a 1000W+ PSU and adequate case airflow.
  • Used-market only — fan/thermal-pad inspection required; new MSRP from launch is no longer the relevant price.
Try in custom builder → · See model-fit table · Recommended runtime: ollama
Estimated (rule-based scoring) · Help us measure this — submit a benchmark for NVIDIA GeForce RTX 3090 Ti
Top pick
NVIDIA · 24 GB
Operator-grade

NVIDIA GeForce RTX 5090 Mobile

Top pick for your setup. With your $1,500 budget on Linux for coding agents, the NVIDIA GeForce RTX 5090 Mobile ranks here because 24 GB hits the workable band for coding agents — fits at sensible quants without becoming the bottleneck.

Realistic model class
Qwen 2.5 Coder 32B Q4 + 32K context
Expected throughput
30-60 tok/s on 32B Q4 single-stream; 80-130 tok/s on 13B Q4.
Evidence
live data · editorial + reproduced community
Editorial: 0 benchmarks · Reproduced community: 0 · Stale (>18mo): 0 rows
Cohort confidence: — (none)
Needs measurement
This recommendation is rule-based, not evidence-backed yet.
  • No benchmarks on file for this hardware.
Help us measure NVIDIA GeForce RTX 5090 Mobile →
How we scored this card▸

Each dimension is a 0-100 score. The card's position in the ranking is the weighted sum — but we surface tiers, not raw numbers. Bars are sorted by weight (most-influential first).

  • VRAM × workload · weight 22% · 70 (Good)
  • Budget fit · weight 18% · 50 (Acceptable)
  • OS compatibility · weight 16% · 100 (Excellent)
  • Skill match · weight 10% · 95 (Excellent)
  • Power headroom · weight 8% · 95 (Excellent)
  • Multi-GPU path · weight 8% · 80 (Strong)
  • Thermal / noise · weight 6% · 95 (Excellent)
  • Gaming alignment · weight 6% · 95 (Excellent)
  • Perf-per-watt · weight 6% · 95 (Excellent)

Tier mapping: top ≥ 75 composite · alternate 60-74 · acceptable 40-59 · avoid < 40 or over-budget / incompatible.

Try in custom builder → · See model-fit table · Recommended runtime: ollama
Estimated (rule-based scoring) · Help us measure this — submit a benchmark for NVIDIA GeForce RTX 5090 Mobile
Top pick
NVIDIA · 16 GB · ~$849
Operator-grade

NVIDIA GeForce RTX 5070 Ti

Top pick for your setup. With your $1,500 budget on Linux for coding agents, the NVIDIA GeForce RTX 5070 Ti sits in this tier on a balance of capability, OS compat, power, and budget fit.

Realistic model class
Qwen 2.5 Coder 14B FP16, agents OK
Expected throughput
40-70 tok/s on 7B Q4; 20-35 tok/s on 13B Q4.
Evidence
live data · editorial + reproduced community
Editorial: 0 benchmarks · Reproduced community: 0 · Stale (>18mo): 0 rows
Cohort confidence: — (none)
Needs measurement
This recommendation is rule-based, not evidence-backed yet.
  • No benchmarks on file for this hardware.
Help us measure NVIDIA GeForce RTX 5070 Ti →
How we scored this card▸

Each dimension is a 0-100 score. The card's position in the ranking is the weighted sum — but we surface tiers, not raw numbers. Bars are sorted by weight (most-influential first).

  • VRAM × workload · weight 22% · 33 (Weak)
  • Budget fit · weight 18% · 95 (Excellent)
  • OS compatibility · weight 16% · 100 (Excellent)
  • Skill match · weight 10% · 95 (Excellent)
  • Power headroom · weight 8% · 80 (Strong)
  • Multi-GPU path · weight 8% · 80 (Strong)
  • Thermal / noise · weight 6% · 95 (Excellent)
  • Gaming alignment · weight 6% · 90 (Excellent)
  • Perf-per-watt · weight 6% · 85 (Strong)

Tier mapping: top ≥ 75 composite · alternate 60-74 · acceptable 40-59 · avoid < 40 or over-budget / incompatible.

Caveats
  • 16 GB is below the comfortable VRAM minimum for coding agents — expect quant downgrades or very tight context windows.
Try in custom builder → · See model-fit table · Recommended runtime: ollama
Estimated (rule-based scoring) · Help us measure this — submit a benchmark for NVIDIA GeForce RTX 5070 Ti
Top pick
NVIDIA · 16 GB · ~$1,099
Operator-grade

NVIDIA GeForce RTX 4080

Top pick for your setup. With your $1,500 budget on Linux for coding agents, the NVIDIA GeForce RTX 4080 sits in this tier on a balance of capability, OS compat, power, and budget fit.

Realistic model class
Qwen 2.5 Coder 14B FP16, agents OK
Expected throughput
40-70 tok/s on 7B Q4; 20-35 tok/s on 13B Q4.
Evidence
live data · editorial + reproduced community
Editorial: 0 benchmarks · Reproduced community: 0 · Stale (>18mo): 0 rows
Cohort confidence: — (none)
Needs measurement
This recommendation is rule-based, not evidence-backed yet.
  • No benchmarks on file for this hardware.
Help us measure NVIDIA GeForce RTX 4080 →
How we scored this card▸

Each dimension is a 0-100 score. The card's position in the ranking is the weighted sum — but we surface tiers, not raw numbers. Bars are sorted by weight (most-influential first).

  • VRAM × workload · weight 22% · 33 (Weak)
  • Budget fit · weight 18% · 95 (Excellent)
  • OS compatibility · weight 16% · 100 (Excellent)
  • Skill match · weight 10% · 95 (Excellent)
  • Power headroom · weight 8% · 80 (Strong)
  • Multi-GPU path · weight 8% · 80 (Strong)
  • Thermal / noise · weight 6% · 95 (Excellent)
  • Gaming alignment · weight 6% · 90 (Excellent)
  • Perf-per-watt · weight 6% · 85 (Strong)

Tier mapping: top ≥ 75 composite · alternate 60-74 · acceptable 40-59 · avoid < 40 or over-budget / incompatible.

Caveats
  • 16 GB is below the comfortable VRAM minimum for coding agents — expect quant downgrades or very tight context windows.
Try in custom builder → · See model-fit table · Recommended runtime: ollama
Estimated (rule-based scoring) · Help us measure this — submit a benchmark for NVIDIA GeForce RTX 4080
Top pick
NVIDIA · 16 GB · ~$829
Operator-grade

NVIDIA GeForce RTX 4070 Ti Super

Top pick for your setup. With your $1,500 budget on Linux for coding agents, the NVIDIA GeForce RTX 4070 Ti Super sits in this tier on a balance of capability, OS compat, power, and budget fit.

Realistic model class
Qwen 2.5 Coder 14B FP16, agents OK
Expected throughput
40-70 tok/s on 7B Q4; 20-35 tok/s on 13B Q4.
Evidence
live data · editorial + reproduced community
Editorial: 0 benchmarks · Reproduced community: 0 · Stale (>18mo): 0 rows
Cohort confidence: — (none)
Needs measurement
This recommendation is rule-based, not evidence-backed yet.
  • No benchmarks on file for this hardware.
Help us measure NVIDIA GeForce RTX 4070 Ti Super →
How we scored this card▸

Each dimension is a 0-100 score. The card's position in the ranking is the weighted sum — but we surface tiers, not raw numbers. Bars are sorted by weight (most-influential first).

  • VRAM × workload · weight 22% · 33 (Weak)
  • Budget fit · weight 18% · 95 (Excellent)
  • OS compatibility · weight 16% · 100 (Excellent)
  • Skill match · weight 10% · 95 (Excellent)
  • Power headroom · weight 8% · 80 (Strong)
  • Multi-GPU path · weight 8% · 80 (Strong)
  • Thermal / noise · weight 6% · 95 (Excellent)
  • Gaming alignment · weight 6% · 90 (Excellent)
  • Perf-per-watt · weight 6% · 85 (Strong)

Tier mapping: top ≥ 75 composite · alternate 60-74 · acceptable 40-59 · avoid < 40 or over-budget / incompatible.

Caveats
  • 16 GB is below the comfortable VRAM minimum for coding agents — expect quant downgrades or very tight context windows.
Try in custom builder → · See model-fit table · Recommended runtime: ollama
Estimated (rule-based scoring) · Help us measure this — submit a benchmark for NVIDIA GeForce RTX 4070 Ti Super
Top pick
NVIDIA · 16 GB · ~$1,099
Operator-grade

NVIDIA GeForce RTX 4080 Super

Top pick for your setup. With your $1,500 budget on Linux for coding agents, the NVIDIA GeForce RTX 4080 Super sits in this tier on a balance of capability, OS compat, power, and budget fit.

Realistic model class
Qwen 2.5 Coder 14B FP16, agents OK
Expected throughput
40-70 tok/s on 7B Q4; 20-35 tok/s on 13B Q4.
Evidence
live data · editorial + reproduced community
Editorial: 0 benchmarks · Reproduced community: 0 · Stale (>18mo): 0 rows
Cohort confidence: — (none)
Needs measurement
This recommendation is rule-based, not evidence-backed yet.
  • No benchmarks on file for this hardware.
Help us measure NVIDIA GeForce RTX 4080 Super →
How we scored this card▸

Each dimension is a 0-100 score. The card's position in the ranking is the weighted sum — but we surface tiers, not raw numbers. Bars are sorted by weight (most-influential first).

  • VRAM × workload · weight 22% · 33 (Weak)
  • Budget fit · weight 18% · 95 (Excellent)
  • OS compatibility · weight 16% · 100 (Excellent)
  • Skill match · weight 10% · 95 (Excellent)
  • Power headroom · weight 8% · 80 (Strong)
  • Multi-GPU path · weight 8% · 80 (Strong)
  • Thermal / noise · weight 6% · 95 (Excellent)
  • Gaming alignment · weight 6% · 90 (Excellent)
  • Perf-per-watt · weight 6% · 85 (Strong)

Tier mapping: top ≥ 75 composite · alternate 60-74 · acceptable 40-59 · avoid < 40 or over-budget / incompatible.

Caveats
  • 16 GB is below the comfortable VRAM minimum for coding agents — expect quant downgrades or very tight context windows.
Try in custom builder → · See model-fit table · Recommended runtime: ollama
Estimated (rule-based scoring) · Help us measure this — submit a benchmark for NVIDIA GeForce RTX 4080 Super
Top pick
NVIDIA · 16 GB · ~$459
Operator-grade

NVIDIA GeForce RTX 5060 Ti 16GB

Top pick for your setup. With your $1,500 budget on Linux for coding agents, the NVIDIA GeForce RTX 5060 Ti 16GB sits in this tier on a balance of capability, OS compat, power, and budget fit.

Realistic model class
Qwen 2.5 Coder 14B FP16, agents OK
Expected throughput
40-70 tok/s on 7B Q4; 20-35 tok/s on 13B Q4.
Evidence
live data · editorial + reproduced community
Editorial: 0 benchmarks · Reproduced community: 0 · Stale (>18mo): 0 rows
Cohort confidence: — (none)
Needs measurement
This recommendation is rule-based, not evidence-backed yet.
  • No benchmarks on file for this hardware.
Help us measure NVIDIA GeForce RTX 5060 Ti 16GB →
How we scored this card▸

Each dimension is a 0-100 score. The card's position in the ranking is the weighted sum — but we surface tiers, not raw numbers. Bars are sorted by weight (most-influential first).

  • VRAM × workload · weight 22% · 33 (Weak)
  • Budget fit · weight 18% · 80 (Strong)
  • OS compatibility · weight 16% · 100 (Excellent)
  • Skill match · weight 10% · 95 (Excellent)
  • Power headroom · weight 8% · 95 (Excellent)
  • Multi-GPU path · weight 8% · 80 (Strong)
  • Thermal / noise · weight 6% · 95 (Excellent)
  • Gaming alignment · weight 6% · 70 (Good)
  • Perf-per-watt · weight 6% · 95 (Excellent)

Tier mapping: top ≥ 75 composite · alternate 60-74 · acceptable 40-59 · avoid < 40 or over-budget / incompatible.

Caveats
  • 16 GB is below the comfortable VRAM minimum for coding agents — expect quant downgrades or very tight context windows.
Try in custom builder → · See model-fit table · Recommended runtime: ollama
Estimated (rule-based scoring) · Help us measure this — submit a benchmark for NVIDIA GeForce RTX 5060 Ti 16GB
Top pick
NVIDIA · 16 GB · ~$1,199
Operator-grade

NVIDIA GeForce RTX 5080

Top pick for your setup. With your $1,500 budget on Linux for coding agents, the NVIDIA GeForce RTX 5080 sits in this tier on a balance of capability, OS compat, power, and budget fit.

Realistic model class
Qwen 2.5 Coder 14B FP16, agents OK
Expected throughput
60-100 tok/s on 7B Q4; 30-50 tok/s on 13B Q4.
Evidence
live data · editorial + reproduced community
Editorial: 2 benchmarks · Reproduced community: 0 · Stale (>18mo): 0 rows
Cohort confidence: Low (2 cohorts)
Needs measurement
This recommendation is rule-based, not evidence-backed yet.
  • Only 2 benchmarks — below the 5-row threshold for cohort signal.
Help us measure NVIDIA GeForce RTX 5080 →
Measured throughput
top 2 of 2 on file · most recent first
  • ed · repro · llama 3.1 8b instruct · Q4_K_M · 132.2 tok/s · 2026-05
  • ed · llama 3.1 8b instruct · Q4_K_M · 118.2 tok/s · 2026-05
Show 2 benchmarks feeding this card▸
  • ed · #338 · llama-3.1-8b-instruct · Q4_K_M · 132.2 tok/s · 2026-05-11 · repro
  • ed · #333 · llama-3.1-8b-instruct · Q4_K_M · 118.2 tok/s · 2026-05-05
How we scored this card▸

Each dimension is a 0-100 score. The card's position in the ranking is the weighted sum — but we surface tiers, not raw numbers. Bars are sorted by weight (most-influential first).

  • VRAM × workload · weight 22% · 33 (Weak)
  • Budget fit · weight 18% · 95 (Excellent)
  • OS compatibility · weight 16% · 100 (Excellent)
  • Skill match · weight 10% · 95 (Excellent)
  • Power headroom · weight 8% · 50 (Acceptable)
  • Multi-GPU path · weight 8% · 80 (Strong)
  • Thermal / noise · weight 6% · 95 (Excellent)
  • Gaming alignment · weight 6% · 95 (Excellent)
  • Perf-per-watt · weight 6% · 65 (Good)

Tier mapping: top ≥ 75 composite · alternate 60-74 · acceptable 40-59 · avoid < 40 or over-budget / incompatible.

Caveats
  • 16 GB is below the comfortable VRAM minimum for coding agents — expect quant downgrades or very tight context windows.
Try in custom builder → · See model-fit table · Recommended runtime: ollama
Estimated (rule-based scoring) · Help us measure this — submit a benchmark for NVIDIA GeForce RTX 5080
Why we ruled these out
Over-budget or fundamentally incompatible — listed for the upgrade-path conversation
  • NVIDIA H100 PCIe — Out of budget for this query.
    ~$25,000
  • NVIDIA RTX 6000 Ada Generation — Out of budget for this query.
    ~$6,499
  • NVIDIA L40S — Out of budget for this query.
    ~$8,500

Where to go from here

Stack Builder →

One step further: this card + runtime + 1-3 models + cost rollup + ready-to-paste install script. Eight inputs → full rig.

Custom build engine →

Once you’ve picked a card, model the full build (CPU, RAM, runtime) to see which models fit comfortably.

GPU buying guide 2026 →

The long-form essay version: VRAM tiers, MoE math, NVLink truth, used-market price discipline.

Hardware combinations →

Curated multi-GPU and Apple-cluster setups with effective-VRAM math you can trust.

Scoring methodology →

How the trust layer behind these recommendations actually works — every dimension, every formula, the honest limits.

Cohort coverage report →

Where the intelligence graph has signal — and which model × hardware × quant cohorts are still underpowered.

Reproduce a benchmark →

Help tip a cohort across the 5-row threshold for outlier detection — the most operator-impactful contribution.