Every GPU ranked for local AI inference
One screen. Every catalog GPU sorted by tier, with estimated tok/s for the four canonical model sizes (7B, 14B, 32B, 70B at Q4_K_M). Measurements where we have them, bandwidth-derived estimates where we don't; every cell is labeled so you know what you're reading: ● marks a measured value, and a range marks an estimate. Methodology at /methodology.
| Tier | GPU | VRAM | BW (GB/s) | Price | 7B Q4 | 14B Q4 | 32B Q4 | 70B Q4 | Rating |
|---|---|---|---|---|---|---|---|---|---|
| A | Apple Mac Studio (M3 Ultra) apple · 2025 | 0 GB | ? | $4,999 | ? | ? | ? | ? | 10.0 |
| A | MacBook Pro 16" M4 Max apple · 2024 | 0 GB | ? | $3,999 | ? | ? | ? | ? | 10.0 |
| A | NVIDIA GeForce RTX 5090 nvidia · 2025 | 32 GB | 1792 | $2,499 | 195● | 105–148 | 46–64 | — | 9.6 |
| A | NVIDIA GeForce RTX 4090 nvidia · 2022 | 24 GB | 1008 | $1,899 | 150● | 59–83 | 37● | 8● | 8.8 |
| B | Lenovo Legion 5 Pro Gen 7 (RTX 3080 16GB) nvidia · 2022 | 16 GB | ? | $1,499 | ? | ? | ? | ? | 9.3 |
| B | NVIDIA GeForce RTX 3090 Ti nvidia · 2022 | 24 GB | ? | $1,199 | ? | ? | ? | ? | 8.8 |
| B | AMD Radeon RX 7900 XTX amd · 2022 | 24 GB | 960 | $899 | 86● | 56–79 | 25–34 | — | 8.6 |
| B | NVIDIA GeForce RTX 3090 nvidia · 2020 | 24 GB | ? | $899 | 105● | ? | ? | ? | 8.5 |
| B | NVIDIA GeForce RTX 5080 nvidia · 2025 | 16 GB | 960 | $1,199 | 132● | 56–79 | — | — | 8.1 |
| B | NVIDIA GeForce RTX 4080 Super nvidia · 2024 | 16 GB | 736 | $1,099 | 82–114 | 43–61 | — | — | 8.1 |
| B | NVIDIA GeForce RTX 4070 Ti Super nvidia · 2024 | 16 GB | ? | $829 | ? | ? | ? | ? | 8.1 |
| B | NVIDIA GeForce RTX 5070 Ti nvidia · 2025 | 16 GB | ? | $849 | ? | ? | ? | ? | 8.1 |
| B | NVIDIA GeForce RTX 4080 nvidia · 2022 | 16 GB | ? | $1,099 | ? | ? | ? | ? | 7.8 |
| B | NVIDIA GeForce RTX 3080 Ti nvidia · 2021 | 12 GB | 912 | $480 | 101–142 | 54–75 | — | — | 7.3 |
| C | AMD Radeon RX 7900 XT amd · 2022 | 20 GB | ? | $729 | ? | ? | ? | ? | 8.1 |
| C | NVIDIA GeForce RTX 5060 Ti 16GB nvidia · 2025 | 16 GB | ? | $459 | ? | ? | ? | ? | 8.1 |
| C | AMD Radeon RX 7900 GRE amd · 2024 | 16 GB | 576 | $549 | 64–90 | 34–47 | — | — | 7.9 |
| C | AMD Radeon RX 9070 amd · 2025 | 16 GB | ? | $569 | ? | ? | ? | ? | 7.9 |
| C | AMD Radeon RX 9070 XT amd · 2025 | 16 GB | ? | $649 | ? | ? | ? | ? | 7.9 |
| C | NVIDIA GeForce RTX 4060 Ti 16GB nvidia · 2023 | 16 GB | ? | $449 | ? | 28● | ? | ? | 7.8 |
| C | AMD Radeon RX 6950 XT amd · 2022 | 16 GB | 576 | $580 | 64–90 | 34–47 | — | — | 7.6 |
| C | AMD Radeon RX 7800 XT amd · 2023 | 16 GB | ? | $459 | ? | ? | ? | ? | 7.6 |
| C | NVIDIA GeForce RTX 4070 Super nvidia · 2024 | 12 GB | ? | $619 | ? | ? | ? | ? | 7.6 |
| C | NVIDIA GeForce RTX 5070 nvidia · 2025 | 12 GB | ? | $599 | ? | ? | ? | ? | 7.6 |
| C | AMD Radeon RX 6800 amd · 2020 | 16 GB | 512 | $380 | 57–80 | 30–42 | — | — | 7.3 |
| C | AMD Radeon RX 6800 XT amd · 2020 | 16 GB | 512 | $450 | 57–80 | 30–42 | — | — | 7.3 |
| C | AMD Radeon RX 6900 XT amd · 2020 | 16 GB | 512 | $500 | 57–80 | 30–42 | — | — | 7.3 |
| C | NVIDIA GeForce RTX 3080 12GB nvidia · 2022 | 12 GB | ? | $449 | ? | ? | ? | ? | 7.3 |
| C | NVIDIA GeForce RTX 4070 nvidia · 2023 | 12 GB | ? | $549 | ? | ? | ? | ? | 7.3 |
| C | NVIDIA GeForce RTX 4070 Ti nvidia · 2023 | 12 GB | ? | $749 | ? | ? | ? | ? | 7.3 |
| C | NVIDIA GeForce RTX 2080 Ti nvidia · 2018 | 11 GB | 616 | $380 | 68–96 | 36–51 | — | — | 6.6 |
| C | NVIDIA GeForce RTX 3070 Ti nvidia · 2021 | 8 GB | 608 | $350 | 68–95 | — | — | — | 5.0 |
| D | AMD Radeon RX 7600 XT amd · 2024 | 16 GB | ? | $309 | ? | ? | ? | ? | 7.9 |
| D | AMD Radeon RX 6750 XT amd · 2022 | 12 GB | 432 | $320 | 48–67 | 25–36 | — | — | 7.1 |
| D | AMD Radeon RX 7700 XT amd · 2023 | 12 GB | ? | $379 | ? | ? | ? | ? | 7.1 |
| D | NVIDIA GeForce RTX 3060 12GB nvidia · 2021 | 12 GB | 360 | $249 | 40–56 | 21–30 | — | — | 7.0 |
| D | AMD Radeon RX 6700 XT amd · 2021 | 12 GB | 384 | $280 | 43–60 | 23–32 | — | — | 6.8 |
| D | NVIDIA GeForce GTX 1080 Ti nvidia · 2017 | 11 GB | 484 | $250 | 54–75 | 28–40 | — | — | 6.6 |
| D | Intel Arc A770 16GB intel · 2022 | 16 GB | ? | $269 | ? | ? | ? | ? | 6.5 |
| D | NVIDIA GeForce RTX 3080 10GB nvidia · 2020 | 10 GB | ? | $379 | ? | ? | ? | ? | 6.5 |
| D | Intel Arc B580 intel · 2024 | 12 GB | ? | $269 | ? | ? | ? | ? | 6.3 |
| D | Intel Arc B570 intel · 2025 | 10 GB | ? | $219 | ? | ? | ? | ? | 5.8 |
| D | NVIDIA GeForce RTX 5060 nvidia · 2025 | 8 GB | ? | $299 | ? | ? | ? | ? | 5.6 |
| D | NVIDIA GeForce RTX 5060 Ti 8GB nvidia · 2025 | 8 GB | ? | $379 | ? | ? | ? | ? | 5.6 |
| D | NVIDIA GeForce RTX 3050 nvidia · 2022 | 8 GB | 224 | $200 | 25–35 | — | — | — | 5.3 |
| D | NVIDIA GeForce RTX 4060 nvidia · 2023 | 8 GB | ? | $279 | ? | ? | ? | ? | 5.3 |
| D | NVIDIA GeForce RTX 4060 Ti 8GB nvidia · 2023 | 8 GB | ? | $369 | ? | ? | ? | ? | 5.3 |
| D | NVIDIA GeForce RTX 2080 Super nvidia · 2019 | 8 GB | 496 | $320 | 55–77 | — | — | — | 5.1 |
| D | NVIDIA GeForce RTX 2070 nvidia · 2018 | 8 GB | 448 | $240 | 50–70 | — | — | — | 5.1 |
| D | AMD Radeon RX 6650 XT amd · 2022 | 8 GB | 280 | $230 | 31–44 | — | — | — | 5.1 |
| D | NVIDIA GeForce GTX 1070 Ti nvidia · 2017 | 8 GB | 256 | $160 | 28–40 | — | — | — | 5.1 |
| D | NVIDIA GeForce RTX 3060 Ti nvidia · 2020 | 8 GB | 448 | $280 | 50–70 | — | — | — | 5.0 |
| D | NVIDIA GeForce RTX 3070 nvidia · 2020 | 8 GB | ? | $269 | ? | ? | ? | ? | 5.0 |
| D | NVIDIA GeForce RTX 2060 Super nvidia · 2019 | 8 GB | 448 | $220 | 50–70 | — | — | — | 4.8 |
| D | NVIDIA GeForce RTX 2070 Super nvidia · 2019 | 8 GB | 448 | $280 | 50–70 | — | — | — | 4.8 |
| D | AMD Radeon RX 6600 XT amd · 2021 | 8 GB | 256 | $200 | 28–40 | — | — | — | 4.8 |
| D | AMD Radeon RX 6600 amd · 2021 | 8 GB | 224 | $180 | 25–35 | — | — | — | 4.8 |
| D | NVIDIA GeForce GTX 1080 nvidia · 2016 | 8 GB | 320 | $180 | 36–50 | — | — | — | 4.6 |
| D | NVIDIA GeForce GTX 1070 nvidia · 2016 | 8 GB | 256 | $140 | 28–40 | — | — | — | 4.6 |
| D | AMD Radeon RX 580 8GB amd · 2017 | 8 GB | 256 | $80 | 28–40 | — | — | — | 3.8 |
| D | AMD Radeon RX 5700 XT amd · 2019 | 8 GB | 448 | $200 | 50–70 | — | — | — | 3.5 |
| D | AMD Radeon RX 5500 XT 8GB amd · 2019 | 8 GB | 224 | $110 | 25–35 | — | — | — | 3.5 |
| D | NVIDIA GeForce GTX 1660 Super nvidia · 2019 | 6 GB | 336 | $150 | 37–52 | — | — | — | 2.8 |
| D | NVIDIA GeForce RTX 2060 nvidia · 2019 | 6 GB | 336 | $180 | 37–52 | — | — | — | 2.8 |
| D | NVIDIA GeForce GTX 1660 Ti nvidia · 2019 | 6 GB | 288 | $160 | 32–45 | — | — | — | 2.8 |
| D | NVIDIA GeForce GTX 1660 nvidia · 2019 | 6 GB | 192 | $130 | 21–30 | — | — | — | 2.8 |
| D | NVIDIA GeForce GTX 1060 6GB nvidia · 2016 | 6 GB | 192 | $110 | 21–30 | — | — | — | 2.6 |
| D | AMD Radeon 880M (Strix Point iGPU) amd · 2024 | 0 GB | 102 | ? | ? | ? | ? | ? | 2.4 |
| D | AMD Radeon 780M (Phoenix iGPU) amd · 2023 | 0 GB | 89 | ? | ? | ? | ? | ? | 2.1 |
| D | NVIDIA GeForce GTX 1650 Super nvidia · 2019 | 4 GB | 192 | $140 | — | — | — | — | 1.8 |
| D | NVIDIA GeForce GTX 1650 nvidia · 2019 | 4 GB | 128 | $130 | — | — | — | — | 1.8 |
| D | AMD Radeon RX 5600 XT amd · 2020 | 6 GB | 336 | $140 | 37–52 | — | — | — | 1.7 |
| D | NVIDIA GeForce GTX 1050 Ti nvidia · 2016 | 4 GB | 112 | $90 | — | — | — | — | 1.3 |
| D | NVIDIA GeForce GTX 1060 3GB nvidia · 2016 | 3 GB | 192 | $70 | — | — | — | — | 1.1 |
| D | AMD Radeon RX 570 amd · 2017 | 4 GB | 224 | $60 | — | — | — | — | 1.0 |
| M | ASUS ROG Strix Scar 18 (RTX 5090 Mobile) nvidia · 2025 | 24 GB | ? | $3,999 | ? | ? | ? | ? | 9.6 |
| M | Razer Blade 16 (2025, RTX 5090 Mobile) nvidia · 2025 | 24 GB | ? | $4,499 | ? | ? | ? | ? | 9.6 |
| M | Framework Laptop 16 (RX 7700S) amd · 2024 | 8 GB | ? | $1,699 | ? | ? | ? | ? | 8.9 |
| M | NVIDIA GeForce RTX 3080 16GB (Mobile) nvidia · 2022 | 16 GB | 512 | ? | 79● | 30–42 | — | — | 8.8 |
| M | NVIDIA GeForce RTX 5090 Mobile nvidia · 2025 | 24 GB | ? | ? | ? | ? | ? | ? | 8.6 |
| M | NVIDIA GeForce RTX 4090 Mobile nvidia · 2023 | 16 GB | ? | ? | ? | ? | ? | ? | 7.3 |
| M | NVIDIA GeForce RTX 3050 Ti (Mobile) nvidia · 2021 | 4 GB | 192 | ? | — | — | — | — | 1.5 |
| S | AMD Instinct MI300A (APU) amd · 2023 | 128 GB | ? | ? | ? | ? | ? | ? | 10.0 |
| S | AMD Instinct MI300X amd · 2023 | 192 GB | ? | $15,000 | ? | ? | ? | ? | 10.0 |
| S | AMD Instinct MI325X amd · 2024 | 256 GB | ? | $20,000 | ? | ? | ? | ? | 10.0 |
| S | AMD Instinct MI355X amd · 2025 | 288 GB | ? | $25,000 | ? | ? | ? | ? | 10.0 |
| S | NVIDIA B200 nvidia · 2024 | 192 GB | ? | $40,000 | ? | ? | ? | ? | 10.0 |
| S | NVIDIA DGX Spark (Project Digits) nvidia · 2025 | 0 GB | ? | $3,000 | ? | ? | ? | ? | 10.0 |
| S | NVIDIA GB200 NVL72 nvidia · 2024 | 13824 GB | ? | ? | ? | ? | ? | ? | 10.0 |
| S | NVIDIA H100 NVL nvidia · 2023 | 188 GB | ? | $60,000 | ? | ? | ? | ? | 10.0 |
| S | NVIDIA H100 PCIe nvidia · 2022 | 80 GB | ? | $25,000 | ? | ? | ? | ? | 10.0 |
| S | NVIDIA H100 SXM nvidia · 2022 | 80 GB | ? | $30,000 | ? | ? | ? | ? | 10.0 |
| S | NVIDIA H200 nvidia · 2024 | 141 GB | ? | $31,000 | ? | ? | ? | ? | 10.0 |
| S | NVIDIA L40 nvidia · 2022 | 48 GB | ? | $8,000 | ? | ? | ? | ? | 10.0 |
| S | NVIDIA L40S nvidia · 2023 | 48 GB | ? | $8,500 | ? | ? | ? | ? | 10.0 |
| S | NVIDIA RTX 6000 Ada Generation nvidia · 2022 | 48 GB | ? | $6,499 | ? | ? | ? | ? | 10.0 |
| S | NVIDIA RTX PRO 6000 Blackwell nvidia · 2025 | 96 GB | ? | $8,999 | ? | ? | ? | ? | 10.0 |
| S | AMD Instinct MI210 amd · 2022 | 64 GB | ? | $8,500 | ? | ? | ? | ? | 9.8 |
| S | AMD Instinct MI250X amd · 2021 | 128 GB | ? | $13,000 | ? | ? | ? | ? | 9.7 |
| S | NVIDIA A100 80GB SXM nvidia · 2020 | 80 GB | ? | $17,000 | ? | ? | ? | ? | 9.7 |
| S | NVIDIA A40 nvidia · 2020 | 48 GB | ? | $5,500 | ? | ? | ? | ? | 9.7 |
| S | NVIDIA RTX A6000 (Ampere) nvidia · 2020 | 48 GB | ? | $3,500 | ? | ? | ? | ? | 9.7 |
| S | NVIDIA RTX 5000 Ada Generation nvidia · 2023 | 32 GB | ? | $4,000 | ? | ? | ? | ? | 9.5 |
| S | NVIDIA A100 40GB nvidia · 2020 | 40 GB | ? | $11,000 | ? | ? | ? | ? | 9.2 |
| S | NVIDIA L4 nvidia · 2023 | 24 GB | ? | $2,500 | ? | ? | ? | ? | 9.0 |
| S | NVIDIA RTX A5000 nvidia · 2021 | 24 GB | ? | $2,500 | ? | ? | ? | ? | 8.7 |
| S | Intel Gaudi 3 intel · 2024 | 128 GB | ? | $18,000 | ? | ? | ? | ? | 8.2 |
| S | Intel Gaudi 2 intel · 2022 | 96 GB | ? | $8,000 | ? | ? | ? | ? | 7.9 |
| E | Apple M3 Ultra apple · 2025 | 0 GB | ? | ? | ? | ? | ? | 12● | 10.0 |
| E | Apple M4 Max apple · 2024 | 0 GB | ? | ? | 79● | ? | ? | ? | 10.0 |
| E | Apple M4 Pro apple · 2024 | 0 GB | ? | ? | ? | ? | ? | ? | 10.0 |
| E | Apple M4 Ultra apple · 2025 | 0 GB | ? | ? | ? | ? | ? | ? | 10.0 |
| E | Apple M1 Ultra apple · 2022 | 0 GB | ? | ? | ? | ? | ? | ? | 9.9 |
| E | Apple M2 Ultra apple · 2023 | 0 GB | ? | ? | ? | ? | ? | ? | 9.9 |
| E | Apple M3 Max apple · 2023 | 0 GB | ? | ? | 55● | ? | ? | ? | 9.9 |
| E | Apple M2 Max apple · 2023 | 0 GB | ? | ? | ? | ? | ? | ? | 9.7 |
| E | Apple M1 Max apple · 2021 | 0 GB | ? | ? | ? | ? | ? | ? | 8.9 |
| E | Qualcomm Snapdragon X Elite qualcomm · 2024 | 0 GB | ? | ? | ? | ? | ? | ? | 7.3 |
| E | Qualcomm Snapdragon X Plus qualcomm · 2024 | 0 GB | ? | ? | ? | ? | ? | ? | 5.8 |
| E | Qualcomm Snapdragon 8 Elite qualcomm · 2024 | 0 GB | ? | ? | ? | ? | ? | ? | 5.3 |
| E | Apple A18 Pro apple · 2024 | 0 GB | ? | ? | ? | ? | ? | ? | 5.0 |
| E | Apple M4 (iPad Pro) apple · 2024 | 0 GB | ? | ? | ? | ? | ? | ? | 5.0 |
| E | Google Tensor G4 google · 2024 | 0 GB | ? | ? | ? | ? | ? | ? | 4.8 |
| E | Apple A17 Pro apple · 2023 | 0 GB | ? | ? | ? | ? | ? | ? | 4.7 |
| E | Qualcomm Snapdragon 8 Gen 3 qualcomm · 2023 | 0 GB | ? | ? | ? | ? | ? | ? | 4.5 |
| E | AMD Ryzen AI 9 HX 370 (Strix Point) amd · 2024 | 0 GB | ? | $1,599 | ? | ? | ? | ? | 3.9 |
| E | Intel Core Ultra 7 258V (Lunar Lake) intel · 2024 | 0 GB | ? | $1,199 | ? | ? | ? | ? | 3.8 |
Estimates use the formula tok/s ≈ memory_bandwidth_GBps ÷ model_weights_GB × efficiency, because memory bandwidth is the dominant constraint for autoregressive decode: each generated token requires reading the model weights from memory. The 50–70% efficiency band reflects realistic Ollama / llama.cpp / vLLM runtime overhead. See /methodology for the full derivation.
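As a rough illustration of the formula above, here is a minimal Python sketch. The function name `estimate_tok_s` and the `WEIGHTS_GB` values are assumptions, not taken from /methodology: the weight sizes were back-derived from the estimate ranges in the table (for example, 360 GB/s yielding 40–56 tok/s implies roughly 4.5 GB for 7B Q4), so treat them as approximate.

```python
# Assumed Q4_K_M weight sizes in GB, back-derived from the table's ranges.
# These are illustrative values, not figures published by the site.
WEIGHTS_GB = {"7B": 4.5, "14B": 8.5, "32B": 19.5}


def estimate_tok_s(bandwidth_gbps, weights_gb, efficiency=(0.5, 0.7)):
    """Return a (low, high) tok/s estimate from the bandwidth formula.

    tok/s ~= memory_bandwidth_GBps / model_weights_GB * efficiency
    """
    low_eff, high_eff = efficiency
    ideal = bandwidth_gbps / weights_gb  # upper bound at 100% efficiency
    return ideal * low_eff, ideal * high_eff


# RTX 3060 12GB row: 360 GB/s of memory bandwidth, 7B Q4 model.
low, high = estimate_tok_s(360, WEIGHTS_GB["7B"])
print(f"7B Q4 on 360 GB/s: {low:.0f}-{high:.0f} tok/s")
```

With these assumed weight sizes, the sketch reproduces the 40–56 range listed for the RTX 3060 12GB; other rows may differ by a token or two because of rounding.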
Got a rig? Run a benchmark and turn an estimate into a measured cell. Every measurement improves the table for the next reader.
Lowest $/tok-s pick for each model size
For each model size, this is the catalog card with the lowest cost per tok/s among cards that fit, computed as current street price ÷ the midpoint of the estimated tok/s range. When a pick changes here, prices or new hardware have shifted the value frontier.
Caveat: $/tok-s is a derived estimate stacked on the bandwidth formula. For workloads where you have a measured benchmark in the table above, trust the measured number first; for unmeasured combinations, this is the ranked best-guess for buyer decisions.
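The selection described above can be sketched as follows. This is only an illustration under stated assumptions: the `Card` class, `cost_per_tok_s`, and `best_value` are hypothetical names, the four rows are a small sample copied from the table rather than the full catalog, and "fits" is approximated by a simple VRAM floor (11 GB for 14B Q4, as in the buying guidance below).

```python
from dataclasses import dataclass


@dataclass
class Card:
    name: str
    vram_gb: float
    price_usd: float
    tok_s_low: float   # low end of the estimated range
    tok_s_high: float  # high end of the estimated range


def cost_per_tok_s(card):
    """Street price divided by the midpoint of the estimated tok/s range."""
    midpoint = (card.tok_s_low + card.tok_s_high) / 2
    return card.price_usd / midpoint


def best_value(cards, min_vram_gb):
    """Lowest $/tok-s among cards meeting the VRAM floor, or None."""
    fitting = [c for c in cards if c.vram_gb >= min_vram_gb]
    return min(fitting, key=cost_per_tok_s, default=None)


# Sample rows from the table, using the 14B Q4 estimate column.
sample = [
    Card("RTX 3060 12GB", 12, 249, 21, 30),
    Card("RX 6700 XT", 12, 280, 23, 32),
    Card("GTX 1080 Ti", 11, 250, 28, 40),
    Card("RX 6800", 16, 380, 30, 42),
]

pick = best_value(sample, min_vram_gb=11)
print(pick.name, round(cost_per_tok_s(pick), 2))
```

Among these four sample rows the GTX 1080 Ti comes out lowest, at about $7.35 per tok/s. The pick over the full catalog may differ.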
Choosing a GPU for your workload
The hierarchy answers "which is fastest" — but the right card for you depends on which model size you actually want to run. The four most common operator decisions:
- 7B Q4 (autocomplete, single-model chat) — any card with ≥6 GB VRAM works. The decision shifts to price + power draw + cross-vendor preference. Top D-tier cards (Arc B580, RTX 3060) deliver useful tok/s at <$300.
- 14B Q4 (coding assistant, mid-size chat) — ≥11 GB VRAM minimum. C-tier (RTX 4070 / RX 7800 XT) is the value sweet spot at $400-600.
- 32B Q4 (full coding agent, multi-model) — ≥22 GB VRAM. 24 GB cards are the canonical buy: a used RTX 3090 or a new RX 7900 XTX from B-tier, or the A-tier RTX 4090 if budget allows.
- 70B Q4 (frontier-class local) — ≥48 GB VRAM. Single-card: RTX 6000 Ada / L40S / Mac Studio M3 Ultra. Multi-card: dual 3090 / dual 4090. Workstation tier or above.
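The VRAM floors in the list above can be turned into a quick lookup. The sketch below is an assumption-laden simplification: the names `MIN_VRAM_GB` and `largest_model_that_fits` are hypothetical, it considers a single card only (no pooling across dual-GPU setups), and it ignores context length and KV-cache overhead, which /will-it-run presumably handles in more detail.

```python
# Minimum VRAM in GB per model size at Q4, taken from the list above.
# Ordered smallest to largest so the last match is the biggest model.
MIN_VRAM_GB = {"7B": 6, "14B": 11, "32B": 22, "70B": 48}


def largest_model_that_fits(vram_gb):
    """Return the largest Q4 model size a single card can hold, or None."""
    fitting = [size for size, need in MIN_VRAM_GB.items() if vram_gb >= need]
    return fitting[-1] if fitting else None


for vram in (4, 8, 12, 24, 48):
    print(f"{vram} GB -> {largest_model_that_fits(vram)}")
```

A 24 GB card maps to 32B and a 12 GB card to 14B, which matches the tier guidance in the list.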
Need it more personalized? Use /choose-my-gpu for a 9-input recommender, or /will-it-run to validate a specific model + GPU combination.