Hardware buyer guide · 2 picks · Editorial · Reviewed May 2026

Best iGPU (integrated graphics) for local AI

Honest 2026 guide to running local AI without a discrete GPU. Apple M-series unified memory wins decisively; Ryzen 8000-series iGPUs are genuinely viable for 7B-13B models; Intel Arc iGPU + IPEX-LLM is the Linux escape hatch.

By Fredoline Eruo · Last reviewed 2026-05-08

The short answer

Apple M-series unified memory is the only iGPU path that's actually competitive at the 70B Q4 tier. An M4 Max with 64-128 GB unified runs models that would otherwise demand a $2,000+ discrete-GPU build.

On x86: Ryzen 8000-series APUs (Ryzen 7 8700G, AI Max+) deliver usable 7B-13B Q4 inference via the integrated Radeon graphics — workable for casual local AI without a discrete GPU.

Intel Arc iGPU (Lunar Lake, Meteor Lake) + IPEX-LLM on Linux is the third option. Slower than Apple at every tier, but competitive at entry-level budgets.

The picks, ranked by buyer-leverage

#1

Apple M4 Max (64-128 GB unified) — best iGPU for local AI

full verdict →

64-128 GB unified · $3,500-5,500 (M4 Max)

The only 'iGPU' that runs 70B Q4 comfortably. Unified memory at this capacity and bandwidth is still uniquely Apple: no x86 platform pairs this much GPU-addressable memory with comparable bandwidth.

Buy if
  • Buyers wanting laptop-class AI without a discrete GPU
  • 70B-class inference and multi-model serving on a silent machine
  • Mac-first households allergic to PC builds
Skip if
  • CUDA-locked workflows (vLLM, TensorRT)
  • Image gen + LoRA training (CUDA wins on these)
  • Tight budgets ($1,500-2,000 PC build is more flexible at lower tiers)
Affiliate disclosure: we earn a small commission on purchases made through these links. The opinion comes first.
#2

Apple M4 Pro Mac mini (48 GB unified) — value iGPU pick

full verdict →

48 GB · $1,800-2,400

Punches well above its weight. 48 GB unified memory at $1,800 runs 70B Q4 with care (the default GPU memory cap needs raising). The value Mac iGPU pick.

Buy if
  • Mac-first households wanting serious local AI on a budget
  • Always-on inference servers (silent, low power)
  • Multi-model 13-32B workflows
Skip if
  • Llama 4 Maverick / DeepSeek V3 671B operators
  • CUDA ecosystem requirements
  • Heavy LoRA training
Honesty · Why benchmark numbers on this page might not reflect your real experience
  • tok/s is not user experience. Humans read at roughly 5-10 tok/s; anything above that is buffer time, not perceived speed.
  • Context length changes everything. A 70B Q4 model at 1024 tokens generates ~25 tok/s; the same model at 32K context drops to ~8-12 tok/s as the KV cache fills (see the KV-cache sketch just after this list).
  • Quantization changes the conclusion. Q4_K_M vs Q5_K_M vs Q8 produce different speed AND different quality. A benchmark at one quant doesn't translate to another.
  • Thermal throttling changes long sessions. The first 15 minutes of a benchmark see boost-clock peak; the next 4 hours see steady-state, which is 5-15% slower depending on case airflow.
  • Driver and runtime versions silently shift winners. A 2024 benchmark on PyTorch 2.4 + CUDA 12.4 doesn't reflect 2026 reality on PyTorch 2.6 + CUDA 12.6. Discount benchmarks older than 6 months.
  • Vendor and YouTuber benchmarks are cherry-picked. The standard 'Llama 3.1 70B Q4 at 1024 tokens' chart shows peak decode on a tiny prompt — exactly the conditions least representative of daily use.
  • Our ranking is by workload fit at the buyer's actual budget — not by raw benchmark order. A faster card that doesn't fit your workload ranks below a slower card that does.
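
A back-of-envelope for the context-length point above: KV-cache size grows linearly with context, and on a 70B-class model it becomes a multi-gigabyte allocation well before 32K. A minimal sketch using Llama-3.1-70B's published dimensions, assuming an FP16 cache (runtimes with quantized KV use proportionally less):

```python
# KV-cache sizing for a 70B-class model (Llama-3.1-70B dimensions).
# Assumes FP16 K/V entries; quantized KV caches shrink this proportionally.

N_LAYERS = 80     # transformer layers
N_KV_HEADS = 8    # grouped-query attention: 8 KV heads
HEAD_DIM = 128    # per-head dimension
BYTES_FP16 = 2

def kv_cache_gb(context_tokens: int) -> float:
    # 2x for the K and V tensors, per layer, per KV head, per head dim
    per_token_bytes = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_FP16
    return context_tokens * per_token_bytes / 1e9

for ctx in (1024, 8192, 32768):
    print(f"{ctx:>6} tokens -> {kv_cache_gb(ctx):5.1f} GB of KV cache")
# 1024 -> 0.3 GB; 8192 -> 2.7 GB; 32768 -> 10.7 GB
```

At 32K context the cache alone rivals a 13B model's weights, which is why decode speed sags as the window fills.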

We try to surface these caveats where they apply. If a number on this page reads more confident than it should, please email us via contact. See also our methodology and editorial philosophy.

How to think about VRAM tiers

iGPU 'VRAM' is unified memory shared with the OS and apps. macOS reserves ~25-30% by default (so 64 GB → ~45 GB of practical AI budget). Windows and Linux iGPUs are tighter still: most x86 iGPUs cap at 12-16 GB usable from system RAM. A rough fit-check sketch follows the tier list below.

  • Apple unified 24-48 GB (M4 / M4 Pro): 13-32B Q4 comfortable. The dominant Mac entry tier.
  • Apple unified 64-128 GB (M4 Max): 70B Q4 + multi-model workflows. The flagship iGPU tier.
  • Apple unified 192-512 GB (M3 Ultra Mac Studio): 100B+ MoE territory. Workstation-class without a dGPU.
  • Ryzen 8000 APU (8700G, AI Max+): 7B-13B Q4 viable via the Radeon iGPU. Sub-Apple performance, but works on Linux.
  • Intel Arc iGPU (Lunar Lake / Meteor Lake): 7B Q4 via IPEX-LLM on Linux. Below Apple and AMD; useful for ultra-budget builds.
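
To make those tiers concrete, here is a rough fit-check. The ~72% usable fraction for macOS and the ~0.56 bytes-per-parameter weight of Q4_K_M are approximations, not guarantees, and the 4 GB overhead figure is a stand-in for KV cache plus runtime buffers:

```python
# Rough fit-check: does a Q4_K_M model fit a given unified-memory config?
# Assumptions (approximate): macOS lets the GPU address ~72% of RAM by
# default, and Q4_K_M weighs ~0.56 bytes/parameter (~4.5 bits/weight).

Q4_KM_BYTES_PER_PARAM = 0.56

def fits(total_gb: float, params_b: float,
         usable_frac: float = 0.72, overhead_gb: float = 4.0) -> bool:
    """overhead_gb stands in for KV cache, compute buffers, and the runtime."""
    weights_gb = params_b * Q4_KM_BYTES_PER_PARAM  # billions of params x B/param
    return weights_gb + overhead_gb <= total_gb * usable_frac

for ram, model in [(24, 13), (48, 70), (64, 70), (128, 70)]:
    verdict = "fits" if fits(ram, model) else "too tight at defaults"
    print(f"{ram:>3} GB unified, {model}B Q4_K_M: {verdict}")
```

The 48 GB case coming out 'too tight at defaults' is exactly the #2 pick's 'with care' caveat: macOS can be told to wire more RAM to the GPU (the iogpu.wired_limit_mb sysctl), which is what squeezes a 70B Q4 onto an M4 Pro.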


Frequently asked questions

Can I run 70B models on integrated graphics?

Only on Apple M-series with 64+ GB unified memory. No x86 iGPU has the unified-memory architecture or bandwidth to run 70B Q4 at usable speed in 2026. Ryzen 8000-series and Intel Arc iGPUs cap at 7B-13B Q4 territory.

Why does Apple Silicon work as an iGPU when x86 doesn't?

Apple's unified memory architecture (UMA) gives the GPU access to system RAM at high bandwidth (~273 GB/s on M4 Pro, ~546 GB/s on M4 Max). x86 iGPUs are bottlenecked by dual-channel DDR5 system bandwidth (~70-90 GB/s), roughly 3-6x slower depending on the Apple tier. That bandwidth gap is the difference between 'technically runs' and 'actually usable.'
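
The bandwidth claim is easy to sanity-check: single-stream decode is memory-bound, since every generated token has to stream the full set of active weights. The theoretical ceiling is therefore roughly bandwidth divided by model size; real throughput lands below these numbers once KV-cache traffic and compute overhead are counted:

```python
# Theoretical decode ceiling: tok/s <= memory bandwidth / model bytes.
# Real-world numbers land below this (KV-cache reads, compute overhead).

MODEL_GB = 40  # ~70B at Q4_K_M

platforms = {
    "M4 Max (546 GB/s)": 546,
    "M4 Pro (273 GB/s)": 273,
    "dual-channel DDR5 (~90 GB/s)": 90,
}
for name, bw_gbs in platforms.items():
    print(f"{name}: ceiling ~{bw_gbs / MODEL_GB:.1f} tok/s on a 70B Q4")
# ~13.7, ~6.8, and ~2.3 tok/s: the gap between 'runs' and 'usable'
```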

Is Ryzen 8700G enough for local AI?

For 7B Q4 inference: yes, with patience (~10-20 tok/s). For 13B Q4: tight, often paging from system RAM. For 32B+: not realistic. Pair it with 32-64 GB of DDR5-6000 and llama.cpp's Vulkan (or ROCm) backend on Linux for best results. (IPEX-LLM is Intel-only; it doesn't apply to Radeon iGPUs.)
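
If you go this route, the usual stack is llama.cpp's Vulkan backend. A minimal sketch via the llama-cpp-python bindings; the model path is a placeholder, and it assumes you installed a Vulkan-enabled build:

```python
# Minimal sketch: 7B Q4 inference on a Ryzen iGPU via llama.cpp's Vulkan
# backend. Assumes a Vulkan-enabled install of the bindings, e.g.:
#   CMAKE_ARGS="-DGGML_VULKAN=on" pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama(
    model_path="./models/mistral-7b-instruct-q4_k_m.gguf",  # placeholder path
    n_gpu_layers=-1,  # offload every layer to the iGPU
    n_ctx=4096,       # keep context modest; shared system RAM is the budget
)

out = llm("Q: Why is unified memory good for local AI? A:",
          max_tokens=128, stop=["Q:"])
print(out["choices"][0]["text"])
```

Expect the ~10-20 tok/s range quoted above to drop as context grows; the iGPU shares DDR5 bandwidth with everything else on the machine.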

Should I get a Ryzen AI Max+ instead of a discrete GPU?

Ryzen AI Max+ (Strix Halo) with up to 128 GB unified memory is genuinely interesting for 2026 — the first x86 platform with Apple-style unified-memory architecture. For laptop AI, it's a real alternative to MacBook Pro M4 Max. Verify availability + pricing in your market.

Go deeper

When it doesn't work

Hardware bought, set up correctly, still failing? The highest-volume local-AI errors and their fixes:

If this isn't the right fit

Common alternatives readers consider: