Unified Memory

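Unified memory is a memory architecture in which the CPU and GPU share one physical RAM pool, so data does not have to be copied between separate system RAM and VRAM. Apple Silicon and AMD Strix Halo (Ryzen AI Max) use this design. NVIDIA's GB10 also shares a single LPDDR5X pool between CPU and GPU, while Grace Hopper links separate CPU and GPU memories into one cache-coherent address space over NVLink-C2C.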

For local AI, unified memory is why a Mac with 192 GB or more of unified memory (such as an M2 Ultra or M3 Ultra Mac Studio) can load and run a 120 GB model without dedicated VRAM. No consumer NVIDIA card can do this; the largest, the RTX 5090, has 32 GB.
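The "does it fit" arithmetic can be sketched in a few lines. This is a rough sketch, assuming weights dominate memory use (KV cache, activations, and runtime overhead are ignored). The share of RAM the GPU may actually use is modeled by a hypothetical `usable_fraction` parameter; the 0.75 in the usage below is an illustrative assumption, not a documented limit for any particular OS.

```python
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes).

    Ignores KV cache, activations, and runtime overhead (simplifying assumption).
    """
    # billions of params * bits per weight / 8 bits per byte = GB
    return params_billion * bits_per_weight / 8


def fits(model_gb: float, memory_gb: float, usable_fraction: float = 1.0) -> bool:
    """True if the model fits in the share of memory the GPU can use.

    usable_fraction is a hypothetical knob for memory reserved by the OS
    or other processes; 1.0 means the whole pool is available.
    """
    return model_gb <= memory_gb * usable_fraction


if __name__ == "__main__":
    weights = model_size_gb(120, 8)  # 120B params at 8 bits per weight
    print(f"weights: {weights:.0f} GB")
    print("24 GB VRAM card:", fits(weights, 24))
    print("192 GB unified, 75% usable (assumed):", fits(weights, 192, 0.75))
```

The same function shows why quantization matters: at 4 bits per weight the same 120B-parameter model needs about 60 GB.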

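The tradeoff is bandwidth. Unified LPDDR5/LPDDR5X memory delivers roughly 250–820 GB/s, below the ~1 TB/s of consumer GDDR6X (RTX 4090: 1008 GB/s) and far below the several TB/s of datacenter HBM. Because generating each token requires streaming the model's weights from memory, a model that fits on both an M-series Mac and an RTX 4090 will usually run faster on the 4090, even though the Mac has more capacity headroom.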
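A common rule of thumb turns this into a ceiling on decode speed: if every generated token reads all weights once, tokens per second cannot exceed bandwidth divided by model size. The sketch below assumes a dense model at batch size 1 and ignores compute limits, KV-cache reads, and caching; the 20 GB model and the 400 GB/s unified-memory figure are hypothetical values chosen for illustration. Real throughput lands below these ceilings.

```python
def max_decode_tokens_per_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Bandwidth-bound ceiling on single-stream decode speed.

    Assumes every generated token reads all weights once (dense model,
    batch size 1) and ignores compute, KV-cache reads, and caching effects.
    """
    return bandwidth_gb_s / model_gb


if __name__ == "__main__":
    model_gb = 20.0  # hypothetical model small enough for a 24 GB RTX 4090
    rtx_4090 = max_decode_tokens_per_s(1008, model_gb)  # GDDR6X at 1008 GB/s
    unified = max_decode_tokens_per_s(400, model_gb)    # hypothetical 400 GB/s unified pool
    print(f"RTX 4090 ceiling: {rtx_4090:.1f} tok/s")
    print(f"400 GB/s unified ceiling: {unified:.1f} tok/s")
```

Capacity decides whether a model loads at all; bandwidth decides how fast it runs once loaded.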

See also