MPS (Metal Performance Shaders)

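MPS is Apple's high-level, Metal-based compute library, exposed in PyTorch as the mps device backend. Calling model.to("mps") moves a model's parameters and buffers to the Apple Silicon GPU. Subsequent operations on those tensors then run through MPS kernels, provided the inputs are on the same device.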
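A minimal sketch of selecting the MPS device with a CPU fallback. It assumes PyTorch 1.12 or later, where torch.backends.mps was introduced. The two-layer model is a hypothetical stand-in for any nn.Module.

```python
import torch
import torch.nn as nn

def pick_device() -> torch.device:
    # Prefer the Apple GPU via MPS when this PyTorch build supports it and macOS exposes it.
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()

# Hypothetical tiny model; .to(device) moves its parameters and buffers.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4)).to(device)
model.eval()

# Inputs must live on the same device as the model's parameters.
x = torch.randn(8, 16, device=device)
with torch.no_grad():
    y = model(x)

print(y.shape, y.device)
```

Forgetting to move the inputs is the most common mistake here. A CPU tensor fed to an MPS model raises a device-mismatch error rather than being copied implicitly.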

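MPS is workable for inference with small models but has historically been incomplete. Many ops lack MPS kernels and either raise an error or, when PYTORCH_ENABLE_MPS_FALLBACK=1 is set, fall back to the CPU. FP16 is supported, but BF16 is not on older silicon and macOS versions. Large allocations sometimes fail with RuntimeError: MPS backend out of memory even when unified memory is available, because the MPS allocator enforces its own upper limit (tunable via PYTORCH_MPS_HIGH_WATERMARK_RATIO) rather than allocating up to the free physical memory.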
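A sketch of working around these limits, assuming PyTorch 2.x on macOS, which provides the torch.mps memory queries. The BF16 probe and the dtype policy in pick_dtype are illustrative assumptions, not an official API. The exact exception raised for unsupported BF16 may vary by PyTorch version, so the sketch catches both RuntimeError and TypeError.

```python
import os

# Must be set before torch is imported: ops with no MPS kernel then run on
# the CPU instead of raising NotImplementedError (slower, but the model runs).
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")

import torch

def pick_dtype(device: torch.device) -> torch.dtype:
    """Illustrative policy: choose a compute dtype for inference on `device`."""
    if device.type != "mps":
        return torch.float32  # half precision on CPU is often slow or unsupported
    try:
        # Probe BF16: older macOS/silicon combinations reject it on MPS.
        torch.ones(1, dtype=torch.bfloat16, device=device)
        return torch.bfloat16
    except (RuntimeError, TypeError):
        return torch.float16  # FP16 is the broadly supported fallback on MPS

def report_mps_memory() -> str:
    """Summarize memory held by the MPS allocator and the Metal driver."""
    if not torch.backends.mps.is_available():
        return "MPS not available"
    mib = 2**20
    return (f"allocated={torch.mps.current_allocated_memory() / mib:.1f} MiB, "
            f"driver={torch.mps.driver_allocated_memory() / mib:.1f} MiB")

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
dtype = pick_dtype(device)
print(device, dtype, report_mps_memory())
```

For the out-of-memory case, PyTorch's own error message suggests setting PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable the allocator's upper limit. The message warns this may destabilize the system, so lowering batch size or precision is usually the safer first step.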

For local LLM inference, llama.cpp's native Metal kernels and MLX-LM both outperform PyTorch MPS by 1.5–3×. Use MPS for quick PyTorch experiments; use llama.cpp or MLX for production.
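Speedup figures like these vary with model size, quantization, and chip, so it is worth measuring on your own machine. The sketch below assumes PyTorch 2.x for torch.mps.synchronize. It times only a toy matrix multiply in PyTorch and is not an LLM benchmark; llama.cpp and MLX would be measured with their own tooling. The key detail is that MPS queues work asynchronously, so the clock must be read only after synchronizing.

```python
import time
import torch

def sync(device: torch.device) -> None:
    # MPS (like CUDA) executes asynchronously; wait for queued work before timing.
    if device.type == "mps":
        torch.mps.synchronize()

def time_matmul(device: torch.device, n: int = 512, iters: int = 10) -> float:
    """Mean seconds per n x n matmul on `device` (toy workload)."""
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    _ = a @ b  # warm-up: the first call includes kernel and graph setup
    sync(device)
    start = time.perf_counter()
    for _ in range(iters):
        _ = a @ b
    sync(device)
    return (time.perf_counter() - start) / iters

devices = [torch.device("cpu")]
if torch.backends.mps.is_available():
    devices.append(torch.device("mps"))
for d in devices:
    print(f"{d.type}: {time_matmul(d) * 1e3:.2f} ms per matmul")
```

Without the synchronize calls, the MPS timing would mostly measure how fast work is enqueued, not how fast it runs, and would look misleadingly fast.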
