MLX / Metal: command buffer execution failed
Cause
Apple's Metal Performance Shaders compiler hit an internal error or hardware fault during a kernel launch. Most often caused by:
- Running out of unified memory mid-inference (no clear OOM error like CUDA gives)
- macOS version too old for the MLX features the model uses
- A specific quantization format the Metal backend doesn't support yet
Solution
Free unified memory. Macs have no separate VRAM — system memory IS GPU memory. Close apps:
# Show top memory consumers
top -o MEM
Common culprits: Chrome (1-3 GB), Slack (500 MB), Spotify (300 MB), other ML scripts.
Update macOS. MLX requires macOS 13.5+ for full features; some kernels need 14+. Apple → About This Mac shows version. Update via System Settings.
Update MLX:
pip install --upgrade mlx mlx-lm
Try a different quantization. MLX has its own format (-mlx-4bit, -mlx-8bit). GGUF Q4_K_M run via llama.cpp on Metal is also a fallback path:
# llama.cpp on Metal — different code path, sometimes works when MLX doesn't
./main -m model.gguf --n-gpu-layers 999
Reduce context size if the error only happens at long contexts — KV cache is the most common pressure point.
Did this fix it?
If your case was different, email hello@runlocalai.co with what you saw and we'll update the page. If it worked but took different commands on your platform, we want to know that too.