
Exo

Personal AI cluster software. Auto-discovers Apple Silicon devices on a LAN and shards a model across them via pipeline + tensor parallelism on top of MLX. The 2026 unlock: Thunderbolt 5 + macOS 26.2 RDMA dropped inter-device latency by ~99%, making consumer-Mac clusters credible — DeepSeek V3 671B runs at 5.37 tok/s on 8x M4 Pro Mac Minis. The default answer for 'I have several Macs and want to run a frontier model.'

By Fredoline Eruo·Last verified May 6, 2026·28,000 GitHub stars
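The headline claim, that a 671B-parameter model fits across eight consumer Mac Minis, checks out with back-of-envelope arithmetic. A minimal sketch, assuming 4-bit weight quantization and 64 GB M4 Pro configurations (both assumptions ours, not stated above); real deployments also need headroom for the KV cache and activations:

```python
# Rough memory check: weights of a 671B-parameter model at 4 bits/weight,
# split evenly across an 8-device cluster. Illustrative arithmetic only.

def model_bytes(params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in bytes (ignores cache/activations)."""
    return params * bits_per_weight / 8

total_gb = model_bytes(671e9, 4) / 1e9   # ~335.5 GB of weights
per_device_gb = total_gb / 8             # ~41.9 GB per Mac Mini

print(f"total: {total_gb:.1f} GB, per device: {per_device_gb:.1f} GB")
# → total: 335.5 GB, per device: 41.9 GB
```

At roughly 42 GB of weights per node, a 64 GB M4 Pro Mini has room left for cache and OS overhead, which is why the eight-Mini figure above is plausible.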

Stack & relationships

How Exo relates to other entries in the catalog — recommended pairings, alternatives, dependencies, and edges to avoid. Each edge carries a one-line operator note from our editorial team.

Exo ↔ ecosystem

Recommended stack

  • Pairs with
    MLX-LM

    Exo is how you scale MLX-LM beyond a single Mac. The 2026 unlock — Thunderbolt 5 + macOS 26.2 RDMA — makes the cluster credible for serious models.

Alternatives

  • Alternative to
    Petals

    Petals shards over WAN volunteers; Exo shards over a controlled LAN cluster. Same architectural shape (pipeline parallel across machines), opposite trust models — public swarm vs personal devices.

  • Alternative to
    vLLM

    Different hardware target. vLLM = NVIDIA/Linux datacenter; Exo = Apple Silicon LAN cluster. Pick by which hardware you already own.

  • Alternative to
    Hyperspace (P2P inference network)

    Different consumer-multi-machine paths. Exo is Apple Silicon LAN clustering; Hyperspace targets WAN P2P. Pick by hardware and trust model.

Depends on

  • Depends on
    MLX-LM

    Exo runs MLX under the hood for the per-device inference layer. Pipeline-parallel scheduling is Exo; the actual matmul kernels are MLX.
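The division of labor above, where Exo decides which layers live on which device and MLX does the math, can be illustrated with a toy memory-weighted partitioner. This is a sketch of the idea only; Exo's actual scheduler and partitioning strategy may differ:

```python
# Toy memory-weighted pipeline partitioning: give each device a contiguous
# span of transformer layers, sized in proportion to its RAM. Not Exo's
# real algorithm — just an illustration of the scheduling problem it solves.

def partition_layers(num_layers: int, device_mem_gb: list[float]) -> list[range]:
    """Split layers [0, num_layers) into contiguous per-device spans,
    proportional to each device's memory."""
    total_mem = sum(device_mem_gb)
    spans, start, cum = [], 0, 0.0
    for i, mem in enumerate(device_mem_gb):
        cum += mem
        # Last device absorbs any rounding remainder so every layer is placed.
        end = num_layers if i == len(device_mem_gb) - 1 \
            else round(num_layers * cum / total_mem)
        spans.append(range(start, end))
        start = end
    return spans

# Example: 61 layers across two 64 GB Minis and one 128 GB Studio.
print(partition_layers(61, [64, 64, 128]))
```

The bigger device gets roughly half the layers; in the real system each span then runs on MLX locally, with activations handed to the next device over Thunderbolt.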

Featured in these stacks

The L3 execution stacks that pick this tool as a recommended component, with the one-line note explaining the role it plays in each.

  • Stack · L3 · Workstation tier · Role: Distributed serving (multi-Mac cluster)
    Build a Mac-native AI stack (May 2026)

    Exo is what makes multi-Mac credible in 2026: auto-discovers nearby Apple Silicon devices on the LAN, shards models across them via pipeline parallel on top of MLX. Thunderbolt 5 + macOS 26.2 RDMA cuts inter-device latency by ~99%, turning consumer-Mac clusters into a real serving option.

  • Stack · L3 · Production tier · Role: Cluster orchestrator
    Build a multi-machine Apple Silicon cluster (May 2026)

    Exo is what makes consumer-Mac clustering viable in 2026. Auto-discovery of nearby nodes; pipeline-parallel sharding via MLX. Thunderbolt 5 RDMA + macOS 26.2 cut inter-device latency by ~99% — the breakthrough that turned this from research demo to credible serving option.

Pros

  • Auto-discovers nearby devices, no cluster manager required
  • RDMA over Thunderbolt 5 makes inter-Mac latency nearly local
  • Runs 670B-class models on consumer hardware that can't fit them otherwise

Cons

  • Apple-Silicon-first; Linux/CUDA path is secondary
  • Thunderbolt-5 RDMA requires specific Macs (M4 Pro+, macOS 26.2+)
  • Not a production-serving solution — designed for personal clusters

Compatibility

Operating systems
macOS
Linux
GPU backends
Apple Metal
NVIDIA CUDA
License: Open source · free (OSS, GPL-3.0)

Frequently asked

Is Exo free?

Yes. Exo is free and open source under the GPL-3.0 license; there is no paid tier.

What operating systems does Exo support?

Exo supports macOS and Linux. Apple Silicon macOS is the primary target; the Linux path is secondary.

Which GPUs work with Exo?

Exo supports Apple Metal and NVIDIA CUDA backends. CPU-only inference is also possible but slow.

Reviewed by RunLocalAI Editorial. See our editorial policy for how we evaluate tools.