MoE Routing
MoE routing is the gating mechanism that decides which experts a token activates in a Mixture-of-Experts layer. Top-k routing, where each token is sent to its k highest-scoring experts, is the dominant scheme: Mixtral 8x7B routes each token to 2 of its 8 experts, and DeepSeek-V3 routes each token to 8 of its 256 routed experts.
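As a concrete illustration, here is a minimal top-k routing sketch in PyTorch. The gate is a single linear projection and all shapes are hypothetical, not taken from any particular model:

```python
# A minimal top-k routing sketch (illustrative dimensions, not any
# specific model's implementation).
import torch
import torch.nn.functional as F

def top_k_route(x, gate_weight, k=2):
    """x: (num_tokens, d_model); gate_weight: (num_experts, d_model)."""
    scores = x @ gate_weight.T                 # (num_tokens, num_experts)
    probs = F.softmax(scores, dim=-1)          # router probabilities
    topk_probs, topk_idx = probs.topk(k, dim=-1)
    # Renormalize so each token's k expert weights sum to 1.
    topk_probs = topk_probs / topk_probs.sum(dim=-1, keepdim=True)
    return topk_probs, topk_idx

x = torch.randn(4, 16)                         # 4 tokens, d_model=16
gate = torch.randn(8, 16)                      # 8 experts
weights, experts = top_k_route(x, gate, k=2)
print(experts)                                 # each token's 2 chosen experts
```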
Routing quality depends on training: an untrained or poorly trained router sends most tokens to a few favored experts (load imbalance), leaving the remaining experts undertrained and wasting capacity. Auxiliary load-balancing losses, which penalize uneven expert assignment during training, mitigate this.
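A common formulation is the Switch-Transformer-style auxiliary loss: the fraction of tokens dispatched to each expert, multiplied by the mean router probability for that expert, summed over experts. A perfectly balanced router minimizes it. The sketch below is illustrative rather than any specific model's exact loss:

```python
import torch
import torch.nn.functional as F

def load_balancing_loss(router_logits, expert_indices, num_experts):
    """router_logits: (num_tokens, num_experts); expert_indices: (num_tokens, k)."""
    probs = F.softmax(router_logits, dim=-1)
    # P_i: mean router probability mass assigned to expert i across tokens.
    mean_prob = probs.mean(dim=0)
    # f_i: fraction of tokens dispatched to expert i (one-hot over top-k picks).
    dispatch = F.one_hot(expert_indices, num_experts).sum(dim=1).float()
    fraction = dispatch.mean(dim=0)
    # Balanced routing minimizes N * sum_i f_i * P_i.
    return num_experts * torch.sum(fraction * mean_prob)

logits = torch.randn(4, 8)          # router logits for 4 tokens, 8 experts
_, idx = logits.topk(2, dim=-1)
print(load_balancing_loss(logits, idx, num_experts=8))
```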
At inference, routing introduces an all-to-all communication step when experts are sharded across devices (expert parallelism); on a single GPU it reduces to a sparse gather and weighted combine. Quantizing experts independently is also harder than quantizing dense weights, because each expert sees only the slice of tokens routed to it, so its activation distribution differs from the others'.
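On a single device, the gather-and-combine step can be written directly: group the tokens assigned to each expert, run each group as one dense batch, and scatter the gate-weighted outputs back. A minimal sketch, assuming the experts are arbitrary callables and the weights/indices come from a top-k router like the one above:

```python
import torch

def moe_combine(x, experts, weights, indices):
    """x: (T, d); experts: list of callables; weights, indices: (T, k)."""
    out = torch.zeros_like(x)
    for e, expert in enumerate(experts):
        token_idx, slot = (indices == e).nonzero(as_tuple=True)
        if token_idx.numel() == 0:
            continue  # no tokens routed to this expert in this batch
        # Gather this expert's tokens, run them as one dense batch,
        # then scatter the gate-weighted outputs back into place.
        out.index_add_(0, token_idx,
                       weights[token_idx, slot, None] * expert(x[token_idx]))
    return out

x = torch.randn(4, 16)
experts = [torch.nn.Linear(16, 16) for _ in range(8)]  # stand-in expert MLPs
weights = torch.full((4, 2), 0.5)                      # pretend top-2 gate weights
indices = torch.randint(0, 8, (4, 2))
print(moe_combine(x, experts, weights, indices).shape)  # torch.Size([4, 16])
```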