Transformer & LLM components
RMSNorm
RMSNorm is a simplified variant of LayerNorm that scales activations by their root-mean-square instead of centering them and dividing by their standard deviation, so it drops both the mean subtraction and the bias term. It is used in Llama, Mistral, Qwen, and most modern open-weight LLMs.
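A minimal sketch of the computation in PyTorch (class and parameter names are illustrative, not taken from any particular codebase):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        # Learned per-channel scale; note there is no bias parameter.
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Divide by the root-mean-square over the last dimension;
        # no mean subtraction, unlike LayerNorm.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight
```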
The benefit is small but real: roughly 7% faster than LayerNorm, with no measured quality loss on language-modeling benchmarks. The simplicity also makes RMSNorm easier to fuse into the preceding or following kernels for further speedup.
Quantization-time gotcha: RMSNorm's learned scales are typically kept in FP16/FP32 even when the surrounding linear layers are quantized to INT4. Some early GGUF converters dropped these scales; check that your converter preserves them.
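A quick, hedged sanity check you could run on a converted checkpoint's state dict (the "norm" substring match is a heuristic for typical parameter naming, not tied to any specific converter):

```python
import torch

def check_norm_scales(state_dict: dict[str, torch.Tensor]) -> None:
    # Flag any norm-scale tensor that did not stay in a float dtype after quantization.
    float_dtypes = (torch.float16, torch.bfloat16, torch.float32)
    for name, tensor in state_dict.items():
        if "norm" in name and tensor.dtype not in float_dtypes:
            print(f"warning: {name} has dtype {tensor.dtype}; "
                  "RMSNorm scales are expected to stay in FP16/FP32")
```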