Transformer & LLM components
RMSNorm
RMSNorm is a simplified variant of LayerNorm that scales activations by their root-mean-square instead of centering them and dividing by their standard deviation, so it drops both the mean subtraction and the bias term. It is used in Llama, Mistral, Qwen, and most modern open-weight LLMs.
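A minimal sketch of the computation in PyTorch (class and parameter names are illustrative, not taken from any particular codebase):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        # Learned per-channel scale; note there is no bias parameter.
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Divide by the root-mean-square over the last dimension;
        # no mean subtraction, unlike LayerNorm.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight
```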
The benefit is small but real: roughly 7% faster than LayerNorm, with no measured quality loss on language-modeling benchmarks. The simplicity also makes RMSNorm easier to fuse into the preceding or following kernels for further speedup.
Quantization-time gotcha: RMSNorm's learned scales are typically kept in FP16/FP32 even when the surrounding linear layers are quantized to INT4. Some early GGUF converters dropped these scales; check that your converter preserves them.
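A quick, hedged sanity check you could run on a converted checkpoint's state dict (the "norm" substring match is a heuristic for typical parameter naming, not tied to any specific converter):

```python
import torch

def check_norm_scales(state_dict: dict[str, torch.Tensor]) -> None:
    # Flag any norm-scale tensor that did not stay in a float dtype after quantization.
    float_dtypes = (torch.float16, torch.bfloat16, torch.float32)
    for name, tensor in state_dict.items():
        if "norm" in name and tensor.dtype not in float_dtypes:
            print(f"warning: {name} has dtype {tensor.dtype}; "
                  "RMSNorm scales are expected to stay in FP16/FP32")
```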