Q3_K_M Quantization

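Q3_K_M is a 3-bit GGUF K-quant averaging ~3.9 bits per parameter. The average sits above 3 because some tensors are stored at higher precision and every block carries its own scale metadata. It is generally the smallest K-quant that still produces usable output for most models.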

Quality drops noticeably: perplexity is typically 0.5–1.0 points above FP16, and complex tasks (multi-step reasoning, code) show measurable degradation. For 7B–13B models, Q3_K_M is rarely worth it; a smaller model at Q4_K_M is usually the better choice. For 70B-class models on consumer hardware, Q3_K_M is often the only practical option when the weights must fit in 36 GB or less.
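
The 36 GB figure can be sanity-checked with simple arithmetic. The sketch below is a rough estimate only: it assumes the ~3.9 bits-per-parameter average quoted above, counts weights alone (no KV cache, context buffers, or runtime overhead), and uses a hypothetical helper name, `estimate_weight_gb`, that is not part of any GGUF tooling.

```python
def estimate_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (10^9 bytes).

    Assumption: every parameter costs `bits_per_weight` bits on average.
    Real files differ slightly because the tensor mix varies by architecture.
    """
    return n_params * bits_per_weight / 8 / 1e9


# ~3.9 bits per parameter is the Q3_K_M average quoted in the text.
Q3_K_M_BPW = 3.9

sizes = {
    label: estimate_weight_gb(n_params, Q3_K_M_BPW)
    for label, n_params in [("7B", 7e9), ("13B", 13e9), ("70B", 70e9)]
}

for label, gb in sizes.items():
    print(f"{label} at ~{Q3_K_M_BPW} bpw: about {gb:.1f} GB of weights")
```

Under these assumptions a 70B model comes out at roughly 34 GB of weights, which leaves only a couple of GB of headroom under a 36 GB budget, so long contexts may still not fit.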

If the model starts producing word salad, it is past its quant cliff; try Q4_K_S or Q4_K_M instead.
