Q2_K Quantization
Q2_K is a 2-bit k-quant format in GGUF that averages roughly 3.0 bits per parameter at the file level, since the nominal 2-bit weights carry 4-bit sub-block scales and mins, and some tensors are kept at higher precision. It exists for one purpose: fitting very large models onto very small hardware.
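The overhead is easy to reproduce with back-of-envelope arithmetic. The sketch below assumes the block_q2_K layout used by llama.cpp's ggml (256-weight super-blocks split into 16 sub-blocks of 16 weights, packed 4-bit scales and mins, two fp16 super-block scales); the constants are spelled out so they can be checked against your ggml version.

```python
# Bits per weight inside a single Q2_K super-block, assuming the
# block_q2_K layout from llama.cpp's ggml (verify against your version).
QK_K = 256                     # weights per super-block

quant_bits = QK_K * 2          # 2-bit quants                    -> 64 bytes
scale_bits = (QK_K // 16) * 8  # packed 4-bit scale + 4-bit min
                               # per 16-weight sub-block         -> 16 bytes
super_bits = 2 * 16            # fp16 super-block scales d, dmin ->  4 bytes

bpw = (quant_bits + scale_bits + super_bits) / QK_K
print(bpw)                     # 2.625 bits per weight
```

The file-level average lands closer to 3.0 bpw because the quantizer typically keeps sensitive tensors (embeddings, the output head, and some attention and FFN weights) at higher-precision k-quants.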
Quality is materially worse than Q4: perplexity typically runs 1.5–3 points above FP16, and long generations are prone to coherence breakdowns. For most local AI use, Q2_K is not recommended; pick a smaller model at higher precision instead.
The legitimate use case is running a 70B+ model on 24 GB of VRAM, where Q2_K is the only quantization that comes close to fitting. Even then, expect noticeably more hallucination and weaker instruction-following than the same model at Q4.
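A rough fit check shows why 24 GB is the pivot point. The bits-per-weight figures below are assumptions (file-level averages vary by model and llama.cpp version), and the script estimates weight memory only, ignoring KV cache and runtime overhead.

```python
# Rough weight-memory estimate per quantization level.
# BPW values are assumed file-level averages; actual files vary.
BPW = {"Q2_K": 3.0, "Q4_K_M": 4.85, "F16": 16.0}

def weights_gib(params_billions: float, quant: str) -> float:
    """Approximate weight memory in GiB (no KV cache, no overhead)."""
    return params_billions * 1e9 * BPW[quant] / 8 / 2**30

for quant in BPW:
    print(f"70B @ {quant:7s} ~ {weights_gib(70, quant):6.1f} GiB")
# 70B @ Q2_K    ~   24.4 GiB   (borderline on a 24 GB card)
# 70B @ Q4_K_M  ~   39.5 GiB   (needs offloading or more VRAM)
# 70B @ F16     ~  130.4 GiB
```

Even the Q2_K figure is tight: real 70B Q2_K files can run larger than this estimate, so some CPU offload may still be needed.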