Q2_K Quantization
Q2_K is a 2-bit k-quant format in GGUF that averages roughly 3.0 bits per parameter at the file level, since the nominal 2-bit weights carry 4-bit sub-block scales and mins, and some tensors are kept at higher precision. It exists for one purpose: fitting very large models onto very small hardware.
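The overhead is easy to reproduce with back-of-envelope arithmetic. The sketch below assumes the block_q2_K layout used by llama.cpp's ggml (256-weight super-blocks split into 16 sub-blocks of 16 weights, packed 4-bit scales and mins, two fp16 super-block scales); the constants are spelled out so they can be checked against your ggml version.

```python
# Bits per weight inside a single Q2_K super-block, assuming the
# block_q2_K layout from llama.cpp's ggml (verify against your version).
QK_K = 256                     # weights per super-block

quant_bits = QK_K * 2          # 2-bit quants                    -> 64 bytes
scale_bits = (QK_K // 16) * 8  # packed 4-bit scale + 4-bit min
                               # per 16-weight sub-block         -> 16 bytes
super_bits = 2 * 16            # fp16 super-block scales d, dmin ->  4 bytes

bpw = (quant_bits + scale_bits + super_bits) / QK_K
print(bpw)                     # 2.625 bits per weight
```

The file-level average lands closer to 3.0 bpw because the quantizer typically keeps sensitive tensors (embeddings, the output head, and some attention and FFN weights) at higher-precision k-quants.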
Quality is materially worse than Q4: perplexity typically runs 1.5–3 points above FP16, and long generations are prone to coherence breakdowns. For most local AI use, Q2_K is not recommended; pick a smaller model at higher precision instead.
The legitimate use case is running a 70B+ model on 24 GB of VRAM, where Q2_K is the only quantization that comes close to fitting. Even then, expect noticeably more hallucination and weaker instruction-following than the same model at Q4.
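A rough fit check shows why 24 GB is the pivot point. The bits-per-weight figures below are assumptions (file-level averages vary by model and llama.cpp version), and the script estimates weight memory only, ignoring KV cache and runtime overhead.

```python
# Rough weight-memory estimate per quantization level.
# BPW values are assumed file-level averages; actual files vary.
BPW = {"Q2_K": 3.0, "Q4_K_M": 4.85, "F16": 16.0}

def weights_gib(params_billions: float, quant: str) -> float:
    """Approximate weight memory in GiB (no KV cache, no overhead)."""
    return params_billions * 1e9 * BPW[quant] / 8 / 2**30

for quant in BPW:
    print(f"70B @ {quant:7s} ~ {weights_gib(70, quant):6.1f} GiB")
# 70B @ Q2_K    ~   24.4 GiB   (borderline on a 24 GB card)
# 70B @ Q4_K_M  ~   39.5 GiB   (needs offloading or more VRAM)
# 70B @ F16     ~  130.4 GiB
```

Even the Q2_K figure is tight: real 70B Q2_K files can run larger than this estimate, so some CPU offload may still be needed.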