GGML

GGML is the C/C++ tensor library that underlies llama.cpp and whisper.cpp. It provides quantized integer kernels and CPU/GPU dispatch (Metal, CUDA, OpenCL, Vulkan, SYCL). It also gave its name to the original GGML file format, which GGUF replaced in 2023.

The "GGML format" name still appears in older blog posts and model cards. As of 2024 the format is deprecated: all current llama.cpp releases require GGUF. If you find a .bin file labeled "ggmlv3", you have two options: re-convert from the original source weights (such as safetensors) or download a community-converted GGUF.
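File names and labels are not always reliable, so it can help to check the first bytes of a file before deciding what to do with it. The sketch below is a minimal illustration, not part of llama.cpp: the function name classify_model_file is hypothetical, and it assumes that GGUF files begin with the ASCII bytes "GGUF" and that legacy files begin with a little-endian uint32 magic of 0x67676d6c ("ggml"), 0x67676d66 ("ggmf"), or 0x67676a74 ("ggjt", the container that "ggmlv3" files are assumed to use).

```python
import struct

# GGUF files start with the four ASCII bytes "GGUF".
GGUF_MAGIC = b"GGUF"

# Assumption: legacy files store a uint32 magic in little-endian order,
# so the values below appear byte-reversed on disk (e.g. b"tjgg").
LEGACY_MAGICS = {
    struct.pack("<I", 0x67676D6C): "ggml",
    struct.pack("<I", 0x67676D66): "ggmf",
    struct.pack("<I", 0x67676A74): "ggjt",
}


def classify_model_file(path):
    """Return 'gguf', 'legacy-<magic>', or 'unknown' based on the first 4 bytes."""
    with open(path, "rb") as f:
        magic = f.read(4)
    if magic == GGUF_MAGIC:
        return "gguf"
    if magic in LEGACY_MAGICS:
        return "legacy-" + LEGACY_MAGICS[magic]
    return "unknown"
```

A result starting with "legacy-" means the file needs to be re-converted or replaced with a GGUF download; "unknown" only means the header matched none of the magics assumed here.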

The library itself is alive and well; only the file format was retired, with GGUF as its replacement.

Related terms

See also