Failed to load model: GGUF version mismatch
Cause
GGUF has been versioned a few times since it replaced the old GGML format in August 2023: v1 (the original header), v2 (widened string and array counts to 64-bit), and v3 (added big-endian support). Newer metadata keys (chat templates, tokenizer config) have also accumulated alongside the version bumps. Older llama.cpp builds reject GGUFs with a newer header version; very old v1 files may also be rejected by current builds where backward compatibility for the old layout was dropped.
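The rejection itself is just a bounds check on the header's version field. A hypothetical sketch of what an older loader effectively does (not llama.cpp's actual code; the constant is illustrative):

```python
SUPPORTED_MAX = 2  # imagine an older build that predates GGUF v3

def check_version(file_version: int) -> None:
    # An old runner refuses any header version newer than it knows about
    if file_version > SUPPORTED_MAX:
        raise RuntimeError(
            f"GGUF version {file_version} unsupported "
            f"(this build handles <= {SUPPORTED_MAX})"
        )

check_version(2)  # a v2 file loads fine
try:
    check_version(3)  # a fresh v3 quant on an old build -> the load error
except RuntimeError as e:
    print(e)
```

This is why updating the runner (newer `SUPPORTED_MAX`, so to speak) and downgrading the file are the two symmetric fixes below.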
A common form: download a fresh quant from a recent uploader (bartowski, lmstudio-community), point an old llama.cpp at it, and hit this on load.
Solution
1. Update llama.cpp / Ollama / LM Studio to the latest release:
# llama.cpp from source (recent releases build with CMake; the old Makefile is deprecated)
cd llama.cpp && git pull
cmake -B build -DGGML_CUDA=ON && cmake --build build --config Release -j
# Homebrew
brew upgrade llama.cpp
# Ollama
curl -fsSL https://ollama.com/install.sh | sh
# LM Studio: in-app "Check for updates"
2. If you can't update the runner, find an older GGUF of the same model. Hugging Face shows multiple uploaders per model — one of them usually has a v2-format file.
3. Convert from safetensors yourself with the matching llama.cpp version:
python convert_hf_to_gguf.py /path/to/hf-model --outfile model.gguf --outtype f16
./llama-quantize model.gguf model.Q4_K_M.gguf Q4_K_M
4. Check the GGUF version from the file header:
xxd -l 8 model.gguf
# Bytes 0-3 are the magic "GGUF"; bytes 4-7 are the version as a
# little-endian uint32 (e.g. "0300 0000" = v3)
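The same header check can be scripted in a few lines of Python. The in-memory bytes below stand in for the first 8 bytes of a real model.gguf:

```python
import struct

def gguf_version(header: bytes) -> int:
    # GGUF layout: 4-byte magic "GGUF", then a little-endian uint32 version
    magic, version = struct.unpack_from("<4sI", header, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return version

# Stand-in for: header = open("model.gguf", "rb").read(8)
header = b"GGUF" + struct.pack("<I", 3)
print(gguf_version(header))  # 3
```

If the number printed is higher than what your runner's release notes mention, you have your answer.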
Did this fix it?
If your case was different, email support@runlocalai.co with what you saw and we'll update the page. If it worked but took different commands on your platform, we want to know that too.