GGUF corrupt on disk — validate and recover a broken model file
A corrupt GGUF file fails with cryptic magic-number or read errors. Here's how to validate the file without loading it, identify the corruption, and re-download only the damaged parts.
Diagnostic order — most likely first
Incomplete download (connection dropped mid-transfer)
File size is smaller than expected. Compare `ls -l model.gguf` against the size listed on HuggingFace. A 70B Q4_K_M should be roughly 40.5 GB. If your file is 23.1 GB, the download was interrupted.
Re-download with resume support. `huggingface-cli download <org>/<repo> <file>.gguf --resume-download` picks up where the interrupted download left off; `wget -c <url>` (continue flag) does the same for direct URLs. Note that resuming only helps when the partial file is intact but incomplete. If the partial file itself is corrupt, delete it (`rm model.gguf`) and restart the download from scratch with a resume-capable tool.
GGUF header magic bytes corrupted (disk error / filesystem issue)
`./llama-gguf model.gguf` errors with 'invalid magic number' or 'not a valid GGUF file.' The file's first 4 bytes should be `GGUF` (0x47 0x47 0x55 0x46). Verify: `xxd model.gguf | head -1` should show `00000000: 4747 5546 ...`. If it shows anything else, the header is corrupt.
The file is unrecoverable at the header level: re-download the entire file. Then check the disk for underlying issues, with `smartctl -a /dev/sdX` on Linux or `wmic diskdrive get status` on Windows. If the disk reports errors, move the model to a healthy drive. Frequent GGUF corruption across different downloads points to a failing disk or bad RAM.
Tensor data truncated (header valid but tensor count/size mismatch)
Model starts loading in llama.cpp, then errors with 'read error: expected 4096 bytes, got 2048' or 'GGML_ASSERT: tensor offset out of range.' The header metadata says there should be N tensors with total size S, but the file is smaller than header_size + S.
Use `gguf-dump <file>.gguf` to print metadata. Compare the listed tensor count and total_size against the actual file size. If sizes don't match, the file was truncated. Re-download from the source. For a quick validation: `python -c 'from gguf import GGUFReader; r = GGUFReader("model.gguf"); print(len(r.tensors), "tensors")'` — if this throws, the GGUF is structurally broken.
SHA256 mismatch from HuggingFace (file was tampered with or corrupted in transit)
HuggingFace lists a SHA256 for each file in a repo. Compute yours with `sha256sum model.gguf` (Linux/Mac) or `certutil -hashfile model.gguf SHA256` (Windows). If the result doesn't match the published hash, the file was corrupted in transit or tampered with.
Delete and re-download from the official source (not a mirror). If the SHA256 still doesn't match after re-download, the repo's published SHA256 might be wrong — check the HuggingFace repo's community tab for others reporting the same. For critical workloads, never use a GGUF whose SHA256 you can't verify.
GGUF format version newer than what your runtime supports
`./llama-gguf --version` shows v3. A newly-downloaded model errors with 'unsupported GGUF version: 4.' The model was converted with a newer version of the GGUF spec than your llama.cpp build.
Build llama.cpp from HEAD: `git pull && rm -rf build && cmake -B build -DGGML_CUDA=ON && cmake --build build --config Release -j`. Or pin to a release tag that supports GGUF v4+ (llama.cpp b4470+). For Ollama: `ollama upgrade` or reinstall the latest version. Forward compatibility is rare — model files track the spec version, and older runtimes can't read newer formats.
Frequently asked questions
How can I validate a GGUF file without loading the full model into VRAM?
Three levels: (1) Header check: `./llama-gguf model.gguf` — works in seconds, validates metadata. (2) Structure check: `python -c 'from gguf import GGUFReader; r = GGUFReader("model.gguf"); print(r)'` — validates the tensor index, a few seconds. (3) Full scan: `./llama-quantize --validate model.gguf` (if your llama.cpp build includes this) — reads every byte, 5-15 minutes for a 40 GB file. Use level 1-2 for sanity checks, level 3 after a sketchy download.
Can I repair a corrupt GGUF or is re-download always the answer?
For header or tensor-index corruption: no. The file is a binary blob with an index at the front — lose the index, lose the ability to locate tensors. For a truncated file (e.g., last 5% missing): technically yes, you could truncate the tensor count in the header to match, but the missing tensors might be output weights that are critical. Practical answer: re-download. The bandwidth cost of a re-download is lower than the debugging cost of a repaired GGUF with subtle bugs.
What's the most reliable source for GGUF files?
For popular models, HuggingFace users `bartowski`, `lmstudio-community`, and `mradermacher` publish GGUFs that are consistently well-formed, tested, and SHA256-verified. They all use the reference llama.cpp quantization toolchain. Avoid random GGUF uploads from users with few contributions or no SHA256 published. For Ollama, the official library (ollama.com/library) is the safest source — these are scanned and tested by the Ollama team.
Why does my GGUF work in llama.cpp but crash in Ollama?
Ollama validates GGUF metadata more strictly than llama.cpp. If a GGUF has a missing or malformed chat template, BOS/EOS token misconfiguration, or an unrecognized architecture string, llama.cpp might load it with warnings while Ollama rejects it outright. Use `./llama-gguf` to inspect the metadata — fields like `tokenizer.chat_template` must be present and valid for Ollama to accept the file. Re-download from a source that publishes Ollama-tested GGUFs (look for 'ollama' in the HuggingFace repo description).
Can I verify a GGUF's integrity mid-download using partial checksums?
Not natively. GGUF doesn't support partial checksums. The HuggingFace downloader (`huggingface-cli download`) verifies the SHA256 of the complete file against the repo's published hash after download. For incremental verification during very large downloads, use a tool like `aria2c --check-integrity=true` with a known-good checksum file, or download to a filesystem with built-in checksumming (ZFS, Btrfs).
How do I avoid corruption when downloading models over slow or unstable connections?
Use a resume-capable downloader. In order of reliability: (1) `huggingface-cli download <repo> --resume-download` — handles retry + backoff + SHA256 verification. (2) `wget -c <direct-download-url>` — basic resume. (3) `aria2c -x 4 -c <url>` — multi-connection resume, faster but CDN rate limits are a risk. Always verify SHA256 after download: `sha256sum model.gguf` and compare to the published hash. If the connection is so unstable that even resume-friendly tools fail, download to a cloud instance with a fast link, verify the SHA256 there, then `rsync -avP` the verified file to your machine.
Related troubleshooting
When llama.cpp / Ollama outputs garbled text or repeats tokens infinitely, the tokenizer baked into the GGUF doesn't match the runtime's expectations. Here's how to confirm and fix.
Safetensors header errors mean the file is corrupted, partially downloaded, or isn't actually a safetensors file. Check file size against the repo, re-download if mismatch, fall back to checked download tools.
HuggingFace download errors split into auth (gated model, no token), rate-limit (anonymous traffic capped), or network (corporate proxy, country block). Diagnose by HTTP status code, fix per cause.
When the fix is hardware
A surprising fraction of troubleshooting tickets resolve to: this card doesn't have enough VRAM for what you're asking it to do. If you're hitting OOM after every reasonable fix, or your GPU genuinely can't fit the model you need, it's upgrade time: