Model produces gibberish or repeats one token forever
Cause
When the runtime uses a different tokenizer than the model was trained with, output looks superficially structured but is meaningless. Common causes:
- Mistral Nemo's "Tekken" tokenizer is new — older runners use the wrong tokenizer
- A LoRA adapter was loaded against the wrong base model
- Special tokens (<|im_start|>, <bos>, <eos>) are not being applied because the chat template is missing or wrong
- The quantization step accidentally stripped the tokenizer files
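The failure mode is easy to reproduce in miniature. The sketch below uses two toy character vocabularies (not real tokenizers) to show what happens when text is encoded with one vocabulary and decoded with another:

```python
# Toy demonstration: encoding with one vocabulary and decoding with
# another produces valid-looking characters but meaningless text.

def encode(text, vocab):
    return [vocab[ch] for ch in text]

def decode(ids, vocab):
    inv = {i: ch for ch, i in vocab.items()}
    return "".join(inv[i] for i in ids)

# Two "tokenizers" that assign different ids to the same characters.
vocab_a = {ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz ")}
vocab_b = {ch: i for i, ch in enumerate(" zyxwvutsrqponmlkjihgfedcba")}

ids = encode("hello world", vocab_a)  # token ids under tokenizer A
garbled = decode(ids, vocab_b)        # decoded under tokenizer B

print(garbled)  # structured-looking output, but not "hello world"
```

A real tokenizer mismatch is the same thing at the scale of a 100k+ token vocabulary, which is why the output can look superficially like language.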
Solution
1. Update your runner. Support for new tokenizers ships in major llama.cpp / Ollama / vLLM releases.
# Ollama
ollama --version # check
# Update via official installer if behind 0.5.x
# llama.cpp — pull and rebuild
git pull && make clean && make GGML_CUDA=1 -j
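If you automate the version check across machines, a minimal comparison helper is handy (the 0.5.x threshold mirrors the comment above; treat the minimum as an assumption and adjust for your platform):

```python
def parse_version(s):
    # "0.5.4" -> (0, 5, 4); tolerates a leading "v"
    return tuple(int(part) for part in s.lstrip("v").split("."))

def is_outdated(installed, minimum="0.5.0"):
    # Tuple comparison handles multi-digit components correctly,
    # unlike naive string comparison ("0.10.0" > "0.9.0").
    return parse_version(installed) < parse_version(minimum)

print(is_outdated("0.3.12"))  # True  -> update before debugging further
print(is_outdated("0.5.4"))   # False
```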
2. Verify the chat template. The system+user format must match what the model was trained on. For ChatML models:
<|im_start|>system
You are helpful.<|im_end|>
<|im_start|>user
Hello<|im_end|>
<|im_start|>assistant
Wrong template = gibberish even with the right tokenizer.
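A small helper that renders the ChatML layout shown above can help you eyeball what the prompt should look like. This is a sketch only; real runners read the chat template from the model's own metadata, so use it to compare, not to replace:

```python
def chatml_prompt(messages):
    # messages: list of {"role": ..., "content": ...} dicts
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")  # generation continues from here
    return "".join(parts)

prompt = chatml_prompt([
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello"},
])
print(prompt)
```

If your runner's rendered prompt differs from this shape for a ChatML model, the template is the problem.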
3. Re-download the model. The GGUF file should include the tokenizer. If you converted it yourself with an old llama.cpp, the tokenizer metadata may be stale:
ollama pull mistral-nemo:12b
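To sanity-check that a GGUF file at least has a valid header and some metadata entries (which is where the tokenizer lives), you can read the first few fields directly. This is a sketch of the GGUF header layout (magic, version, tensor count, metadata KV count), not a full parser:

```python
import struct

def gguf_header(path):
    # GGUF header: 4-byte magic, u32 version, u64 tensor count,
    # u64 metadata key/value count (all little-endian).
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file (magic {magic!r})")
        version, = struct.unpack("<I", f.read(4))
        n_tensors, = struct.unpack("<Q", f.read(8))
        n_kv, = struct.unpack("<Q", f.read(8))
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}

# A converted model with very few metadata entries has likely lost its
# tokenizer; healthy GGUFs carry many tokenizer.* keys.
# print(gguf_header("mistral-nemo.gguf"))
```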
4. Don't mix LoRA + base model from different versions. A Llama 3.1 LoRA loaded on a Llama 3.0 base produces gibberish even if the parameter shapes match.
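PEFT-style adapters record the base model they were trained against in adapter_config.json (the base_model_name_or_path field). A quick compatibility check, assuming that file layout:

```python
import json
import os

def adapter_base_model(adapter_dir):
    # PEFT adapters store their intended base in adapter_config.json
    with open(os.path.join(adapter_dir, "adapter_config.json")) as f:
        return json.load(f)["base_model_name_or_path"]

def check_match(adapter_dir, base_model_name):
    expected = adapter_base_model(adapter_dir)
    if expected != base_model_name:
        raise RuntimeError(
            f"adapter was trained for {expected!r}, "
            f"but you are loading it on {base_model_name!r}"
        )
```

Note this only catches name mismatches; an adapter fine-tuned from a custom checkpoint can still record a generic base name, so treat a passing check as necessary, not sufficient.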
Did this fix it?
If your case was different, email hello@runlocalai.co with what you saw and we'll update the page. If it worked but took different commands on your platform, we want to know that too.