OSError: Can't load tokenizer for ... / no file named tokenizer.json

Q: What causes "OSError: Can't load tokenizer for ... / no file named tokenizer.json"?

`AutoTokenizer.from_pretrained` couldn't find the tokenizer files at the local path you passed, in the local cache, or on the Hugging Face Hub. Common causes: the directory you passed contains weights but no tokenizer files, the download was interrupted, or you're pointing at a fine-tune that didn't ship its own tokenizer (and you need the base model's).

Q: How do you fix "OSError: Can't load tokenizer for ... / no file named tokenizer.json"?

**1. List what's in your model directory:**

```bash
ls /path/to/model
# Need at minimum: tokenizer.json (or tokenizer.model + tokenizer_config.json)
```

Files typically required:

- `tokenizer.json` (fast tokenizer) OR `tokenizer.model` (SentencePiece)
- `tokenizer_config.json`
- `special_tokens_map.json`
- For chat models: a `chat_template` field in `tokenizer_config.json`

**2. Re-download with an explicit allow-list** to make sure the tokenizer files come through:

```bash
hf download <org>/<model> \
  --include "tokenizer*" "*.json" "*.model" \
  --local-dir /path/to/model
```

**3. If the fine-tune doesn't include its own tokenizer**, point at the base:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")  # base model's tokenizer
model = AutoModelForCausalLM.from_pretrained("/path/to/finetune")  # fine-tuned weights
```

**4. For GGUF inference**, you don't need separate tokenizer files; the tokenizer is embedded in the GGUF. If you're seeing this error from a GGUF flow, you're using the wrong loader. Use `llama-cpp-python`, not `AutoTokenizer`.
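The directory check in step 1 can also be scripted, which helps when you have several model folders to inspect. The following is a minimal sketch using only the standard library; the helper name `check_tokenizer_files` and the keys of the dictionary it returns are illustrative assumptions, not part of `transformers`. It looks only for the file names listed above.

```python
import json
from pathlib import Path


def check_tokenizer_files(model_dir):
    """Report which of the usual tokenizer files exist in model_dir.

    Hypothetical helper for illustration; not a transformers API.
    """
    root = Path(model_dir)
    present = {p.name for p in root.iterdir() if p.is_file()}

    # Either the fast tokenizer file or the SentencePiece model is enough.
    has_vocab = "tokenizer.json" in present or "tokenizer.model" in present

    # Companion files that are typically expected alongside it.
    expected = ("tokenizer_config.json", "special_tokens_map.json")
    missing = [name for name in expected if name not in present]

    # Chat models usually carry a chat_template field in tokenizer_config.json.
    has_chat_template = False
    config_path = root / "tokenizer_config.json"
    if config_path.is_file():
        with config_path.open(encoding="utf-8") as fh:
            has_chat_template = "chat_template" in json.load(fh)

    return {
        "has_vocab": has_vocab,
        "missing": missing,
        "has_chat_template": has_chat_template,
    }
```

Calling `check_tokenizer_files("/path/to/model")` returns a small report. If `has_vocab` is `False`, the directory most likely holds weights only, and steps 2 or 3 apply.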

