Configuration
Verified by owner
Ollama truncates input — default context length is only 2048
(no error — long inputs get silently truncated)
By Fredoline Eruo · Last verified May 6, 2026
Cause
Ollama's default num_ctx is 2048 tokens, regardless of what the underlying model supports. A model that "supports 128K context" still defaults to 2K when run via ollama run. Your long prompts get silently truncated.
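You can see the mismatch with ollama show, which prints what the model architecture supports, not what Ollama will actually use. The output below is trimmed and the exact fields vary by Ollama version; the truncation itself is typically only visible in the server log, not in the API response.
# What the model architecture supports (not what Ollama uses by default)
ollama show llama3.1:8b
#   Model
#     architecture        llama
#     parameters          8.0B
#     context length      131072
#     ...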
Solution
Set the context length per request via the API:
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "...",
  "options": { "num_ctx": 32768 }
}'
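The same options block works on /api/chat. If you prefer the interactive CLI, Ollama's FAQ documents a per-session override; it lasts until you exit unless you /save it to a new model name:
ollama run llama3.1:8b
>>> /set parameter num_ctx 32768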
Or create a Modelfile to make it stick for a model:
# Save as Modelfile
FROM llama3.1:8b
PARAMETER num_ctx 32768
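# Then build a new tag from the Modelfile and run it: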
ollama create llama3.1:8b-32k -f Modelfile
ollama run llama3.1:8b-32k
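To confirm the setting stuck, ollama show on the new tag should list num_ctx among its parameters (exact layout varies by version):
ollama show llama3.1:8b-32k
#   Parameters
#     num_ctx    32768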
Tradeoff: a larger context window means more VRAM for the KV cache. A 7B model with 32K context needs roughly 12 GB of VRAM (versus about 5 GB at 2K). Use Will it run? to find your sweet spot.
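The KV cache grows linearly with context length. As a rough back-of-the-envelope, assuming llama3.1:8b's usual shape (32 layers, 8 KV heads, head dim 128) and an fp16 cache, the cache alone takes about 4.3 GB at 32K versus roughly 0.3 GB at 2K, before model weights and runtime overhead:
# 2 (K and V) x layers x kv_heads x head_dim x context x 2 bytes (fp16)
echo $(( 2 * 32 * 8 * 128 * 32768 * 2 ))   # 4294967296 bytes, about 4.3 GB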
Or set it globally via an environment variable when starting the server (available in recent Ollama releases; applies to every model that instance serves):
OLLAMA_CONTEXT_LENGTH=32768 ollama serve
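If Ollama runs as a systemd service (the default for the Linux installer), set the variable in a service override instead, following the environment-variable procedure from Ollama's docs; ollama.service is the standard unit name but may differ on your setup:
# sudo systemctl edit ollama.service, then add:
[Service]
Environment="OLLAMA_CONTEXT_LENGTH=32768"

# reload and restart the service
sudo systemctl daemon-reload
sudo systemctl restart ollama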
Did this fix it?
If your case was different, email hello@runlocalai.co with what you saw and we'll update the page. If it worked but took different commands on your platform, we want to know that too.