Configuration
Verified by owner
Ollama truncates input — default context length is only 2048
(no error — long inputs get silently truncated)
By Fredoline Eruo · Last verified May 6, 2026
Cause
Ollama's default num_ctx is 2048 tokens, regardless of what the underlying model supports. A model that "supports 128K context" still defaults to 2K when run via ollama run. Your long prompts get silently truncated.
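You can see the mismatch with ollama show, which prints what the model architecture supports, not what Ollama will actually use. The output below is trimmed and the exact fields vary by Ollama version; the truncation itself is typically only visible in the server log, not in the API response.
# What the model architecture supports (not what Ollama uses by default)
ollama show llama3.1:8b
#   Model
#     architecture        llama
#     parameters          8.0B
#     context length      131072
#     ...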
Solution
Set the context length per request via the API:
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "...",
  "options": { "num_ctx": 32768 }
}'
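The same options block works on /api/chat. If you prefer the interactive CLI, Ollama's FAQ documents a per-session override; it lasts until you exit unless you /save it to a new model name:
ollama run llama3.1:8b
>>> /set parameter num_ctx 32768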
Or create a Modelfile to make it stick for a model:
# Save as Modelfile
FROM llama3.1:8b
PARAMETER num_ctx 32768
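# Then build a new tag from the Modelfile and run it: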
ollama create llama3.1:8b-32k -f Modelfile
ollama run llama3.1:8b-32k
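To confirm the setting stuck, ollama show on the new tag should list num_ctx among its parameters (exact layout varies by version):
ollama show llama3.1:8b-32k
#   Parameters
#     num_ctx    32768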
Tradeoff: a larger context window means more VRAM for the KV cache. A 7B model with 32K context needs roughly 12 GB of VRAM (versus about 5 GB at 2K). Use Will it run? to find your sweet spot.
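The KV cache grows linearly with context length. As a rough back-of-the-envelope, assuming llama3.1:8b's usual shape (32 layers, 8 KV heads, head dim 128) and an fp16 cache, the cache alone takes about 4.3 GB at 32K versus roughly 0.3 GB at 2K, before model weights and runtime overhead:
# 2 (K and V) x layers x kv_heads x head_dim x context x 2 bytes (fp16)
echo $(( 2 * 32 * 8 * 128 * 32768 * 2 ))   # 4294967296 bytes, about 4.3 GB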
Or set it globally via an environment variable when starting the server (available in recent Ollama releases; applies to every model that instance serves):
OLLAMA_CONTEXT_LENGTH=32768 ollama serve
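If Ollama runs as a systemd service (the default for the Linux installer), set the variable in a service override instead, following the environment-variable procedure from Ollama's docs; ollama.service is the standard unit name but may differ on your setup:
# sudo systemctl edit ollama.service, then add:
[Service]
Environment="OLLAMA_CONTEXT_LENGTH=32768"

# reload and restart the service
sudo systemctl daemon-reload
sudo systemctl restart ollama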
Did this fix it?
If your case was different, email hello@runlocalai.co with what you saw and we'll update the page. If it worked but took different commands on your platform, we want to know that too.