Temperature 0 (Greedy Sampling)
Temperature 0 disables sampling entirely — the model picks the highest-logit token at every step. Equivalent to greedy decoding with top_k=1. Gives reproducible output (subject to GPU non-determinism) but tends toward repetitive, lower-diversity completions.
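A minimal sketch of how samplers typically special-case temperature 0 (the `sample_token` helper is illustrative, not a real library API): dividing logits by zero is undefined, so implementations branch to a plain argmax instead of sampling.

```python
import numpy as np

def sample_token(logits, temperature, rng):
    """Pick the next token id from raw logits (illustrative helper)."""
    if temperature == 0:
        # Greedy: no sampling at all, just the highest-logit token.
        return int(np.argmax(logits))
    # Temperature scaling, then a numerically stable softmax.
    z = np.asarray(logits) / temperature
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.5])
print(sample_token(logits, 0, rng))  # always 0: greedy picks the max
```

With temperature 0 the `rng` is never consulted, which is why repeated calls agree (up to GPU/floating-point non-determinism in how the logits themselves were computed).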
Use temperature 0 for: code generation, structured output (JSON), evaluation, debugging quantization quality. Avoid for: creative writing, chat where variety matters.
A common bug: setting temperature to a tiny non-zero value (e.g. 0.001) hoping for "almost deterministic" output. This is worse than temperature 0: sampling still happens, and whenever the top two logits are nearly tied, even a tiny temperature leaves real probability mass on the runner-up. The output looks stable until it suddenly isn't.
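A small numerical sketch of the failure mode, using made-up logit values: when the gap between the top two logits is on the same order as the temperature, the softmax is nowhere near one-hot.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two near-tied logits: the gap (0.001) matches the temperature,
# so softmax(gap / T) = softmax(1.0) is far from one-hot.
logits = np.array([5.000, 4.999])
t = 0.001

z = logits / t
probs = np.exp(z - z.max())
probs /= probs.sum()
# probs ~ [0.73, 0.27]: the runner-up token still wins ~27% of the time.

draws = rng.choice(2, size=10_000, p=probs)
print(round((draws == 1).mean(), 3))
```

When the top logits are well separated the tiny temperature does behave like greedy, which is exactly why the bug looks stable in casual testing and only surfaces on prompts where the model is genuinely uncertain.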
Reviewed by Fredoline Eruo.