Temperature 0 (Greedy Sampling)
Temperature 0 disables sampling entirely — the model picks the highest-logit token at every step. Equivalent to greedy decoding with top_k=1. Gives reproducible output (subject to GPU non-determinism) but tends toward repetitive, lower-diversity completions.
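A minimal sketch of how samplers typically special-case temperature 0 (the `sample_token` helper is illustrative, not a real library API): dividing logits by zero is undefined, so implementations branch to a plain argmax instead of sampling.

```python
import numpy as np

def sample_token(logits, temperature, rng):
    """Pick the next token id from raw logits (illustrative helper)."""
    if temperature == 0:
        # Greedy: no sampling at all, just the highest-logit token.
        return int(np.argmax(logits))
    # Temperature scaling, then a numerically stable softmax.
    z = np.asarray(logits) / temperature
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.5])
print(sample_token(logits, 0, rng))  # always 0: greedy picks the max
```

With temperature 0 the `rng` is never consulted, which is why repeated calls agree (up to GPU/floating-point non-determinism in how the logits themselves were computed).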
Use temperature 0 for: code generation, structured output (JSON), evaluation, debugging quantization quality. Avoid for: creative writing, chat where variety matters.
A common bug: setting temperature to a tiny non-zero value (e.g. 0.001) hoping for "almost deterministic" output. This is worse than temperature 0: sampling still happens, and whenever the top two logits are nearly tied, even a tiny temperature leaves real probability mass on the runner-up. The output looks stable until it suddenly isn't.
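A small numerical sketch of the failure mode, using made-up logit values: when the gap between the top two logits is on the same order as the temperature, the softmax is nowhere near one-hot.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two near-tied logits: the gap (0.001) matches the temperature,
# so softmax(gap / T) = softmax(1.0) is far from one-hot.
logits = np.array([5.000, 4.999])
t = 0.001

z = logits / t
probs = np.exp(z - z.max())
probs /= probs.sum()
# probs ~ [0.73, 0.27]: the runner-up token still wins ~27% of the time.

draws = rng.choice(2, size=10_000, p=probs)
print(round((draws == 1).mean(), 3))
```

When the top logits are well separated the tiny temperature does behave like greedy, which is exactly why the bug looks stable in casual testing and only surfaces on prompts where the model is genuinely uncertain.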
Reviewed by Fredoline Eruo.