Sampling (Decoding)
Sampling is the process of converting model logits into output tokens. Common strategies: greedy decoding (temperature 0), plain random sampling (temperature > 0), top-k, top-p (nucleus), min-p, typical sampling, and mirostat. Most runtimes let you stack them, typically temperature scaling first, then a top-k filter, then a top-p filter.
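A minimal sketch of that stack in plain NumPy; the ordering, defaults, and the `sample` helper name are illustrative, not taken from any particular runtime:

```python
import numpy as np

def sample(logits, temperature=0.8, top_k=40, top_p=0.95, rng=None):
    """Stacked sampling: temperature, then top-k, then top-p (nucleus)."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)
    if temperature == 0:
        # Greedy decoding: always take the highest-logit token.
        return int(np.argmax(logits))
    logits = logits / temperature
    # Top-k: keep only the k highest logits, mask the rest to -inf.
    k = min(top_k, logits.size) if top_k > 0 else logits.size
    kth = np.sort(logits)[-k]
    logits = np.where(logits < kth, -np.inf, logits)
    # Softmax over the surviving logits (exp(-inf) = 0 drops masked tokens).
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Top-p: keep the smallest set of tokens whose cumulative mass >= top_p.
    order = np.argsort(probs)[::-1]
    cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
    mask = np.zeros_like(probs)
    mask[order[:cutoff]] = probs[order[:cutoff]]
    mask /= mask.sum()
    return int(rng.choice(probs.size, p=mask))
```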
The sampling configuration has more impact on perceived quality than most users assume: the same model at temperature 0.1, 0.7, and 1.2 can read like three different models. Defaults also vary widely across runtimes: Ollama and llama.cpp default to temperature 0.8, while vLLM defaults to 1.0.
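Because defaults differ between runtimes, it is safer to pin the configuration explicitly rather than rely on whatever the runtime ships with. A hedged sketch using vLLM (the model name, values, and seed are illustrative):

```python
from vllm import LLM, SamplingParams

# Pinning sampling explicitly so vLLM's default temperature (1.0) never
# silently applies. All values here are illustrative choices.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.7, top_p=0.9, top_k=40, seed=1234)
out = llm.generate(["Explain nucleus sampling in one sentence."], params)
print(out[0].outputs[0].text)
```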
For evaluation, document the full sampling config when reporting numbers. "Llama 3.1 8B got 70 on MMLU" is meaningless without specifying whether that's at temperature 0 or with sampling.
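For instance, a result record might carry the full decoding settings next to the score; the field names and values below are illustrative, not a standard schema:

```python
import json

# One way to log an eval score so the number is reproducible later.
result = {
    "model": "Llama-3.1-8B",
    "benchmark": "MMLU",
    "score": 70.0,
    "sampling": {"temperature": 0.0, "top_k": 0, "top_p": 1.0, "seed": 1234},
    "runtime": "vllm",  # runtime and version string assumed
}
print(json.dumps(result, indent=2))
```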