RUNLOCALAI · v38

Operator-grade instrument for local-AI hardware intelligence. Hand-written verdicts. Real benchmarks. Reproducible commands.

OP · Fredoline Eruo

Transformer & LLM components

Sampling (Decoding)

Sampling (decoding) converts the model's logits, the raw scores over the vocabulary, into output tokens. Common strategies: greedy decoding (temperature 0), random sampling (temperature > 0), top-k, top-p (nucleus), min-p, typical sampling, and mirostat. Most runtimes let you stack them, applying top-k, then top-p, then temperature, before drawing a token.
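As a sketch of how these strategies compose, here is a minimal stacked sampler in NumPy: top-k filter, then temperature-scaled softmax, then a top-p cut, then a random draw. The function name, parameter names, and defaults are illustrative, not any runtime's actual API.

```python
import numpy as np

def sample_token(logits, temperature=0.8, top_k=40, top_p=0.95, rng=None):
    """Illustrative stacked sampler: top-k, then temperature, then top-p."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)

    # Greedy decoding: temperature 0 always takes the argmax.
    if temperature == 0:
        return int(np.argmax(logits))

    # Top-k: discard everything below the k-th highest logit.
    if top_k > 0:
        kth = np.sort(logits)[-min(top_k, logits.size)]
        logits = np.where(logits < kth, -np.inf, logits)

    # Temperature: rescale logits, then softmax into probabilities.
    scaled = logits / temperature
    probs = np.exp(scaled - np.max(scaled))
    probs /= probs.sum()

    # Top-p (nucleus): keep the smallest set of tokens whose
    # cumulative probability reaches top_p, then renormalize.
    order = np.argsort(probs)[::-1]
    cutoff = int(np.searchsorted(np.cumsum(probs[order]), top_p)) + 1
    nucleus = np.zeros_like(probs)
    nucleus[order[:cutoff]] = probs[order[:cutoff]]
    nucleus /= nucleus.sum()

    return int(rng.choice(len(nucleus), p=nucleus))
```

Min-p, typical sampling, and mirostat would slot into the same chain as additional filters before the final draw.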

The sampling configuration has more impact on perceived quality than most users assume: the same model at temperature 0.1, 0.7, and 1.2 produces output that feels like three different models. Defaults also vary widely: Ollama and llama.cpp default to temperature 0.8, vLLM to 1.0.
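To see why temperature changes the feel of the output so much, here is the softmax-with-temperature arithmetic on a toy three-token distribution (the logits are made up for illustration, not from a real model):

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide logits by temperature before the softmax: low temperature
    # sharpens the distribution, high temperature flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# The same three logits at the temperatures mentioned above.
for t in (0.1, 0.7, 1.2):
    print(t, [round(p, 3) for p in softmax_with_temperature([2.0, 1.0, 0.5], t)])
```

At 0.1 nearly all probability mass lands on the top token (close to greedy); at 1.2 the alternatives become live choices, which is where outputs start to feel looser.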

When reporting evaluation numbers, document the full sampling configuration. "Llama 3.1 8B got 70 on MMLU" is meaningless without specifying whether that score came from greedy decoding (temperature 0) or from sampling, and with which parameters.
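A minimal way to make a reported number reproducible is to log the sampling block alongside the score. The field names and values below are illustrative, not a standard schema:

```python
import json

# Hypothetical eval record; every field name here is illustrative.
eval_report = {
    "model": "llama-3.1-8b-instruct",
    "benchmark": "MMLU",
    "score": 70.0,
    "sampling": {
        "temperature": 0.0,  # 0.0 = greedy / deterministic decoding
        "top_k": 0,          # 0 disables top-k in many runtimes
        "top_p": 1.0,
        "seed": 42,
    },
    "runtime": "llama.cpp",  # also record the exact build you ran
}
print(json.dumps(eval_report, indent=2))
```

With a record like this attached, anyone rerunning the benchmark can tell whether a score gap is the model or the decoder.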

Related terms

  • Deterministic Decoding
  • Mirostat Sampling
  • Temperature (sampling)
  • Top-p (Nucleus) Sampling
  • Top-k Sampling

Reviewed by Fredoline Eruo. See our editorial policy.
