Evaluation metrics
pass@1
pass@1 is the probability that a model's first generated solution passes the unit tests for a coding problem. It is computed either from a single sample at temperature 0 or estimated as pass@k with k=1 from a larger set of samples. It is the headline metric for HumanEval, MBPP, LiveCodeBench, and most other coding benchmarks.
Distinct from pass@10 or pass@100, which allow the model multiple attempts: those measure capability under best-of-N scaling, while pass@1 measures the model's "first guess" reliability.
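When pass@k is estimated from n samples per problem (of which c pass), the standard unbiased estimator is 1 − C(n−c, k)/C(n, k), i.e. one minus the probability that a random draw of k samples contains no passing solution. A minimal sketch in Python (function name and the example sample counts are illustrative, not from any specific benchmark harness):

```python
import math


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from n samples, c of which passed.

    Returns 1 - C(n-c, k) / C(n, k): the probability that at least
    one of k randomly chosen samples (without replacement) passes.
    """
    if n - c < k:
        # Fewer than k failing samples: every k-subset contains a pass.
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


# Example: 200 samples per problem, 50 passing.
# pass@1 reduces to the raw pass rate c/n = 0.25.
print(pass_at_k(200, 50, 1))
# pass@10 is much higher, since only one of 10 tries needs to pass.
print(pass_at_k(200, 50, 10))
```

Note that for k=1 the estimator collapses to the simple pass fraction c/n, which is why pass@1 can also be read directly off a single greedy sample per problem.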
For local AI, pass@1 is what users actually experience: when you ask a model for code, you usually run the first thing it gives you. Benchmark results that quote pass@10 or pass@100 don't translate to single-shot use.
Reviewed by Fredoline Eruo. See our editorial policy.