Visual diagram
CC-BY-4.0

Benchmark lifecycle — full pipeline

Eight stages from a community-filed request to a high-confidence catalog row, plus the three exits that never render publicly. This is the deeper companion to the seven-state schema diagram at /resources/benchmark-request-lifecycle.

Last reviewed 2026-05-08 · By Fredoline Eruo, Independent Local AI Researcher.

[Diagram] Benchmark full pipeline — request to high-confidence. An eight-stage horizontal pipeline showing the path a benchmark request travels, with side branches for the terminal exits.

Happy path (request to high-confidence): request submitted (form posted · pending triage) → accepted (editorial review passed) → claimed (operator signaled intent) → submitted (measurement at /submit/benchmark) → moderated (submission reviewed) → approved-public (renders in catalog) → reproduced (≥1 independent operator) → independently-reproduced (≥2 distinct operators · high confidence).

Side branches (terminal · do not render publicly): rejected (out-of-scope · unverifiable), duplicate (matches existing benchmark), stale (ages out after 18 months).

RunLocalAI · CC-BY-4.0

What the diagram shows

The seven-state lifecycle diagram is honest about the request enum, but it stops where the request becomes a measurement. In practice the trip from a typed-in form to a high-confidence row is longer. Editorial moderates the request first; an operator claims it; the operator runs the measurement and submits it through /submit/benchmark; editorial moderates the submission; the row goes approved-public; subsequent operators reproduce it; once two distinct independent operators have confirmed the numbers, the row earns the "independently-reproduced" label and the confidence tier promotes.
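Read as a state machine, the happy path and its exits fit in a small transition table. The sketch below is purely illustrative: the state names come from the diagram, but the type, the transition table, and the exact states each exit branches from are assumptions, not the actual RunLocalAI schema.

// Hypothetical sketch of the full pipeline as a state machine.
// State names match the diagram; the branch points for the
// terminal exits are assumptions.
type PipelineState =
  | "request-submitted"
  | "accepted"
  | "claimed"
  | "submitted"
  | "moderated"
  | "approved-public"
  | "reproduced"
  | "independently-reproduced"
  // terminal side branches — never render publicly
  | "rejected"
  | "duplicate"
  | "stale";

const TRANSITIONS: Record<PipelineState, PipelineState[]> = {
  "request-submitted": ["accepted", "rejected", "duplicate"],
  "accepted": ["claimed"],
  "claimed": ["submitted"],
  "submitted": ["moderated"],
  "moderated": ["approved-public", "rejected", "duplicate"],
  "approved-public": ["reproduced", "stale"],
  "reproduced": ["independently-reproduced", "stale"],
  "independently-reproduced": ["stale"],
  "rejected": [],
  "duplicate": [],
  "stale": [],
};

function canTransition(from: PipelineState, to: PipelineState): boolean {
  return TRANSITIONS[from].includes(to);
}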

The three side branches are deliberately drawn faint. They are real, they happen often, and they are terminal. A rejected request never appears on /benchmarks/wanted. A duplicate submission gets pointed back at the existing row. A stale row ages out after eighteen months because stack churn (driver updates, runtime versions, model re-releases) eventually erodes the comparability of an old measurement; the catalog protects itself by retiring rows that have not been reproduced recently.
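A minimal sketch of the aging rule, assuming each catalog row records the timestamp of its most recent reproduction. The 18-month window comes from the text; the field and function names here are hypothetical.

const STALE_AFTER_MONTHS = 18; // window stated in the editorial policy

// Returns true when the row's most recent reproduction is older than
// the 18-month comparability window.
function isStale(lastReproducedAt: Date, now: Date = new Date()): boolean {
  const cutoff = new Date(now);
  cutoff.setMonth(cutoff.getMonth() - STALE_AFTER_MONTHS);
  return lastReproducedAt.getTime() < cutoff.getTime();
}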

The discipline this diagram enforces is the asymmetry between submission and reproduction. Anyone can file a request. Anyone can submit a measurement. But the catalog only promotes a row when independent operators on distinct hardware have confirmed it. That is the line between “a number we have” and “a number you can rely on for a buying decision.”
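That asymmetry is checkable. Below is a sketch of the promotion rule, assuming each reproduction records who ran it and on what hardware; operatorId and hardwareId are hypothetical field names, and the tier thresholds simply restate the text (≥1 independent operator for "reproduced", ≥2 distinct operators on distinct hardware for "independently-reproduced").

interface Reproduction {
  operatorId: string; // hypothetical: who confirmed the numbers
  hardwareId: string; // hypothetical: what hardware they ran on
}

function confidenceTier(
  submitterId: string,
  reproductions: Reproduction[],
): "approved-public" | "reproduced" | "independently-reproduced" {
  // Only operators other than the original submitter count as independent.
  const independent = reproductions.filter((r) => r.operatorId !== submitterId);
  const operators = new Set(independent.map((r) => r.operatorId));
  const hardware = new Set(independent.map((r) => r.hardwareId));
  if (operators.size >= 2 && hardware.size >= 2) return "independently-reproduced";
  if (operators.size >= 1) return "reproduced";
  return "approved-public";
}

// e.g. one independent confirmation on any hardware yields "reproduced":
// confidenceTier("op-a", [{ operatorId: "op-b", hardwareId: "rtx-4090" }])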

Use this diagram in articles, blog posts, or onboarding decks where you need to explain what "published, reviewed, and independently reproduced" actually means in editorial practice. The state names match the schema enums and the moderation queue UI; the same vocabulary travels through the codebase, the admin dashboard, and the public catalog.

Embed snippet

<a href="https://runlocalai.co/resources/benchmark-lifecycle" rel="noopener">RunLocalAI: Benchmark Lifecycle (full pipeline)</a>

License: CC-BY-4.0.

Next steps

File the kind of measurement this diagram describes.