Editorial review at RunLocalAI
The trust apparatus described on the trust index rests on humans reviewing submissions before they reach the page. This page documents who those humans are, how they handle conflicts of interest, what the scoring methodology asks them to look for, and the internal audit log that keeps the process accountable.
The review process
Every submission moves through a small set of states. The operator-facing version of the state machine is documented at /resources/verification-policy; here is the trust-relevant summary.
A submission enters as queued and is invisible to the public. An editor opens the submission, checks plausibility against similar configurations, checks metadata completeness, and either approves it (now publicly visible, carrying the community-submitted label), rejects it (gone, and not recoverable), or returns it to the submitter with a note asking for the missing information.
Approved submissions remain at community-submitted until a successful reproduction promotes them up the ladder. Editorial does not preemptively grant reproduced or independently-reproduced status to a submission; those tiers are earned by independent operators running the same configuration, not by editorial fiat. The first promise on the trust index — never auto-publish — is enforced here. The second — never percentages — is enforced by the rendering layer. The third — never invent on sparse data — is enforced by editorial refusing to publish derived numbers as if they were measured.
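To make the summary concrete, here is a minimal sketch of that lifecycle in TypeScript. All names, types, and the promotion thresholds are illustrative assumptions, not the site's actual code; the real transitions live behind the moderation surface.

```ts
// Illustrative sketch of the submission lifecycle described above.
// Every name here is hypothetical; only the states and tiers mirror the prose.

type ReviewState = "queued" | "approved" | "rejected" | "returned";
type TrustTier = "community-submitted" | "reproduced" | "independently-reproduced";

interface Submission {
  id: string;
  state: ReviewState;
  tier?: TrustTier;           // present only after approval
  publiclyVisible: boolean;   // queued submissions are never visible
}

// Approval always lands at the lowest tier; reject and return follow the
// same shape, flipping state without ever granting a tier.
function approve(s: Submission): Submission {
  if (s.state !== "queued") throw new Error("only queued submissions can be approved");
  return { ...s, state: "approved", tier: "community-submitted", publiclyVisible: true };
}

// Promotion is driven by independent reproductions, never by editorial fiat.
// The thresholds (1 and 2) are assumptions made for this sketch.
function promote(s: Submission, reproductions: number): Submission {
  if (s.state !== "approved") return s;
  const tier: TrustTier =
    reproductions >= 2 ? "independently-reproduced" :
    reproductions >= 1 ? "reproduced" :
    "community-submitted";
  return { ...s, tier };
}
```

Note that approve never assigns anything above community-submitted: the never-auto-publish promise is the guard clause, and the ladder is climbed only through promote.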
Who actually reviews
Editorial review is performed by a named human, typically the site editor whose author profile is linked from every byline footer. We do not use ghost editors, anonymous review pools, or rotating contractor groups. When you read a benchmark page, the person who reviewed the underlying submission is the person named in the byline, or a named co-reviewer disclosed in the byline.
The site is small. We are honest about that. Most editorial review today is one named editor doing the work. The advantage of that is consistency — every submission goes through the same eyes and the same standards. The disadvantage is the obvious single point of failure, which is why the audit log exists and why we publish the rejection criteria openly: a different editor reading the same rules should reach the same decision on the same submission. The system is constructed to be reviewer-independent even when the reviewer pool is small.
AI assistance is used in the review pipeline for plausibility checks — flagging outliers, surfacing similar prior submissions, catching obvious metadata gaps. AI assistance is never the final say; a human decides whether a submission is approved, rejected, or returned. The editorial policy at /editorial-policy documents the AI-assistance discipline in operational detail.
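As a rough illustration of what a plausibility flag looks like (a hypothetical sketch, not the pipeline's actual code), the simplest version compares a new result against prior runs of a similar configuration and routes sharp deviations to closer human review:

```ts
// Hypothetical outlier flag: how far does a new throughput result sit
// from prior submissions for the same configuration? The flag only routes
// the submission to closer human review; it never approves or rejects.

function flagOutlier(newTokensPerSec: number, priorRuns: number[]): boolean {
  if (priorRuns.length < 3) return false;        // too sparse to judge
  const mean = priorRuns.reduce((a, b) => a + b, 0) / priorRuns.length;
  const variance = priorRuns.reduce((a, b) => a + (b - mean) ** 2, 0) / priorRuns.length;
  const std = Math.sqrt(variance);
  if (std === 0) return newTokensPerSec !== mean;
  return Math.abs(newTokensPerSec - mean) / std > 3; // 3-sigma is an illustrative cutoff
}
```

The important property is the return type: a boolean flag for a human to act on, not a verdict.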
Conflict-of-interest discipline
Two specific conflicts are worth disclosing, because the structure of the site creates them.
Affiliate links. Some hardware pages on the site carry affiliate links. The disclosure appears in the footer of every page (FTC requirement). The discipline: affiliate relationships do not influence verdicts. There are cards we rate highly that we do not have affiliate relationships with, and there are cards in the affiliate program that we explicitly recommend against. The full disclosure lives at /how-we-make-money.
Owner-supplied benchmarks. When a submission comes from an operator who is also an active contributor to the site — a frequent reproducer, a comment poster, somebody whose prior contributions we have already reviewed — there is a soft bias risk. The mitigation: the same review criteria apply regardless of submitter reputation. A trusted contributor who submits an implausible row gets the same flag-and-review treatment as a first-time submitter. The audit log records the reviewer ID and decision rationale for every submission, which makes the bias visible if it ever appears.
We do not accept payment in exchange for coverage. We do not accept review-unit hardware in exchange for guaranteed positive coverage. We have, on rare occasions, accepted review-unit hardware on the explicit condition that we are free to publish whatever we measure including unfavorable results; when that happens it is disclosed in the byline of the affected page.
Scoring methodology
Beyond benchmark rows, the catalog renders scores on dimensions computed by the engine in src/lib/scoring/: runtime maturity, setup complexity, ecosystem health, compatibility breadth, and a small number of others. These scores are tiers, not percentages, for the same reason benchmark confidence is a tier: false precision is operator-hostile, and the underlying signals do not support finer discrimination than tier labels.
The full per-dimension methodology is at /resources/scoring-methodology. What matters here is the editorial part: scores are not produced by a black-box algorithm. Each dimension has a defined rubric, the rubric is published, and editorial spot-checks sample the score outputs against the rubric on a rolling basis. When a score looks wrong to editorial, the rubric is examined first; the assumption is that a wrong score reflects a rubric gap, not a single-row error.
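A compressed sketch of what that looks like at the code level, using one invented dimension. The signal names and cutoffs here are assumptions; the published rubric at /resources/scoring-methodology is the authoritative version.

```ts
// Hypothetical rubric shape: a dimension maps discrete, checkable signals
// to a tier label. No signal is a percentage, and no tier is derived from
// arithmetic on other tiers.

type Tier = "poor" | "fair" | "good" | "excellent";

interface RuntimeMaturitySignals {
  yearsSinceFirstRelease: number;
  hasStableReleaseChannel: boolean;
  breakingChangesLastYear: number;
}

function runtimeMaturityTier(s: RuntimeMaturitySignals): Tier {
  if (!s.hasStableReleaseChannel) return "poor";
  if (s.breakingChangesLastYear > 4) return "fair";
  return s.yearsSinceFirstRelease >= 2 ? "excellent" : "good";
}
```

Because each tier is a direct function of checkable signals, an editorial spot-check is mechanical: re-derive the tier from the rubric and compare it to what the engine emitted.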
Scores never override benchmark numbers. A model that scores excellent on runtime maturity and poor on compatibility presents two facts the operator can read directly; the scoring engine does not collapse them into a single recommendation. The operator's job is to weigh the dimensions according to their use case, and the catalog renders the dimensions clearly so they can.
The content audit log
Every editorial decision on the site is recorded in an internal content audit log. The log captures: the submission ID, the reviewer, the decision (approved, rejected, returned, or moved between trust states), the timestamp, and a short rationale when the decision is non-routine.
The log lives in the editorial moderation surface — the admin-only URL is not published, deliberately, so that the moderation pipeline cannot be probed by adversarial submitters looking for review patterns to game. Editorial accountability does not require exposing the queue; it requires the log existing, being immutable, and being available to any new editor joining the team. Those properties are the contract; the URL is an implementation detail.
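Those properties map naturally onto an append-only record. A sketch of the shape, under assumed names rather than the moderation surface's actual schema:

```ts
// Hypothetical append-only audit log. Entries are never edited or deleted;
// corrections and reversals are new entries.

type Decision = "approved" | "rejected" | "returned" | "state-change";

interface AuditEntry {
  readonly submissionId: string;
  readonly reviewerId: string;
  readonly decision: Decision;
  readonly timestamp: string;      // ISO-8601
  readonly rationale?: string;     // recorded when the decision is non-routine
}

class AuditLog {
  private readonly entries: AuditEntry[] = [];

  append(entry: AuditEntry): void {
    this.entries.push(Object.freeze(entry));   // no edit-in-place path exists
  }

  // "Why was my submission rejected?" is a lookup, never a shrug.
  historyFor(submissionId: string): readonly AuditEntry[] {
    return this.entries.filter(e => e.submissionId === submissionId);
  }
}
```

The absence of any update or delete method is the point: immutability is structural, not a procedure anyone has to remember to follow.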
What the log enables, in practice: when a reader writes in to ask “why was my submission rejected,” the response is never “we don't know.” The decision was recorded, the rationale was recorded, and we can explain it. When a benchmark moves from community-submitted to reproduced, the log shows which independent operator confirmed it and on what date. When an editorial measurement is updated with a new runtime version, the log shows the old measurement and the new, and the page renders both with run dates so the reader can see the history rather than only the current state.
Corrections
We make mistakes. The discipline around corrections is, like everything else here, written down so that an operator can hold us to it.
Verified errors (a wrong VRAM number, an outdated GitHub statistic, a misattributed benchmark, a miscategorized model) are corrected within 7 days of confirmation. The page picks up the correction, and a brief correction note at the bottom explains what changed and when. Corrections are entered into the audit log alongside editorial decisions, so an editor reviewing the page later can see the correction history.
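In audit-log terms, a correction is just another appended entry that carries the before and the after. A hypothetical shape, with field names that are assumptions for the sketch:

```ts
// Hypothetical correction record. The old value stays in the log, and the
// note is what the page renders at the bottom.

interface CorrectionEntry {
  readonly pageId: string;
  readonly field: string;          // e.g. "vram-gb" or "runtime-version"
  readonly oldValue: string;
  readonly newValue: string;
  readonly confirmedAt: string;    // ISO-8601; the 7-day clock starts here
  readonly correctedAt: string;
  readonly note: string;           // rendered on the corrected page
}
```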
The fastest path to reporting an error is the contact form at /contact. We read every report. We do not accept “please remove this benchmark because we don't like the number” as a correction; we do accept “the benchmark is on the wrong card model” or “the runtime version is stated incorrectly,” and we will fix the page.
Where to go next
Next: what verified-owner means in practice, the evidence we accept, and the identity information we explicitly never require.