Reproducible eval scores
Local AI evaluations
Community-submitted lm-evaluation-harness scores for local models run on local hardware. This is distinct from /benchmarks, which covers throughput (tok/s) and VRAM usage. Reproducibility comes from pinning three things: the harness commit, the runtime version, and the exact command line used for the run.
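A reproducible submission might pin all three like this. This is a sketch, not a required format: the commit SHA and model path are placeholders you would replace with the values from your own run.

```shell
# Pin the harness to the exact commit used for the run
# (<commit-sha> is a placeholder — substitute your actual commit)
pip install "lm-eval @ git+https://github.com/EleutherAI/lm-evaluation-harness@<commit-sha>"

# Record the runtime version alongside the score
python --version

# The exact command line that produced the score
# (model path and task choice are illustrative)
lm_eval --model hf \
  --model_args pretrained=/path/to/local-model \
  --tasks hellaswag \
  --batch_size 8
```

With those three values recorded, anyone with the same hardware can rerun the identical command against the identical harness commit and compare scores directly.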
Got an eval to share? Submit it: moderation takes 1–7 days, and we never auto-publish.
No public evaluations yet. Be the first to submit one via /submit/evaluation.