About RunLocalAI
RunLocalAI exists to answer one question: can my hardware run this model, and how fast? Most AI coverage focuses on cloud APIs. We cover the underserved local-inference niche (Ollama, LM Studio, llama.cpp, vLLM, KoboldCPP, oobabooga) and the hardware people actually own.
Every page on this site is anchored to verifiable data: parameter counts, license terms, quantization sizes, real tokens-per-second measurements. We don't publish vibes — we publish numbers we measured ourselves or sourced from named community contributors.
Our test hardware
The benchmarks on this site are run on:
- NVIDIA RTX 4090 (primary test bench)
- AMD Radeon RX 7900 XTX (ROCm / Linux testing)
- Apple Silicon (M-series, MLX backend)
All recommendations are based on practical usability, not synthetic benchmarks: models are pulled, loaded, and timed on the rigs listed above. When a figure is extrapolated rather than measured, we mark it as such on the page.
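To make "timed on the rig" concrete, here is a minimal sketch of the kind of tokens-per-second measurement behind our numbers, using Ollama's local HTTP API and its reported `eval_count` / `eval_duration` fields. It assumes Ollama is running on its default port with a model already pulled; the model name and prompt are placeholders, and this illustrates the approach rather than our exact harness.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def tokens_per_second(model: str, prompt: str) -> float:
    """Run one non-streaming generation and compute decode throughput."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # single final response that includes timing stats
    }).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        stats = json.load(resp)
    # Ollama reports eval_count (generated tokens) and eval_duration (nanoseconds).
    return stats["eval_count"] / stats["eval_duration"] * 1e9

if __name__ == "__main__":
    # Placeholder model name; any model you have pulled locally works.
    print(f"{tokens_per_second('llama3.1:8b', 'Explain KV caching briefly.'):.1f} tok/s")
```

A single run like this is noisy; in practice you would repeat it several times with a fixed prompt and report a median.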
How content is created
We use AI assistance for drafting and structuring content, with human editorial review before publication. See our editorial policy for the full process.
How the site makes money
The site earns revenue from affiliate links (Amazon and Jumia, for hardware) and from display advertising. We disclose these relationships clearly; see How we make money.
Contact
For corrections, tip-offs, partnership inquiries: see our contact page.