Ollama
Overview
The default first-pull tool for local AI. One-line model installs (`ollama run llama3.1`), an OpenAI-compatible HTTP API, and sensible defaults out of the box. Built on llama.cpp.
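Because the API is OpenAI-compatible, any client that can POST JSON can talk to it. A minimal sketch of building a chat-completion request, assuming Ollama is serving on its default port 11434 and `llama3.1` has been pulled; the prompt is a placeholder:

```python
import json

# Ollama exposes an OpenAI-compatible endpoint at /v1/chat/completions
# (default base URL: http://localhost:11434). Build the request body:
payload = {
    "model": "llama3.1",
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
}
body = json.dumps(payload)

# To send it (requires a running Ollama server):
#   curl http://localhost:11434/v1/chat/completions \
#        -H "Content-Type: application/json" -d "$body"
print(body)
```

The same body works with official OpenAI client libraries by pointing their base URL at `http://localhost:11434/v1`.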
Pros
- Zero-config setup
- OpenAI-compatible API
- Curated model library
- Cross-platform
Cons
- Less control than raw llama.cpp
- Conservative default context length
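The conservative default context length can be raised per model with a custom Modelfile. A minimal sketch, assuming the base model `llama3.1` is already pulled and that an 8192-token window fits your VRAM:

```
# Modelfile: derive a model with a larger context window
FROM llama3.1
PARAMETER num_ctx 8192
```

Build and run the derived model with `ollama create llama3.1-8k -f Modelfile`, then `ollama run llama3.1-8k` (the `llama3.1-8k` name is arbitrary).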
Compatibility
| Category | Support |
|---|---|
| Operating systems | macOS, Linux, Windows |
| GPU backends | NVIDIA CUDA, AMD ROCm, Apple Metal, CPU |
| License | Open source · free |
Benchmarks using Ollama
| Model | Hardware | Quant | Speed (tok/s) | VRAM |
|---|---|---|---|---|
| Mistral 7B Instruct v0.3 | NVIDIA GeForce RTX 4090 | Q4_K_M | 112.3 | 5.1 GB |
| Llama 3.1 8B Instruct | NVIDIA GeForce RTX 4090 | Q4_K_M | 104.7 | 5.4 GB |
| Mixtral 8x7B Instruct | NVIDIA GeForce RTX 4090 | Q4_K_M | 31.4 | 23.1 GB |
Frequently asked
Is Ollama free?
Yes. Ollama is free to download and use, and it is open source under a permissive license.
What operating systems does Ollama support?
Ollama runs on macOS, Linux, and Windows.
Which GPUs work with Ollama?
Ollama supports NVIDIA GPUs via CUDA, AMD GPUs via ROCm, and Apple Silicon via Metal. CPU-only inference also works, but it is much slower.
Reviewed by RunLocalAI Editorial. See our editorial policy for how we evaluate tools.