VRAM calculator for local LLMs
Honest VRAM math. It picks up where the model card's "just {n} GB" line ends, separating weights from KV cache from activations from runtime overhead. Long context bites; KV quantization helps. The bar shows exactly how much.
Pure client. No tracking. Math is open and citable.
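The weights term is the simplest piece of that math: parameter count times bits per weight, converted to GiB. A minimal sketch of that arithmetic (an assumed formula for illustration, not this page's actual code; the 8B / 4.5-bit figures are hypothetical inputs):

```python
def weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Raw weight storage: params * bits / 8 bytes, expressed in GiB."""
    return n_params * bits_per_weight / 8 / 1024**3

# e.g. an 8B model at a ~4.5-bit quant (roughly Q4_K_M territory)
print(round(weights_gib(8e9, 4.5), 2))  # -> 4.19
```

Everything past that single product (KV cache, activations, runtime overhead) is what the breakdown below adds on top.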
Operator notes
Add 2-3 GB on top of the total for OS, display output, and driver overhead: the calculator reports the model's footprint, not total card consumption.
Long context = mostly KV cache. At 32K ctx, KV often exceeds the weights. Quantize the KV cache to fp8 to halve it, or to int4 to quarter it, at a small quality cost.
Weights alone: 4.8 GB. KV cache alone @ 8,192 ctx: 1.1 GB.
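The halve/quarter claim falls straight out of the KV-cache formula: size is linear in bytes per element (and in context length). A sketch under common assumptions (one K and one V tensor per layer, GQA-style n_kv_heads; the shape below is Llama-3-8B-like and chosen for illustration, not necessarily this page's default model):

```python
def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 ctx_len: int, bytes_per_elem: float) -> float:
    # 2 tensors (K and V) per layer; each token stores n_kv_heads * head_dim
    # elements per tensor per layer.
    per_token_bytes = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token_bytes * ctx_len / 1024**3

shape = dict(n_layers=32, n_kv_heads=8, head_dim=128, ctx_len=8192)
fp16 = kv_cache_gib(**shape, bytes_per_elem=2)    # -> 1.0 GiB
fp8  = kv_cache_gib(**shape, bytes_per_elem=1)    # exactly half of fp16
int4 = kv_cache_gib(**shape, bytes_per_elem=0.5)  # exactly a quarter
```

With this shape the fp16 cache is exactly 1 GiB at 8K context, in the same ballpark as the 1.1 GB readout above; the size scales linearly with ctx_len, so 32K context pushes it to 4 GiB, close to or past the weights themselves.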
Embed this calculator
Link to this page from articles, READMEs, or community threads — screenshot welcome, attribution appreciated.
Suggested citation: Calculator by RunLocalAI · runlocalai.co · CC-BY-4.0
Related: Custom build engine · GPU memory flow diagram · Quantization formats · Electricity calculator