Interactive calculator

VRAM calculator for local LLMs

Honest VRAM math. Picks up where the model card's "just {n} GB" line ends — separates weights from KV cache from activations from runtime overhead. Long context bites. KV quantization helps. The bar shows exactly how much.

Pure client. No tracking. Math is open and citable.

Inputs

Parameters: 8.0 B
Context length: 8,192 tokens

VRAM breakdown

16.1 GB
total VRAM required
Where this fits
8 GB — overflow · 12 GB — overflow · 16 GB — overflow · 24 GB — fits · 32 GB — fits · 48 GB — fits · 80 GB — fits
Model weights: 4.8 GB
KV cache @ 8,192 ctx: 1.1 GB
Activations: 8.4 GB
Runtime overhead: 1.8 GB
Bits-per-param: 4.83 bpw
KV bytes/element: 2 (fp16)
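The page does not publish its formulas, but a minimal sketch of the standard math reproduces the weights and KV figures above. The model dimensions here (32 layers, 8 KV heads, head dim 128) are an assumed Llama-3-8B-like shape, not values taken from this page:

```python
def weights_gb(params_billions: float, bpw: float) -> float:
    """Model weights in GB: parameter count (billions) x bits-per-param / 8."""
    return params_billions * bpw / 8

def kv_cache_gb(layers: int, ctx: int, kv_heads: int, head_dim: int,
                bytes_per_elt: float) -> float:
    """KV cache in GB: 2 (K and V tensors) x layers x ctx x heads x dim x bytes."""
    return 2 * layers * ctx * kv_heads * head_dim * bytes_per_elt / 1e9

# 8.0 B params at 4.83 bpw -> matches the 4.8 GB weights line above
print(round(weights_gb(8.0, 4.83), 1))            # 4.8
# Assumed Llama-3-8B-like shape at 8,192 ctx, fp16 KV (2 bytes/element)
print(round(kv_cache_gb(32, 8192, 8, 128, 2), 1))  # 1.1
```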

Operator notes

Add 2-3 GB on top of the total for the OS, display output, and driver overhead — the calculator reports the model's footprint, not total card consumption.

Long context = mostly KV cache. At 32K ctx, KV often exceeds weights. Quantize KV to fp8 to halve it, or to int4 to quarter it (small quality cost).
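The halving/quartering claim follows directly from the KV bytes-per-element term, since the KV cache scales linearly with it. A quick sketch, again assuming Llama-3-8B-like dimensions (32 layers, 8 KV heads, head dim 128):

```python
def kv_gb(ctx_tokens: int, layers: int = 32, kv_heads: int = 8,
          head_dim: int = 128, bytes_per_elt: float = 2.0) -> float:
    """KV cache size in GB; the leading 2x is for the separate K and V tensors."""
    # Default dims are an assumed Llama-3-8B-like shape, not from this page.
    return 2 * layers * ctx_tokens * kv_heads * head_dim * bytes_per_elt / 1e9

ctx = 32_768
fp16 = kv_gb(ctx)                      # 2 bytes/elt: ~4.3 GB at 32K ctx
fp8  = kv_gb(ctx, bytes_per_elt=1.0)   # half of fp16
int4 = kv_gb(ctx, bytes_per_elt=0.5)   # quarter of fp16
```

At 32K context the fp16 KV cache (~4.3 GB) is already close to the 4.8 GB weights line above, which is the "KV often exceeds weights" effect for larger contexts or models with more KV heads.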

Weights alone: 4.8 GB. KV cache alone @ 8,192 ctx: 1.1 GB.

Embed this calculator

Link to this page from articles, READMEs, or community threads — screenshot welcome, attribution appreciated.

Suggested citation: Calculator by RunLocalAI · runlocalai.co · CC-BY-4.0

Related: Custom build engine · GPU memory flow diagram · Quantization formats · Electricity calculator