Interactive calculator

VRAM calculator for local LLMs

Honest VRAM math. Picks up where the model card's "just {n} GB" line ends — separates weights from KV cache from activations from runtime overhead. Long context bites. KV quantization helps. The bar shows exactly how much.

Pure client. No tracking. Math is open and citable.

Inputs

Parameters: 8.0 B
Context length: 8,192 tokens

VRAM breakdown

16.1 GB
total VRAM required
Where this fits
8 GB — overflow · 12 GB — overflow · 16 GB — overflow · 24 GB — fits · 32 GB — fits · 48 GB — fits · 80 GB — fits
Model weights: 4.8 GB
KV cache @ 8,192 ctx: 1.1 GB
Activations: 8.4 GB
Runtime overhead: 1.8 GB
Bits-per-param: 4.83 bpw
KV bytes/element: 2 (fp16)
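The page does not publish its formulas, but a minimal sketch of the standard math reproduces the weights and KV figures above. The model dimensions here (32 layers, 8 KV heads, head dim 128) are an assumed Llama-3-8B-like shape, not values taken from this page:

```python
def weights_gb(params_billions: float, bpw: float) -> float:
    """Model weights in GB: parameter count (billions) x bits-per-param / 8."""
    return params_billions * bpw / 8

def kv_cache_gb(layers: int, ctx: int, kv_heads: int, head_dim: int,
                bytes_per_elt: float) -> float:
    """KV cache in GB: 2 (K and V tensors) x layers x ctx x heads x dim x bytes."""
    return 2 * layers * ctx * kv_heads * head_dim * bytes_per_elt / 1e9

# 8.0 B params at 4.83 bpw -> matches the 4.8 GB weights line above
print(round(weights_gb(8.0, 4.83), 1))            # 4.8
# Assumed Llama-3-8B-like shape at 8,192 ctx, fp16 KV (2 bytes/element)
print(round(kv_cache_gb(32, 8192, 8, 128, 2), 1))  # 1.1
```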

Operator notes

Add 2-3 GB on top of the total for the OS, display output, and driver overhead — the calculator reports the model's footprint, not total card consumption.

Long context = mostly KV cache. At 32K ctx, KV often exceeds weights. Quantize KV to fp8 to halve it, or to int4 to quarter it (small quality cost).
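The halving/quartering claim follows directly from the KV bytes-per-element term, since the KV cache scales linearly with it. A quick sketch, again assuming Llama-3-8B-like dimensions (32 layers, 8 KV heads, head dim 128):

```python
def kv_gb(ctx_tokens: int, layers: int = 32, kv_heads: int = 8,
          head_dim: int = 128, bytes_per_elt: float = 2.0) -> float:
    """KV cache size in GB; the leading 2x is for the separate K and V tensors."""
    # Default dims are an assumed Llama-3-8B-like shape, not from this page.
    return 2 * layers * ctx_tokens * kv_heads * head_dim * bytes_per_elt / 1e9

ctx = 32_768
fp16 = kv_gb(ctx)                      # 2 bytes/elt: ~4.3 GB at 32K ctx
fp8  = kv_gb(ctx, bytes_per_elt=1.0)   # half of fp16
int4 = kv_gb(ctx, bytes_per_elt=0.5)   # quarter of fp16
```

At 32K context the fp16 KV cache (~4.3 GB) is already close to the 4.8 GB weights line above, which is the "KV often exceeds weights" effect for larger contexts or models with more KV heads.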

Weights alone: 4.8 GB. KV cache alone @ 8,192 ctx: 1.1 GB.

Embed this calculator

Link to this page from articles, READMEs, or community threads — screenshot welcome, attribution appreciated.

Suggested citation: Calculator by RunLocalAI · runlocalai.co · CC-BY-4.0

Related: Custom build engine · GPU memory flow diagram · Quantization formats · Electricity calculator