
Gemma 3 4B

4B Gemma 3 for edge. Multimodal.

License: Gemma Terms of Use · Released Mar 12, 2025 · Context: 131,072 tokens
Our verdict
By Fredoline Eruo · Last verified May 6, 2026
7.5/10
Positioning

The 4B Gemma 3 with multimodal capability, and genuinely the best small-model pick when image input matters and VRAM is constrained: it fits in under 4 GB at Q4.

Strengths
  • Native multimodal at 4B — no other model in this size class does this credibly.
  • Conversational quality materially better than Phi-3.5 Mini for general chat.
  • 128K context even at this size.
Limitations
  • License: the Gemma Terms of Use impose use restrictions (including a Prohibited Use Policy) that standard open-source licenses do not.
  • Math and structured tasks weaker than Phi-3.5 Mini.
  • Knowledge breadth is narrow; small-model limitations are real.
Real-world performance on RTX 4090
  • Q4_K_M (2.7 GB): 130–150 tok/s decode, time to first token (TTFT) under 50 ms
  • Q5_K_M (3.2 GB): 115–135 tok/s
  • Q8_0 (4.8 GB): 95–115 tok/s
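These figures come from the llama.cpp/CUDA setup listed under "Run this yourself" below. If you want to sanity-check them on your own card, llama.cpp's bundled llama-bench tool is the usual route; a minimal sketch, assuming a Q4_K_M GGUF is already on disk (the filename is illustrative):

# llama-bench ships with llama.cpp; -ngl 99 offloads every layer to the GPU.
# The reported pp (prompt processing) and tg (token generation) rates map
# roughly to TTFT and decode speed. The GGUF filename is an assumption.
llama-bench -m gemma-3-4b-it-Q4_K_M.gguf -ngl 99 -p 512 -n 128
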
Should you run this locally?

Yes, if you're targeting edge devices with multimodal input, or you own a 4–6 GB GPU and want chat plus vision. No, if your workload is math or structured output (pick Phi-3.5 Mini instead), or if a chat-only model of 8B or larger is a better fit.

How it compares
  • vs Phi-3.5 Mini (3.8B) → Gemma 3 4B wins on chat + multimodal; Phi wins on math + structured output.
  • vs Llama 3.2 3B → similar text capability; Gemma adds multimodal.
  • vs Gemma 3 1B → 4B is meaningfully smarter; 1B is for very tight constraints.
Run this yourself
ollama pull gemma3:4b-it-q4_K_M
ollama run gemma3:4b-it-q4_K_M
Settings: Q4_K_M GGUF, 8192 ctx, llama.cpp/CUDA, RTX 4090
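The 8192-token context above is a benchmark setting, not Ollama's default, which is smaller. Two ways to raise it yourself, assuming the tag above is the one you pulled (the prompt text is illustrative):

# Interactively: set the context window from inside the REPL.
ollama run gemma3:4b-it-q4_K_M
# then at the >>> prompt:  /set parameter num_ctx 8192

# Or via the local REST API, passing num_ctx in the options object.
curl http://localhost:11434/api/generate -d '{
  "model": "gemma3:4b-it-q4_K_M",
  "prompt": "Summarize the plot of Hamlet in two sentences.",
  "options": { "num_ctx": 8192 }
}'
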
Why this rating

7.5/10: the best 4B-class general model when you want multimodal at edge size. It loses to Phi-3.5 Mini on math and structured tasks but beats it on chat naturalness.

Overview

Gemma 3 4B is the edge-sized multimodal member of Google's Gemma 3 family, pairing vision input and a 131,072-token context window with a 4B-parameter footprint.

Strengths

  • Multimodal at 4B
  • Edge-class

Weaknesses

  • License restrictions

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization | File size | VRAM required
Q4_K_M       | 2.5 GB    | 4 GB
Q8_0         | 4.4 GB    | 6 GB
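Ollama serves each quantization as its own tag. The explicit tag names below follow the q4_K_M pattern used elsewhere in this review; treat them as assumptions and check the Ollama library listing if a pull fails:

# Default tag (per the FAQ below, this resolves to Q4_K_M).
ollama pull gemma3:4b
# Explicit quantization tags, assumed from the pattern above.
ollama pull gemma3:4b-it-q4_K_M   # 2.5 GB file, ~4 GB VRAM
ollama pull gemma3:4b-it-q8_0     # 4.4 GB file, ~6 GB VRAM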

Get the model

Ollama

One-line install

ollama run gemma3:4b

Read our Ollama review →

HuggingFace

Original weights

huggingface.co/google/gemma-3-4b-it

Source repository; you quantize the weights yourself, as sketched below.
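The HuggingFace repo ships the original safetensors, not GGUF files, so quantization is on you. One plausible path uses llama.cpp's converter and quantizer; the script and binary names match current llama.cpp, but treat the exact invocation as a sketch:

# Fetch the original weights (the repo is license-gated; accept the
# Gemma Terms of Use on Hugging Face first).
huggingface-cli download google/gemma-3-4b-it --local-dir gemma-3-4b-it

# Convert safetensors to a 16-bit GGUF with llama.cpp's conversion script...
python convert_hf_to_gguf.py gemma-3-4b-it --outfile gemma-3-4b-it-f16.gguf

# ...then quantize down to Q4_K_M, this review's recommended starting point.
llama-quantize gemma-3-4b-it-f16.gguf gemma-3-4b-it-Q4_K_M.gguf Q4_K_M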



Frequently asked

What's the minimum VRAM to run Gemma 3 4B?

4 GB of VRAM is enough to run Gemma 3 4B at the Q4_K_M quantization (file size 2.5 GB). Higher-quality quantizations need more.

Can I use Gemma 3 4B commercially?

Yes — Gemma 3 4B ships under the Gemma Terms of Use, which permits commercial use. Always read the license text before deployment.

What's the context length of Gemma 3 4B?

Gemma 3 4B supports a context window of 131,072 tokens (128K).

How do I install Gemma 3 4B with Ollama?

Run `ollama pull gemma3:4b` to download, then `ollama run gemma3:4b` to start a chat session. The default quantization is Q4_K_M.

Does Gemma 3 4B support images?

Yes — Gemma 3 4B is multimodal and accepts text + vision inputs. Vision support requires a runner that handles its image-conditioning architecture.
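For a quick smoke test of the vision path, Ollama's generate endpoint accepts base64-encoded images alongside the prompt. A minimal sketch (the image filename is illustrative):

# Encode a local image and pass it in the "images" array of /api/generate.
# (-w 0 is GNU base64; on macOS use `base64 -i photo.jpg` instead.)
IMG=$(base64 -w 0 photo.jpg)
curl http://localhost:11434/api/generate -d "{
  \"model\": \"gemma3:4b\",
  \"prompt\": \"Describe this image.\",
  \"images\": [\"$IMG\"]
}"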

Source: huggingface.co/google/gemma-3-4b-it

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.