Gemma 3 4B
4B Gemma 3 for edge. Multimodal.
The 4B Gemma 3 with multimodal capability. Genuinely the best small-model pick when image input matters and VRAM is constrained — fits in under 4 GB at Q4.
Strengths
- Native multimodal at 4B — no other model in this size class does this credibly.
- Conversational quality materially better than Phi 3.5 Mini for general chat.
- 128K context even at this size.

Weaknesses
- Restrictive Gemma license terms.
- Math and structured tasks weaker than Phi 3.5 Mini.
- Knowledge breadth narrow — small-model limitations are real.
Performance
- Q4_K_M (2.7 GB): 130–150 tok/s decode, TTFT under 50 ms
- Q5_K_M (3.2 GB): 115–135 tok/s
- Q8_0 (4.8 GB): 95–115 tok/s
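Those decode rates translate directly into reply latency. A quick sketch of the arithmetic for a 300-token reply (the token count is an arbitrary example), using the low end of each quant's range plus the ~50 ms time-to-first-token:

```shell
# Worst-case seconds for a 300-token reply: TTFT + tokens / (tok/s).
# Throughput figures are the low end of the ranges listed above.
for q in "Q4_K_M 130" "Q5_K_M 115" "Q8_0 95"; do
  set -- $q
  awk -v name="$1" -v tps="$2" \
    'BEGIN{printf "%s: %.1f s\n", name, 0.05 + 300 / tps}'
done
```

Even the heaviest quant stays comfortably interactive: a bit over three seconds for a full paragraph-length reply.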
Yes for edge devices that need multimodal input, and for 4–6 GB GPU owners who want chat plus vision. No for math and structured tasks (pick Phi 3.5 Mini), or where a chat-only model at 8B or larger is a better fit.
How it compares
- vs Phi-3.5 Mini (3.8B) → Gemma 3 4B wins on chat + multimodal; Phi wins on math + structured output.
- vs Llama 3.2 3B → similar text capability; Gemma adds multimodal.
- vs Gemma 3 1B → 4B is meaningfully smarter; 1B is for very tight constraints.
ollama pull gemma3:4b-it-q4_K_M
ollama run gemma3:4b-it-q4_K_M
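The vision side works through Ollama's HTTP API, which takes base64-encoded images in an `images` array. A minimal sketch, assuming a local Ollama server on the default port (the image path and prompt are placeholder examples):

```shell
# Sketch: query Gemma 3 4B with an image via Ollama's /api/generate endpoint.
# Ollama expects base64-encoded image data in the "images" array.
ask_with_image() {
  local img_b64
  img_b64=$(base64 < "$1" | tr -d '\n')
  curl -s http://localhost:11434/api/generate -d @- <<EOF
{"model": "gemma3:4b-it-q4_K_M",
 "prompt": "$2",
 "stream": false,
 "images": ["$img_b64"]}
EOF
}
# Usage: ask_with_image photo.jpg "Describe this image."
```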
Settings: Q4_K_M GGUF, 8192 ctx, llama.cpp/CUDA, RTX 4090
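For reference, those benchmark settings correspond roughly to a llama.cpp invocation like the following (the GGUF filename is an assumption; adjust to whatever your quantized file is called):

```shell
# Sketch of a llama.cpp run with the benchmark settings:
# Q4_K_M GGUF, 8192-token context, all layers offloaded to the GPU.
llama-cli -m gemma-3-4b-it-Q4_K_M.gguf -c 8192 -ngl 99 -p "Hello"
```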
Why this rating
7.5/10 — best 4B-class general model when you want multimodal at edge size. Loses to Phi-3.5 Mini on math + structured tasks but beats it on chat naturalness.
Overview
4B Gemma 3 for edge. Multimodal.
Strengths
- Multimodal at 4B
- Edge-class
Weaknesses
- License restrictions
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 2.5 GB | 4 GB |
| Q8_0 | 4.4 GB | 6 GB |
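When checking fit, leave headroom above the raw file size for the KV cache and runtime buffers. A quick sketch of that check, using the table's file sizes and an assumed ~1.5 GB of overhead (the overhead figure is an estimate, not a measurement):

```shell
# Does each quant fit on a 4 GB card? File sizes from the table above;
# the 1.5 GB overhead (KV cache, CUDA buffers) is an assumption.
vram_gb=4
for entry in "Q4_K_M 2.5" "Q8_0 4.4"; do
  set -- $entry
  verdict=$(awk -v s="$2" -v v="$vram_gb" \
    'BEGIN{print (s + 1.5 <= v) ? "fits" : "needs more VRAM"}')
  echo "$1: $verdict on a ${vram_gb} GB card"
done
```

This matches the table's VRAM column: Q4_K_M just squeezes onto a 4 GB card, while Q8_0 wants 6 GB.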
Get the model
Ollama
One-line install
ollama run gemma3:4b
Read our Ollama review →
HuggingFace
Original weights
Source repository with the original weights; you'll need to quantize them yourself (e.g., convert to GGUF) to run them locally.
Hardware that runs this
GPUs with enough VRAM to run at least one quantization of Gemma 3 4B.
Models worth comparing
Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.
Frequently asked
What's the minimum VRAM to run Gemma 3 4B?
Can I use Gemma 3 4B commercially?
What's the context length of Gemma 3 4B?
How do I install Gemma 3 4B with Ollama?
Does Gemma 3 4B support images?
Source: huggingface.co/google/gemma-3-4b-it
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.