by Google DeepMind
Google's open-weight model family derived from Gemini research. Gemma 2 + Gemma 3 cover sub-30B chat; CodeGemma adds code-specialized variants. Tight integration with Vertex AI / Google Cloud / Android Studio.
Start with Gemma 3 12B at Q4_K_M via Ollama — it fits on a single RTX 3060 12GB at Q4 (~7 GB VRAM). The 12B delivers MMLU ~82% and punches above its weight class on instruction-following (IFEval ~78%). Google's distillation from Gemini training data gives Gemma 3 context-handling quality that smaller models rarely achieve — usable 32K context without perplexity collapse. For minimum VRAM (<8 GB), use Gemma 3 4B Q4_K_M (~3 GB) — it runs on any laptop with an integrated GPU at 15+ tok/s via llama.cpp. Skip Gemma 2 27B for local deployment — Gemma 3 12B matches or exceeds it on benchmarks at half the VRAM, and Gemma 2 tops out at an 8K context window. Skip Gemma 1 entirely; it is superseded on every axis.
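A minimal sketch of that Ollama path, assuming the daemon is running locally on its default port (11434) and that `ollama pull gemma3:12b` has already fetched the weights (the default tag is a Q4_K_M quant); only the Python standard library is used:

```python
# Minimal sketch: query a locally running Ollama instance over its REST API.
# Assumes Ollama is installed, `ollama pull gemma3:12b` has completed, and the
# daemon is listening on the default port 11434.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint


def ask_gemma(prompt: str, model: str = "gemma3:12b") -> str:
    """Send a single non-streaming generation request and return the text."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,                # one JSON object back instead of a token stream
        "options": {"num_ctx": 8192},   # context to allocate; raise if VRAM headroom allows
    }).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    print(ask_gemma("Summarize the trade-offs of Q4_K_M quantization in two sentences."))
```

The same request shape works for the 4B tag (`gemma3:4b`) on sub-8 GB machines; only the model string changes.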
For single-user local use: Ollama with gemma3:12b Q4_K_M on an RTX 3060 12GB, or an Apple M3 via MLX-LM. Gemma's GeGLU activation and 256K vocab require GGUFs built with a recent llama.cpp (b3400+) for correct RoPE theta. For multi-user serving: vLLM 0.6.1+ with AWQ 4-bit on an L4 24 GB — Gemma's dense architecture parallelizes efficiently. For mobile/edge: MediaPipe LLM Inference on Tensor G4 via Google AI Edge — Gemma 3 4B runs entirely on-device at ~12 tok/s with 4-bit quantization and GPU acceleration. For maximum throughput on NVIDIA GPUs: TensorRT-LLM with FP8 on an L40S. Note: Gemma's bespoke license prohibits using the model to generate training data for competing models — review the terms before production deployment. See the GPU buyer guide.
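For the multi-user vLLM path, here is a minimal offline-batching sketch, assuming vLLM 0.6.1+ and a 4-bit AWQ export of Gemma 3 12B; the model name below is a placeholder, not an official artifact, so substitute whatever AWQ checkpoint you actually have locally or on the Hub:

```python
# Minimal sketch: batch inference with vLLM on a 4-bit AWQ Gemma 3 checkpoint.
# The repo name is illustrative only; point `model` at your own AWQ export.
from vllm import LLM, SamplingParams

llm = LLM(
    model="your-org/gemma-3-12b-it-awq",  # placeholder AWQ repo or local path
    quantization="awq",                   # tell vLLM the weights are AWQ 4-bit
    max_model_len=8192,                   # caps KV-cache allocation; raise if memory allows
    gpu_memory_utilization=0.90,
)

params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=256)

# vLLM schedules these prompts with continuous batching, which is where the
# throughput win over single-user llama.cpp serving comes from.
prompts = [
    "Explain KV-cache paging in one paragraph.",
    "Write a haiku about quantization.",
]
for output in llm.generate(prompts, params):
    print(output.outputs[0].text.strip())
```

The same model and quantization arguments can be passed to vLLM's OpenAI-compatible server for networked multi-user access instead of the in-process `LLM` object.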
Models in this family with our verdicts
Verify Gemma runs on your specific hardware before committing money.