gemma
7B parameters
Commercial OK

CodeGemma 7B

Coding-specialist Gemma with decent fill-in-the-middle (FIM) completion. Now mostly historical, with Qwen 2.5 Coder dominating.

License: Gemma Terms of Use · Released Apr 9, 2024 · Context: 8,192 tokens
Our verdict
By Fredoline Eruo · Last verified May 6, 2026
6.8/10
Positioning

A small, fast coder for the 8 GB VRAM tier. CodeGemma 7B was good when it shipped; today it's the right choice only when your stack is standardized on the Gemma toolchain, since Qwen 2.5 Coder 7B and DeepSeek Coder V2 Lite both outperform it.

Strengths
  • 5 GB at Q4_K_M — runs on 6 GB cards.
  • Fast fill-in-the-middle — good for editor autocomplete latency (see the sketch after this list).
  • Stable runner support.
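To make the FIM point concrete, here's a minimal sketch of CodeGemma's documented fill-in-the-middle prompt format via llama-cpp-python. Assumptions: the GGUF path is a placeholder, FIM is a base-model feature (codegemma-7b, not the -it chat variant), and a recent llama-cpp-python that tokenizes the sentinel tokens as specials.

# Minimal FIM sketch; the model path is a placeholder for your local GGUF
# of the *base* codegemma-7b (the -it chat variant is not FIM-trained).
from llama_cpp import Llama

llm = Llama(model_path="codegemma-7b-Q4_K_M.gguf", n_ctx=8192)

# CodeGemma's documented fill-in-the-middle sentinel tokens.
prefix = "def reverse_words(s: str) -> str:\n    "
suffix = "\n    return ' '.join(words)"
prompt = f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

out = llm(prompt, max_tokens=48, stop=["<|file_separator|>"])
print(out["choices"][0]["text"])  # the span the model fills in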
Limitations
  • Beaten by Qwen 2.5 Coder 7B on capability.
  • Gemma license restrictiveness.
  • Repo-context handling weaker than newer coders.
Real-world performance on RTX 4090
  • Q4_K_M (5.0 GB): 95–115 tok/s decode, TTFT under 70 ms
  • Q5_K_M (5.9 GB): 84–100 tok/s
  • Q8_0 (8.7 GB): 65–80 tok/s
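To sanity-check decode numbers like these on your own card, you can read the timing fields Ollama returns with each generation. A minimal sketch, assuming a local Ollama server on the default port with the model already pulled:

import requests

r = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "codegemma:7b-instruct-q4_K_M",
        "prompt": "Write a binary search in Python.",
        "stream": False,
    },
    timeout=300,
).json()

# eval_count is generated tokens; eval_duration is decode time in nanoseconds.
# prompt_eval_duration approximates time-to-first-token for this prompt.
print(f"decode: {r['eval_count'] / (r['eval_duration'] / 1e9):.0f} tok/s")
print(f"TTFT:   {r['prompt_eval_duration'] / 1e6:.0f} ms")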
Should you run this locally?

Yes, for 6–8 GB VRAM coders already standardized on the Gemma toolchain. No, for new deployments — pick Qwen 2.5 Coder 7B or DeepSeek Coder V2 Lite.

How it compares
  • vs Qwen 2.5 Coder 7B → Qwen wins on capability and ships under Apache 2.0; CodeGemma's Gemma Terms of Use add use restrictions, so pick it only for Gemma-toolchain consistency.
  • vs DeepSeek Coder V2 Lite (16B) → DeepSeek is much more capable; CodeGemma wins on VRAM (5 GB vs ~10 GB).
  • vs Codestral 22B → Codestral is dramatically more capable; CodeGemma is the constrained-VRAM pick.
Run this yourself
ollama pull codegemma:7b-instruct-q4_K_M
ollama run codegemma:7b-instruct-q4_K_M
Settings: Q4_K_M GGUF, 8192 ctx, llama.cpp/CUDA, RTX 4090
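One caveat worth a sketch: Ollama's default context window is shorter than the model's 8,192-token maximum, so request the full window explicitly if you need it. This assumes the official ollama Python client and the tag pulled above:

import ollama

resp = ollama.chat(
    model="codegemma:7b-instruct-q4_K_M",
    messages=[{"role": "user", "content": "Explain this regex: ^\\d{4}-\\d{2}$"}],
    options={"num_ctx": 8192},  # Ollama defaults lower; ask for the full 8K
)
print(resp["message"]["content"])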
Why this rating

6.8/10 — Google's coding model in a 7B body. Decent for autocomplete, but Qwen 2.5 Coder 7B exists and is stronger. Loses points for being eclipsed by every modern coder model at a similar VRAM footprint.

Overview

Coding-specialist Gemma. Decent FIM completion. Now mostly historical with Qwen 2.5 Coder dominating.

Strengths

  • Fast small coder

Weaknesses

  • Outpaced by Qwen 2.5 Coder

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point; a rough VRAM estimator follows the table below.

Quantization    File size    VRAM required
Q4_K_M          5.0 GB       6 GB
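For a rough feel of where the VRAM figure comes from, estimate it as weights file plus fp16 KV cache plus runtime overhead. The attention shapes below are Gemma-7B-ish assumptions (28 layers, 16 KV heads, head dim 256); verify against the model card before trusting the output.

# Back-of-envelope VRAM estimate: weights + fp16 KV cache + overhead.
# Layer/head counts are assumptions for Gemma 7B; check the model card.
def vram_estimate_gb(file_gb, ctx, layers=28, kv_heads=16, head_dim=256):
    kv_bytes = 2 * layers * kv_heads * head_dim * ctx * 2  # K and V at fp16
    overhead_gb = 0.5  # CUDA context, activations: a loose allowance
    return file_gb + kv_bytes / 1e9 + overhead_gb

print(f"2K ctx: {vram_estimate_gb(5.0, 2048):.1f} GB")  # short autocomplete jobs
print(f"8K ctx: {vram_estimate_gb(5.0, 8192):.1f} GB")  # the full window costs more

The spread is why a 6 GB card is fine for short-context autocomplete but gets tight at the full 8K window; llama.cpp's KV-cache quantization can pull the larger figure down.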

Get the model

Ollama

One-line install

ollama run codegemma:7b
Read our Ollama review →

HuggingFace

Original weights

huggingface.co/google/codegemma-7b-it

Source repository — quantize the weights yourself (e.g. to GGUF) for local use.

Hardware that runs this

Cards with enough VRAM for at least one quantization of CodeGemma 7B.

Compare alternatives

Models worth comparing

Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.

Frequently asked

What's the minimum VRAM to run CodeGemma 7B?

6 GB of VRAM is enough to run CodeGemma 7B at the Q4_K_M quantization (file size 5.0 GB). Higher-quality quantizations need more.

Can I use CodeGemma 7B commercially?

Yes — CodeGemma 7B ships under the Gemma Terms of Use, which permits commercial use. Always read the license text before deployment.

What's the context length of CodeGemma 7B?

CodeGemma 7B supports a context window of 8,192 tokens (8K).

How do I install CodeGemma 7B with Ollama?

Run `ollama pull codegemma:7b` to download, then `ollama run codegemma:7b` to start a chat session. The default quantization is Q4_K_M.

Source: huggingface.co/google/codegemma-7b-it

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.