gemma
7B parameters
Commercial OK

CodeGemma 7B

Coding-specialist Gemma with decent fill-in-the-middle (FIM) completion. Now mostly historical, with Qwen 2.5 Coder dominating.

License: Gemma Terms of Use · Released Apr 9, 2024 · Context: 8,192 tokens
Our verdict
By Fredoline Eruo · Last verified May 6, 2026
6.8/10
Positioning

A small, fast coder for the 8 GB VRAM tier. CodeGemma 7B was good when it shipped; today it's the right choice only when your stack is standardized on the Gemma toolchain, since Qwen 2.5 Coder 7B and DeepSeek Coder V2 Lite both outperform it.

Strengths
  • 5 GB at Q4_K_M — runs on 6 GB cards.
  • Fast fill-in-the-middle — good for editor autocomplete latency (see the sketch after this list).
  • Stable runner support.
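To make the FIM point concrete, here's a minimal sketch of CodeGemma's documented fill-in-the-middle prompt format via llama-cpp-python. Assumptions: the GGUF path is a placeholder, FIM is a base-model feature (codegemma-7b, not the -it chat variant), and a recent llama-cpp-python that tokenizes the sentinel tokens as specials.

# Minimal FIM sketch; the model path is a placeholder for your local GGUF
# of the *base* codegemma-7b (the -it chat variant is not FIM-trained).
from llama_cpp import Llama

llm = Llama(model_path="codegemma-7b-Q4_K_M.gguf", n_ctx=8192)

# CodeGemma's documented fill-in-the-middle sentinel tokens.
prefix = "def reverse_words(s: str) -> str:\n    "
suffix = "\n    return ' '.join(words)"
prompt = f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

out = llm(prompt, max_tokens=48, stop=["<|file_separator|>"])
print(out["choices"][0]["text"])  # the span the model fills in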
Limitations
  • Beaten by Qwen 2.5 Coder 7B on capability.
  • Gemma license restrictiveness.
  • Repo-context handling weaker than newer coders.
Real-world performance on RTX 4090
  • Q4_K_M (5.0 GB): 95–115 tok/s decode, TTFT under 70 ms
  • Q5_K_M (5.9 GB): 84–100 tok/s
  • Q8_0 (8.7 GB): 65–80 tok/s
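To sanity-check decode numbers like these on your own card, you can read the timing fields Ollama returns with each generation. A minimal sketch, assuming a local Ollama server on the default port with the model already pulled:

import requests

r = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "codegemma:7b-instruct-q4_K_M",
        "prompt": "Write a binary search in Python.",
        "stream": False,
    },
    timeout=300,
).json()

# eval_count is generated tokens; eval_duration is decode time in nanoseconds.
# prompt_eval_duration approximates time-to-first-token for this prompt.
print(f"decode: {r['eval_count'] / (r['eval_duration'] / 1e9):.0f} tok/s")
print(f"TTFT:   {r['prompt_eval_duration'] / 1e6:.0f} ms")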
Should you run this locally?

Yes, for 6–8 GB VRAM coders already standardized on the Gemma toolchain. No, for new deployments — pick Qwen 2.5 Coder 7B or DeepSeek Coder V2 Lite.

How it compares
  • vs Qwen 2.5 Coder 7B → Qwen wins on capability and ships under Apache 2.0; CodeGemma's Gemma Terms of Use add use restrictions, so pick it only for Gemma-toolchain consistency.
  • vs DeepSeek Coder V2 Lite (16B) → DeepSeek is much more capable; CodeGemma wins on VRAM (5 GB vs ~10 GB).
  • vs Codestral 22B → Codestral is dramatically more capable; CodeGemma is the constrained-VRAM pick.
Run this yourself
ollama pull codegemma:7b-instruct-q4_K_M
ollama run codegemma:7b-instruct-q4_K_M
Settings: Q4_K_M GGUF, 8192 ctx, llama.cpp/CUDA, RTX 4090
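One caveat worth a sketch: Ollama's default context window is shorter than the model's 8,192-token maximum, so request the full window explicitly if you need it. This assumes the official ollama Python client and the tag pulled above:

import ollama

resp = ollama.chat(
    model="codegemma:7b-instruct-q4_K_M",
    messages=[{"role": "user", "content": "Explain this regex: ^\\d{4}-\\d{2}$"}],
    options={"num_ctx": 8192},  # Ollama defaults lower; ask for the full 8K
)
print(resp["message"]["content"])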
Why this rating

6.8/10 — Google's coding model in a 7B body. Decent for autocomplete, but Qwen 2.5 Coder 7B exists and is stronger. Loses points for being eclipsed by every modern coder model at a similar VRAM footprint.

Overview

Coding-specialist Gemma. Decent FIM completion. Now mostly historical with Qwen 2.5 Coder dominating.

Strengths

  • Fast small coder

Weaknesses

  • Outpaced by Qwen 2.5 Coder

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point; a rough VRAM estimator follows the table below.

Quantization    File size    VRAM required
Q4_K_M          5.0 GB       6 GB
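For a rough feel of where the VRAM figure comes from, estimate it as weights file plus fp16 KV cache plus runtime overhead. The attention shapes below are Gemma-7B-ish assumptions (28 layers, 16 KV heads, head dim 256); verify against the model card before trusting the output.

# Back-of-envelope VRAM estimate: weights + fp16 KV cache + overhead.
# Layer/head counts are assumptions for Gemma 7B; check the model card.
def vram_estimate_gb(file_gb, ctx, layers=28, kv_heads=16, head_dim=256):
    kv_bytes = 2 * layers * kv_heads * head_dim * ctx * 2  # K and V at fp16
    overhead_gb = 0.5  # CUDA context, activations: a loose allowance
    return file_gb + kv_bytes / 1e9 + overhead_gb

print(f"2K ctx: {vram_estimate_gb(5.0, 2048):.1f} GB")  # short autocomplete jobs
print(f"8K ctx: {vram_estimate_gb(5.0, 8192):.1f} GB")  # the full window costs more

The spread is why a 6 GB card is fine for short-context autocomplete but gets tight at the full 8K window; llama.cpp's KV-cache quantization can pull the larger figure down.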

Get the model

Ollama

One-line install

ollama run codegemma:7b
Read our Ollama review →

HuggingFace

Original weights

huggingface.co/google/codegemma-7b-it

Source repository — quantize the weights yourself (e.g. to GGUF) for local use.

Hardware that runs this

Cards with enough VRAM for at least one quantization of CodeGemma 7B.

Compare alternatives

Models worth comparing

Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.

Frequently asked

What's the minimum VRAM to run CodeGemma 7B?

6 GB of VRAM is enough to run CodeGemma 7B at the Q4_K_M quantization (file size 5.0 GB). Higher-quality quantizations need more.

Can I use CodeGemma 7B commercially?

Yes — CodeGemma 7B ships under the Gemma Terms of Use, which permits commercial use. Always read the license text before deployment.

What's the context length of CodeGemma 7B?

CodeGemma 7B supports a context window of 8,192 tokens (8K).

How do I install CodeGemma 7B with Ollama?

Run `ollama pull codegemma:7b` to download, then `ollama run codegemma:7b` to start a chat session. The default quantization is Q4_K_M.

Source: huggingface.co/google/codegemma-7b-it

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.