RUNLOCALAI · v38

Operator-grade instrument for local-AI hardware intelligence. Hand-written verdicts. Real benchmarks. Reproducible commands.

OP · Fredoline Eruo

qwen · 7B parameters · Commercial OK · Reviewed May 2026

Qwen 2.5 Coder 7B Instruct

Coding-specialized Qwen 2.5 at 7B. The 8-12GB-VRAM coding model — entry-tier autocomplete + IDE assistant. Smaller sibling of the 14B / 32B Coder line.

License: Apache 2.0 · Released Nov 12, 2024 · Context: 131,072 tokens


Family & lineage

How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.

Parent / base model
  • Qwen 2.5 Coder 14B Instruct (14B) · Consumer
Family siblings (qwen-2.5-coder)
  • Qwen 2.5 Coder 1.5B (1.5B) · Edge
  • Qwen 2.5 Coder 3B (3B) · Edge
  • Qwen 2.5 Coder 7B Instruct (7B) · you are here
  • Qwen 2.5 Coder 14B Instruct (14B) · Consumer
  • Qwen 2.5 Coder 32B Instruct (32B) · Workstation
Distilled / fine-tuned from this
  • Qwen 2.5 Coder 1.5B (1.5B) · Edge
  • Qwen 2.5 Coder 3B (3B) · Edge

Strengths

  • Apache 2.0 license (permissive; commercial use allowed)
  • Fits comfortably in 8GB VRAM at Q4_K_M
  • 60-80 tok/s autocomplete on consumer 12-16GB GPUs

Weaknesses

  • Trails 14B / 32B on multi-file refactoring
  • Smaller context coverage than larger siblings

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization | File size | VRAM required
Q4_K_M       | 4.7 GB    | 6 GB
Q6_K         | 6.3 GB    | 8 GB
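A quick way to sanity-check the table for your own card: as a rough rule of thumb (our illustration, not this site's methodology), required VRAM is roughly the quantized file size plus an allowance for KV cache and runtime overhead. The 1.5 GB allowance below is an assumption for an 8K context, not a measured number:

```python
# Rough VRAM fit check for the quantizations in the table above.
# Assumption: required VRAM ~= file size + ~1.5 GB for KV cache
# and runtime overhead at an 8K context.

QUANTS = {
    "Q4_K_M": 4.7,  # file size in GB (from the table above)
    "Q6_K": 6.3,
}

OVERHEAD_GB = 1.5  # hypothetical KV-cache + runtime allowance


def fits(quant: str, vram_gb: float) -> bool:
    """True if the quantization plausibly fits in vram_gb of VRAM."""
    return QUANTS[quant] + OVERHEAD_GB <= vram_gb


for q in QUANTS:
    print(q, "fits in 8 GB:", fits(q, 8.0))
```

Both quants clear an 8 GB card under this heuristic, which lines up with the table's 6 GB / 8 GB requirements.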

Get the model

Ollama

One-line install

ollama run qwen2.5-coder:7b

Read our Ollama review →

HuggingFace

Original weights

huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct

Source repository with the original full-precision weights; you quantize them yourself (for example, by converting to GGUF) to run them in llama.cpp-based tools.
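Once the model is pulled, you can talk to it programmatically through Ollama's local REST API (`POST /api/generate` on the default port 11434). A minimal sketch, assuming `ollama serve` is running with the model available; the helper names here are ours, not part of any library:

```python
# Query a locally running Ollama server over its REST API.
import json
import urllib.request


def build_request(prompt: str, model: str = "qwen2.5-coder:7b") -> dict:
    """Payload for Ollama's POST /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(prompt: str) -> str:
    """Send a prompt to the local server and return the completion text."""
    payload = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Example (needs a running server):
#   print(generate("Write a Python one-liner that reverses a string."))
```

Setting `"stream": False` returns one JSON object instead of a stream of partial chunks, which keeps the client simple.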

Benchmarks

Real measurements on real hardware. Numbers ship with the runner version, quant, and date.

1 run on record
Hardware: NVIDIA GeForce RTX 3080 16GB (Mobile) · Runner: Ollama
Provenance: ✓ Editorial · Confidence: M
Quant: Q4_K_M · Context: 8K
Throughput: 79.4 tok/s · VRAM: — · TTFT: —
Date: May 10, 2026
§How we measure
Every benchmark on this site ships with the runner version, driver version, prompt, and date. Predictions are graded with confidence badges (M / C / ~ / E) so you know which numbers to trust for purchasing decisions. Read the methodology →
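Throughput and TTFT figures like the run above are typically derived from generation timestamps. A simplified sketch of the arithmetic (our illustration, not this site's actual measurement harness):

```python
# Derive decode throughput and time-to-first-token (TTFT) from
# per-token arrival timestamps, as a benchmark harness might record them.


def summarize(request_start: float, token_times: list[float]) -> dict:
    """token_times: wall-clock time (seconds) each generated token arrived."""
    ttft = token_times[0] - request_start
    # Decode speed excludes the first token, whose latency is
    # dominated by prompt prefill rather than decoding.
    decode_tokens = len(token_times) - 1
    decode_seconds = token_times[-1] - token_times[0]
    return {
        "ttft_s": round(ttft, 3),
        "tok_per_s": round(decode_tokens / decode_seconds, 1),
    }


# Example: 5 tokens, first at t=0.25 s, then one every 12.5 ms (80 tok/s).
times = [0.25 + 0.0125 * i for i in range(5)]
print(summarize(0.0, times))  # → {'ttft_s': 0.25, 'tok_per_s': 80.0}
```

Reporting TTFT and decode speed separately matters: prefill scales with prompt length, while decode speed is what you feel during autocomplete.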
Help keep this page accurate

We read every submission. Editorial review takes 1-7 days.

Submit a benchmark · Report outdated · Suggest a correction

What to do next

Got this model running on real hardware? Share what you measured — the form arrives with the model pre-selected.

Submit a benchmark for Qwen 2.5 Coder 7B Instruct
Or: browse the benchmark roadmap → · compare hardware options →

Hardware that runs this

Cards with enough VRAM for at least one quantization of Qwen 2.5 Coder 7B Instruct.

  • NVIDIA GB200 NVL72 · 13,824 GB · nvidia
  • AMD Instinct MI355X · 288 GB · amd
  • AMD Instinct MI325X · 256 GB · amd
  • AMD Instinct MI300X · 192 GB · amd
  • NVIDIA B200 · 192 GB · nvidia
  • NVIDIA H100 NVL · 188 GB · nvidia
  • NVIDIA H200 · 141 GB · nvidia
  • Intel Gaudi 3 · 128 GB · intel

Frequently asked

What's the minimum VRAM to run Qwen 2.5 Coder 7B Instruct?

6GB of VRAM is enough to run Qwen 2.5 Coder 7B Instruct at the Q4_K_M quantization (file size 4.7 GB). Higher-quality quantizations need more.

Can I use Qwen 2.5 Coder 7B Instruct commercially?

Yes — Qwen 2.5 Coder 7B Instruct ships under the Apache 2.0 license, which permits commercial use. Always read the license text before deployment.

What's the context length of Qwen 2.5 Coder 7B Instruct?

Qwen 2.5 Coder 7B Instruct supports a context window of 131,072 tokens (128K).
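To put that budget in everyday terms, you can convert tokens to characters with the common heuristic of roughly 4 characters per token for English text and code. This is an assumption, not a property of the model's tokenizer:

```python
# Rough context-budget arithmetic.
# Assumption: ~4 characters per token (a common heuristic for
# English text/code; actual tokenizer ratios vary by content).
CONTEXT_TOKENS = 131_072
CHARS_PER_TOKEN = 4

approx_chars = CONTEXT_TOKENS * CHARS_PER_TOKEN
print(f"~{approx_chars:,} characters of context")
```

That works out to roughly half a million characters — enough for many whole source files at once, which is what makes the long context useful for repository-level questions.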

How do I install Qwen 2.5 Coder 7B Instruct with Ollama?

Run `ollama pull qwen2.5-coder:7b` to download, then `ollama run qwen2.5-coder:7b` to start a chat session. The default quantization is Q4_K_M.

Source: huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.

Related — keep moving

Compare hardware
  • 4060 Ti 16 GB vs 4070 Ti Super →
  • Arc B580 vs 4060 Ti 16 GB →
Buyer guides
  • Best budget GPU — for 7B-13B models →
  • Best GPU for local AI →
  • Best laptop for local AI →
  • Best Mac for local AI →
  • Best used GPU for local AI →
When it doesn't work
  • CUDA out of memory →
  • Ollama running slowly →
  • ROCm not detected →
  • Model keeps crashing →
Recommended hardware
  • NVIDIA GB200 NVL72 →
  • AMD Instinct MI355X →
  • AMD Instinct MI325X →
  • AMD Instinct MI300X →
  • NVIDIA B200 →
Alternatives
Qwen 2.5 Coder 1.5B · Qwen 2.5 Coder 3B · Qwen 2.5 Coder 14B Instruct · Qwen 2.5 Coder 32B Instruct
Before you buy

Verify Qwen 2.5 Coder 7B Instruct runs on your specific hardware before committing money.

Will it run on my hardware? → · Custom hardware comparison → · GPU recommender (4 questions) →
Compare alternatives

Models worth comparing

Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.

Same tier
Models in the same parameter band as this one
  • DeepSeek R1 Distill Qwen 7B
    deepseek · 7B
    unrated
  • DeepSeek R1 Distill Llama 8B
    deepseek · 8B
    unrated
  • Codestral Mamba 7B
    mistral · 7B
    unrated
  • Llama 3.1 8B Instruct
    llama · 8B
    8.7/10
Step up
More capable — bigger memory footprint
  • Qwen 3 14B
    qwen · 14B
    8.8/10
  • Phi-4 14B
    phi · 14B
    8.6/10
Step down
Smaller — faster, runs on weaker hardware
  • Gemma 3 4B
    gemma · 4B
    7.5/10
  • Llama 3.2 3B Instruct
    llama · 3B
    7.4/10

Community benchmarks for this model

Submit your own →

Operator-submitted measurements that have passed editorial review. Each row's provenance badge shows whether it's community-submitted, reproduced by us, or independently reproduced.

No community benchmarks yet for this combination. Submit yours →