Qwen 2.5 Coder 7B Instruct on NVIDIA GeForce RTX 3080 16GB (Mobile)
Measured this month.
Measurement
- tok/s: 79.4
- TTFT: —
- VRAM used: —
- RAM used: —
- Power: —
- Quant: Q4_K_M
- Context: 8K
- Run date: 2026-05-10
- Source: owner
First real-rig benchmark: Lenovo Legion 7 with a mobile RTX 3080 16GB. Three runs of the same standardized prompt (~70 input tokens, ~250-360 output tokens of TypeScript code generation). Run 1 was cold (10.45 s model load); runs 2-3 were hot.
— UPDATE 2026-05-10 (V36.29): Added cold-start data points. The 79.38 tok/s figure above is the warm-run median (3 back-to-back runs, GPU at full boost clock, 1770 MHz). Two separate cold-start single runs on the same rig (AC power, Windows Balanced profile, GPU starting at idle clocks of 330-435 MHz) measured 71.72 and 69.06 tok/s, median 70.39 tok/s. The cold-vs-warm delta is explained by the GPU clock ramp from idle to boost, not thermal throttling (GPU temperature held at 66-72°C across all runs).
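The cold-start figures above can be sanity-checked with a few lines of arithmetic. This is an illustrative sketch using the per-run numbers quoted in the text, not part of any published pipeline:

```python
from statistics import median

# Cold-start runs quoted above; the warm-run median is reported as 79.38 tok/s.
cold_runs = [71.72, 69.06]
warm_median = 79.38

# With two values, the median is their mean.
cold_median = median(cold_runs)
slowdown = (warm_median - cold_median) / warm_median

print(f"cold median: {cold_median:.2f} tok/s")  # 70.39, matching the text
print(f"cold-start penalty: {slowdown:.1%}")
```

The computed penalty (roughly 11%) is what the clock-ramp explanation has to account for.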
Why this confidence tier?
Confidence is rule-based. Every factor below contributed to the tier. We never expose a single numeric score; the tier label is auditable through this explanation alone.
- + Measured by RunLocalAI editorial
- Reproduce this benchmark → An independent reproduction with matching numbers lifts the tier and reduces single-source risk.
- Read the confidence methodology → Full editorial standards for tiering.
- Why we don't use percentages → Tier labels are auditable, with no opaque score.
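One way to read "rule-based" is a deterministic mapping from audit factors to a tier label. The sketch below is hypothetical (the tier names and rules are ours, not RunLocalAI's) and only illustrates why such a mapping is auditable without a numeric score:

```python
def tier(editorial_measured: bool, reproductions: int) -> str:
    """Hypothetical tiering rules: each factor is a visible, explainable step."""
    if editorial_measured and reproductions >= 1:
        return "high"    # independent reproduction lifts the tier
    if editorial_measured:
        return "medium"  # single-source editorial measurement
    return "low"

# This page's current state: editorial measurement, no reproduction yet.
print(tier(editorial_measured=True, reproductions=0))
```

Because every branch is a named rule, the explanation above the list is the full audit trail.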
Cohort intelligence
How this measurement compares to the rest of the corpus. Only comparable rows are used (same model + hardware first, with any relaxations labelled). We never average across runtimes or quant formats unless explicitly noted.
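The comparability rule can be sketched as a strict filter over corpus rows. Field names and the placeholder rows below are illustrative assumptions, not RunLocalAI's actual schema or data:

```python
from dataclasses import dataclass

@dataclass
class Row:
    model: str
    hardware: str
    quant: str
    tok_s: float

# Illustrative placeholder rows, not real corpus entries.
corpus = [
    Row("qwen2.5-coder-7b", "rtx-3080-mobile-16gb", "Q4_K_M", 79.4),
    Row("qwen2.5-coder-7b", "rtx-3080-mobile-16gb", "Q8_0", 61.0),
    Row("qwen2.5-coder-7b", "rtx-4090", "Q4_K_M", 140.0),
]

def cohort(rows: list[Row], model: str, hardware: str, quant: str) -> list[Row]:
    """Strict cohort: same model + hardware + quant; no cross-quant averaging."""
    return [r for r in rows
            if (r.model, r.hardware, r.quant) == (model, hardware, quant)]

matches = cohort(corpus, "qwen2.5-coder-7b", "rtx-3080-mobile-16gb", "Q4_K_M")
```

Relaxing a field (say, comparing across quants) would mean widening this filter, which is why any relaxation has to be labelled rather than silently averaged.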
Reproduce this benchmark
Got the same model + hardware combo? Run the same measurement and submit your numbers. We'll pre-fill the model, hardware, quant, and context — you just add your tok/s, VRAM, and runtime version. If your numbers match within ±15%, this benchmark gets a confidence lift and a reproduction badge.
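The ±15% match rule described above reduces to a one-line relative-tolerance check. The function name is ours, not the site's API:

```python
def reproduction_matches(submitted: float, reference: float,
                         tol: float = 0.15) -> bool:
    """True when the submitted tok/s is within ±tol of the reference figure."""
    return abs(submitted - reference) <= tol * reference

# Against this page's 79.4 tok/s reference:
print(reproduction_matches(72.0, 79.4))  # True: within 15%
print(reproduction_matches(60.0, 79.4))  # False: more than 15% below
```

Note the cold-start median reported above (70.39 tok/s) still falls inside the ±15% band, so a cold-rig reproduction would count as a match.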
Related
Drill into the entity pages for this measurement.
Cite or export
Reference this benchmark in your work or paste it into a README. Multiple formats with copy-to-clipboard; licensed CC-BY-4.0 (attribution to RunLocalAI required).
<a href="https://runlocalai.co/benchmarks/337" rel="noopener">RunLocalAI: Qwen 2.5 Coder 7B Instruct on NVIDIA GeForce RTX 3080 16GB (Mobile) — 79.4 tok/s</a>
Next recommended step
Got the same model + hardware? Run it and submit your numbers — successful reproductions lift this benchmark's confidence tier.