Qwen 2.5 Coder 7B Instruct on NVIDIA GeForce RTX 3080 16GB (Mobile)
Measured this month.
Measurement
- tok/s: 79.4
- TTFT: —
- VRAM used: —
- RAM used: —
- Power: —
- Quant: Q4_K_M
- Context: 8K
- Run date: 2026-05-10
- Source: owner
First real-rig benchmark: Lenovo Legion 7 with a mobile RTX 3080 16GB. Three runs of the same standardized prompt (~70 input tokens, ~250-360 output tokens of TypeScript code generation). Run 1 was cold (10.45 s model load); runs 2-3 were hot.
— UPDATE 2026-05-10 (V36.29): Added cold-start data points. The 79.38 tok/s figure above is the warm-run median (3 back-to-back runs, GPU at full boost clock, 1770 MHz). Two separate cold-start single runs on the same rig (AC power, Windows Balanced profile, GPU starting at idle clocks of 330-435 MHz) measured 71.72 and 69.06 tok/s, median 70.39 tok/s. The cold-vs-warm delta is explained by the GPU clock ramp from idle to boost, not thermal throttling (GPU temperature held at 66-72°C across all runs).
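The cold-start figures above can be sanity-checked with a few lines of arithmetic. This is an illustrative sketch using the per-run numbers quoted in the text, not part of any published pipeline:

```python
from statistics import median

# Cold-start runs quoted above; the warm-run median is reported as 79.38 tok/s.
cold_runs = [71.72, 69.06]
warm_median = 79.38

# With two values, the median is their mean.
cold_median = median(cold_runs)
slowdown = (warm_median - cold_median) / warm_median

print(f"cold median: {cold_median:.2f} tok/s")  # 70.39, matching the text
print(f"cold-start penalty: {slowdown:.1%}")
```

The computed penalty (roughly 11%) is what the clock-ramp explanation has to account for.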
Why this confidence tier?
Confidence is rule-based. Every factor below contributed to the tier. We never expose a single numeric score; the tier label is auditable through this explanation alone.
- + Measured by RunLocalAI editorial
- Reproduce this benchmark → An independent reproduction with matching numbers lifts the tier and reduces single-source risk.
- Read the confidence methodology → Full editorial standards for tiering.
- Why we don't use percentages → Tier labels are auditable, with no opaque score.
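One way to read "rule-based" is a deterministic mapping from audit factors to a tier label. The sketch below is hypothetical (the tier names and rules are ours, not RunLocalAI's) and only illustrates why such a mapping is auditable without a numeric score:

```python
def tier(editorial_measured: bool, reproductions: int) -> str:
    """Hypothetical tiering rules: each factor is a visible, explainable step."""
    if editorial_measured and reproductions >= 1:
        return "high"    # independent reproduction lifts the tier
    if editorial_measured:
        return "medium"  # single-source editorial measurement
    return "low"

# This page's current state: editorial measurement, no reproduction yet.
print(tier(editorial_measured=True, reproductions=0))
```

Because every branch is a named rule, the explanation above the list is the full audit trail.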
Cohort intelligence
How this measurement compares to the rest of the corpus. Only comparable rows are used (same model + hardware first, with any relaxations labelled). We never average across runtimes or quant formats unless explicitly noted.
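The comparability rule can be sketched as a strict filter over corpus rows. Field names and the placeholder rows below are illustrative assumptions, not RunLocalAI's actual schema or data:

```python
from dataclasses import dataclass

@dataclass
class Row:
    model: str
    hardware: str
    quant: str
    tok_s: float

# Illustrative placeholder rows, not real corpus entries.
corpus = [
    Row("qwen2.5-coder-7b", "rtx-3080-mobile-16gb", "Q4_K_M", 79.4),
    Row("qwen2.5-coder-7b", "rtx-3080-mobile-16gb", "Q8_0", 61.0),
    Row("qwen2.5-coder-7b", "rtx-4090", "Q4_K_M", 140.0),
]

def cohort(rows: list[Row], model: str, hardware: str, quant: str) -> list[Row]:
    """Strict cohort: same model + hardware + quant; no cross-quant averaging."""
    return [r for r in rows
            if (r.model, r.hardware, r.quant) == (model, hardware, quant)]

matches = cohort(corpus, "qwen2.5-coder-7b", "rtx-3080-mobile-16gb", "Q4_K_M")
```

Relaxing a field (say, comparing across quants) would mean widening this filter, which is why any relaxation has to be labelled rather than silently averaged.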
Reproduce this benchmark
Got the same model + hardware combo? Run the same measurement and submit your numbers. We'll pre-fill the model, hardware, quant, and context — you just add your tok/s, VRAM, and runtime version. If your numbers match within ±15%, this benchmark gets a confidence lift and a reproduction badge.
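The ±15% match rule described above reduces to a one-line relative-tolerance check. The function name is ours, not the site's API:

```python
def reproduction_matches(submitted: float, reference: float,
                         tol: float = 0.15) -> bool:
    """True when the submitted tok/s is within ±tol of the reference figure."""
    return abs(submitted - reference) <= tol * reference

# Against this page's 79.4 tok/s reference:
print(reproduction_matches(72.0, 79.4))  # True: within 15%
print(reproduction_matches(60.0, 79.4))  # False: more than 15% below
```

Note the cold-start median reported above (70.39 tok/s) still falls inside the ±15% band, so a cold-rig reproduction would count as a match.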
Related
Drill into the entity pages for this measurement.
Cite or export
Reference this benchmark in your work or paste it into a README. Multiple formats with copy-to-clipboard; licensed CC-BY-4.0 (attribution to RunLocalAI required).
<a href="https://runlocalai.co/benchmarks/337" rel="noopener">RunLocalAI: Qwen 2.5 Coder 7B Instruct on NVIDIA GeForce RTX 3080 16GB (Mobile) — 79.4 tok/s</a>
Next recommended step
Got the same model + hardware? Run it and submit your numbers — successful reproductions lift this benchmark's confidence tier.