26B parameters · Commercial OK · Multimodal · Reviewed May 2026

InternVL 2.5 26B

The mid-tier entry in the InternVL 2.5 family — a Shanghai AI Lab vision-language model with strong document and chart understanding.

License: MIT · Released Dec 5, 2024 · Context: 32,768 tokens


How to run it

InternVL 2.5 26B is OpenGVLab's 26B-parameter vision-language model — the smaller sibling of InternVL 2.5 78B. It pairs a 20B InternLM2.5 text backbone with the 6B InternViT vision encoder and is built for document understanding, OCR, and visual QA.

Run it at Q4_K_M via llama.cpp with a vision-capable (llava-style) server build. Budget roughly 15 GB for the Q4_K_M text weights plus ~3-5 GB for the vision stack. Minimum VRAM is 16 GB — an RTX 4080 (16 GB) runs Q4_K_M text-only, or Q3_K_M with vision. Recommended: RTX 4090 24 GB at Q4_K_M with vision. Throughput: ~30-50 tok/s on an RTX 4090 at Q4_K_M text-only; vision encoding adds ~1-3 s per image.

One architectural caveat: the InternViT encoder is large (6B), so vision VRAM is proportionally higher than in Llama/Qwen vision models with a comparable text backbone. Check that your llama.cpp build supports InternVL 26B — support may differ from the 78B.

Use it for document OCR, chart understanding, visual QA, and UI screenshot analysis. Skip it for text-only general chat (a standard text model of similar size is the better pick). Context is 32K advertised, but the practical ceiling with vision at Q4 on 24 GB is 4-8K. Need a larger vision model? See InternVL 2.5 78B.
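If you want to sanity-check GGUF outputs against the original weights (useful given the tokenizer caveats below), OpenGVLab's reference path is transformers with remote code. A minimal sketch assuming the model card's model.chat() interface — the single-tile preprocessing here is a simplification of the reference load_image() helper, and bf16 weights need roughly 52 GB, so treat this as a multi-GPU or cloud exercise rather than a 24 GB one:

```python
import torch
import torchvision.transforms as T
from PIL import Image
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "OpenGVLab/InternVL2_5-26B"

# InternVL ships custom modeling code, so trust_remote_code is required.
# bf16 weights are ~52 GB: use device_map / quantization on smaller setups.
model = AutoModel.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(
    MODEL_ID, trust_remote_code=True, use_fast=False
)

# Simplified single-tile preprocessing (448x448, ImageNet normalization).
# The model card's load_image() does dynamic multi-tile preprocessing
# instead — use it for production-quality results.
transform = T.Compose([
    T.Resize((448, 448)),
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
pixel_values = transform(Image.open("chart.png").convert("RGB"))
pixel_values = pixel_values.unsqueeze(0).to(torch.bfloat16).cuda()

# model.chat() is InternVL's remote-code generation entry point.
response = model.chat(
    tokenizer,
    pixel_values,
    "<image>\nSummarize this chart.",
    dict(max_new_tokens=512, do_sample=False),
)
print(response)
```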

Hardware guidance

Minimum: RTX 3060 12GB at Q3_K_M + vision (tight). Recommended: RTX 4090 24GB at Q4_K_M + vision (8K context).

VRAM math: 26B weights at Q4 ≈ 15 GB, InternViT encoder ~4-6 GB, KV cache at 8K ~5 GB — total with vision ~24-26 GB.

  • RTX 4090 24GB: Q4 + vision + 4K context — tight. Offload vision-encoder activations for headroom.
  • RTX 4080 16GB: Q3_K_M + vision at 4K.
  • MacBook Pro M4 Max 36GB+: Q4 + vision at 5-10 tok/s.
  • Cloud: A10 24GB at Q4_K_M + vision.

InternViT is the bottleneck — budget 4-6 GB specifically for the vision encoder. AWQ INT4 drops the text weights to ~13 GB, which helps the fit.
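The budget above is simple addition, so it's easy to re-run for your own card and context length. A back-of-envelope sketch using this page's estimates (not measurements):

```python
# Back-of-envelope VRAM budget for InternVL 2.5 26B with vision.
# All per-component figures are this page's estimates, not measurements.
text_q4_gb  = 15.0  # full weight file at Q4_K_M
vision_gb   = 5.0   # InternViT encoder + projector, upper estimate
kv_8k_gb    = 5.0   # KV cache at 8K context
overhead_gb = 1.0   # CUDA context, activations, fragmentation (assumed)

total_8k = text_q4_gb + vision_gb + kv_8k_gb + overhead_gb
print(f"8K context: ~{total_8k:.0f} GB")    # ~26 GB -> over a 24 GB card

# KV cache scales roughly linearly with context, so 4K halves it:
total_4k = text_q4_gb + vision_gb + kv_8k_gb / 2 + overhead_gb
print(f"4K context: ~{total_4k:.1f} GB")    # ~23.5 GB -> tight on 24 GB
```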

What breaks first

  1. InternViT VRAM domination. The vision encoder is proportionally huge: the 6B InternViT is roughly a quarter of total parameters and takes 25-30% of total VRAM — a much higher ratio than Llama/Qwen vision models.
  2. Multimodal GGUF scarcity. Pre-converted InternVL 26B GGUFs with vision are rare. You may need to convert from the Hugging Face weights yourself or run text-only.
  3. Resolution sensitivity. InternViT's quality degrades sharply on low-resolution inputs, but high-resolution inputs spike vision-encoder VRAM by 3-5 GB. Find the resolution sweet spot for your use case — the tile arithmetic after this list shows why.
  4. Tokenizer format. InternVL uses a custom vision+text tokenizer format, and standard llama.cpp llava paths may not handle its multimodal token embedding correctly. Validate vision outputs against the reference implementation.
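Why resolution moves VRAM: InternVL's dynamic preprocessing slices each image into 448×448 tiles, and every tile costs a fixed number of vision tokens — 256 per tile after pixel shuffle in the InternVL papers, though treat that constant and the tile cap as assumptions for your exact build. A rough counting sketch:

```python
import math

TILE = 448             # InternViT input tile size
TOKENS_PER_TILE = 256  # vision tokens per tile after pixel shuffle (assumed)

def vision_tokens(width: int, height: int, max_tiles: int = 12) -> int:
    """Rough vision-token count for a dynamically tiled image.
    The real InternVL helper picks an aspect-ratio-matched grid rather
    than naive ceiling division, and appends one thumbnail tile."""
    tiles = math.ceil(width / TILE) * math.ceil(height / TILE)
    tiles = min(tiles, max_tiles) + 1  # +1 for the thumbnail tile
    return tiles * TOKENS_PER_TILE

print(vision_tokens(640, 480))    # 5 tiles -> 1280 tokens
print(vision_tokens(1920, 1080))  # capped at 12 + thumbnail -> 3328 tokens
```

More tiles means both more encoder activations and a longer effective prompt — that is where the 3-5 GB spike comes from.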

Runtime recommendation

llama.cpp with an InternVL-compatible llava-style server — verify InternVL support in your specific build before relying on it. OpenGVLab's reference code is the fallback, and vLLM works if InternVL is registered in your version. Avoid Ollama unless an InternVL vision tag actually exists.
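Whichever server you land on, anything exposing an OpenAI-compatible chat endpoint (recent llama.cpp server builds and vLLM both do) takes images the same way. A sketch — the port, model name, and whether your particular build accepts image parts are all assumptions:

```python
import base64
import json
import urllib.request

# Assumes a multimodal-capable OpenAI-compatible server on localhost:8080.
with open("invoice.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

payload = {
    "model": "internvl-2.5-26b",  # placeholder; some servers ignore this
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Extract the line items from this invoice."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
    "max_tokens": 512,
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["choices"][0]["message"]["content"])
```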

Common beginner mistakes

  • Mistake: Expecting InternVL 26B to have the same vision-to-text VRAM ratio as Llama 3.2 Vision. Fix: InternViT is ~6B — roughly 20× larger than a CLIP-class encoder. Budget 4-6 GB for the vision encoder alone; a 16 GB GPU may not fit vision + text at Q4.
  • Mistake: Using the InternVL 26B vision projector with the 78B GGUF. Fix: Different model sizes use different projectors. Match the files exactly.
  • Mistake: Assuming 26B = half the quality of 78B. Fix: The 26B is significantly weaker at complex visual reasoning. 78B is the recommendation for document understanding and OCR; 26B is the budget option.
  • Mistake: Sending images without preprocessing. Fix: InternVL expects specific image preprocessing — use the model's image processor or resize to the encoder's expected input size (a simplified tiling sketch follows this list).
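For the last fix, the shape of correct preprocessing looks roughly like this — a simplified, hypothetical stand-in for the dynamic_preprocess() helper in OpenGVLab's model card (the real one also filters candidate grids by covered area and appends a thumbnail tile; prefer it for actual use):

```python
from PIL import Image

TILE = 448  # InternViT's expected tile size

def pick_grid(width: int, height: int, max_tiles: int = 12) -> tuple[int, int]:
    """Choose the (cols, rows) tile grid whose aspect ratio best
    matches the image. Simplified vs. InternVL's reference logic."""
    best, best_diff = (1, 1), float("inf")
    for cols in range(1, max_tiles + 1):
        for rows in range(1, max_tiles // cols + 1):
            diff = abs(width / height - cols / rows)
            if diff < best_diff:
                best, best_diff = (cols, rows), diff
    return best

def tile_image(path: str, max_tiles: int = 12) -> list[Image.Image]:
    """Resize to the best-matching grid, then cut 448x448 tiles."""
    img = Image.open(path).convert("RGB")
    cols, rows = pick_grid(*img.size, max_tiles)
    img = img.resize((cols * TILE, rows * TILE))
    return [
        img.crop((c * TILE, r * TILE, (c + 1) * TILE, (r + 1) * TILE))
        for r in range(rows) for c in range(cols)
    ]

tiles = tile_image("screenshot.png")
print(f"{len(tiles)} tiles of {tiles[0].size}")
```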

Family & lineage

How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.

Family siblings (internvl-2.5)
  • InternVL 2.5 26B (26B) — you are here
  • InternVL 2.5 78B (78B) — datacenter-class

Distilled / fine-tuned from this
  • InternVL 2.5 78B (78B) — datacenter-class

Strengths

  • MIT license
  • Strong on charts and documents

Weaknesses

  • Smaller community than Qwen-VL

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization   File size   VRAM required
Q4_K_M         16.0 GB     20 GB

Get the model

HuggingFace

Original weights

huggingface.co/OpenGVLab/InternVL2_5-26B

Source repository — you'll need to quantize the weights yourself.

Hardware that runs this

Cards with enough VRAM for at least one quantization of InternVL 2.5 26B.

NVIDIA GB200 NVL72
13824GB · nvidia
AMD Instinct MI355X
288GB · amd
AMD Instinct MI325X
256GB · amd
AMD Instinct MI300X
192GB · amd
NVIDIA B200
192GB · nvidia
NVIDIA H100 NVL
188GB · nvidia
NVIDIA H200
141GB · nvidia
Intel Gaudi 3
128GB · intel

Frequently asked

What's the minimum VRAM to run InternVL 2.5 26B?

20 GB of VRAM is enough to run InternVL 2.5 26B at the Q4_K_M quantization (file size 16.0 GB). Budget several more GB if you load the vision encoder — see the hardware guidance above. Higher-quality quantizations need more.

Can I use InternVL 2.5 26B commercially?

Yes — InternVL 2.5 26B ships under the MIT license, which permits commercial use. Always read the license text before deployment.

What's the context length of InternVL 2.5 26B?

InternVL 2.5 26B supports a context window of 32,768 tokens (32K).

Does InternVL 2.5 26B support images?

Yes — InternVL 2.5 26B is multimodal and accepts text + vision inputs. Vision support requires a runner that handles its image-conditioning architecture.

Source: huggingface.co/OpenGVLab/InternVL2_5-26B

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.

Compare alternatives

Models worth comparing

Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.

Same tier
Models in the same parameter band as this one
  • Qwen 3 30B-A3B
    qwen · 30B
    unrated
  • Gemma 4 31B Dense
    gemma · 31B
    unrated
  • Nemotron 3 Nano (30B-A3B)
    other · 30B
    unrated
  • DeepSeek Coder V3
    deepseek · 33B
    unrated
Step up
More capable — bigger memory footprint
  • Llama 3.3 70B Instruct
    llama · 70B
    9.1/10
  • DeepSeek R1 Distill Llama 70B
    deepseek · 70B
    9.0/10
Step down
Smaller — faster, runs on weaker hardware
  • DeepSeek V3 Lite (16B MoE)
    deepseek · 16B
    unrated
  • Mistral Small 3 24B
    mistral · 24B
    8.4/10