Extracting data from charts, graphs, plots, and infographics. Specialized capability for vision-language models — distinct from raw OCR.
ollama pull minicpm-v (~8 GB — strong multimodal model with chart understanding). Save the chart as chart.png, then:

import ollama

# Read the chart image as raw bytes
with open("chart.png", "rb") as f:
    img_data = f.read()

resp = ollama.chat(model="minicpm-v", messages=[{
    "role": "user",
    "content": "Extract all data points from this chart. What is the X axis? Y axis? Give me the approximate values for each data series.",
    "images": [img_data],
}])
print(resp["message"]["content"])
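The Python client above wraps Ollama's REST API. If you'd rather call the endpoint directly, the /api/chat route expects images as base64 strings rather than raw bytes; a minimal sketch (build_chart_request is a helper name invented here, and localhost:11434 is Ollama's default port):

```python
import base64

def build_chart_request(image_bytes: bytes, prompt: str, model: str = "minicpm-v") -> dict:
    # Body for POST http://localhost:11434/api/chat; the REST API expects
    # base64-encoded image data, not raw bytes.
    return {
        "model": model,
        "stream": False,  # one complete reply instead of a token stream
        "messages": [{
            "role": "user",
            "content": prompt,
            "images": [base64.b64encode(image_bytes).decode("ascii")],
        }],
    }

# Example: body = build_chart_request(open("chart.png", "rb").read(),
#                                     "Extract all data points from this chart.")
```

POST the dict as JSON and read the reply's message.content field, same as the client snippet.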
ollama pull qwen2.5-vl:7b — stronger chart understanding, 128K context, handles multi-chart documents.

Google's DePlot — a chart-to-table model, SOTA for simple charts — is available through Hugging Face transformers (google/deplot).

Used RTX 3060 12 GB (~$200-250, see /hardware/rtx-3060-12gb). Runs MiniCPM-V 8B at 5-10 seconds per chart and Qwen2.5-VL 7B at 5-15 seconds per chart. Handles bar charts, line plots, scatter plots, and simple pie charts with reasonable accuracy. Pair with a Ryzen 5 5600 + 16 GB DDR4 + 512 GB NVMe. Total: ~$360-405. Chart reading is viable at this budget — the models run comfortably in 12 GB. For DePlot (pixel-to-table mapping without LLM overhead): it runs on CPU at 1-3 seconds per chart.
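DePlot returns the extracted table as one linearized string — rows separated by the literal `<0x0A>` token, cells by `|` (per my reading of the model card). A small parser to turn that into rows of cells (parse_deplot_table is a hypothetical helper, not part of any library):

```python
def parse_deplot_table(linearized: str) -> list[list[str]]:
    # DePlot emits e.g. "TITLE | Sales <0x0A> Q1 | 120 <0x0A> Q2 | 180";
    # split rows on the "<0x0A>" token, then cells on "|".
    rows = []
    for raw_row in linearized.split("<0x0A>"):
        cells = [c.strip() for c in raw_row.split("|")]
        if any(cells):  # drop rows that are entirely empty
            rows.append(cells)
    return rows

example = "TITLE | Quarterly Sales <0x0A> Quarter | Revenue <0x0A> Q1 | 120 <0x0A> Q2 | 180"
print(parse_deplot_table(example))
# [['TITLE', 'Quarterly Sales'], ['Quarter', 'Revenue'], ['Q1', '120'], ['Q2', '180']]
```

From here, writing CSV or JSON is a one-liner with the standard library.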
Used RTX 3090 24 GB ($700-900, see /hardware/rtx-3090). Runs Qwen2-VL 72B at 10-20 seconds per chart (~50 GB of weights — needs quantization). The 72B variant handles complex charts (multi-axis, stacked bars, logarithmic scales) with near-human accuracy. For batch chart processing (hundreds of charts per day), the 7B-8B models at 2-5 seconds per chart via vLLM are the throughput play. Total: ~$1,800-2,200. Chart reading benefits more from model quality than GPU speed — a 72B model on a 3090 is more accurate than a 7B model on a 4090.
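The batch-throughput claim is easy to sanity-check: daily capacity is just the processing window divided by per-chart latency. A back-of-envelope helper (charts_per_day is illustrative; the 8-hour single-stream window is an assumption, and real batching via vLLM would do better):

```python
def charts_per_day(sec_per_chart: float, hours: float = 8.0) -> int:
    # Single-stream estimate: one chart at a time, no batching gains.
    return int(hours * 3600 / sec_per_chart)

print(charts_per_day(3))   # 7B-8B via vLLM at ~3 s/chart -> 9600
print(charts_per_day(15))  # 72B at ~15 s/chart -> 1920
```

Even the slow 72B path clears "hundreds of charts per day" with room to spare; the fast path is for tens of thousands.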
The mistake: using a text-only LLM (with OCR pre-processing) to "read" charts by extracting axis labels and guessing values. Why it fails: text-only LLMs can't see spatial relationships — they don't know that a bar's height maps to the Y-axis scale, or that a scatter plot's trend line indicates correlation. OCR extracts "Sales: 100, 200, 300" but can't tell which bar is which. The fix: use a vision-language model (MiniCPM-V, Qwen2-VL, or DePlot). These models see the actual chart image and understand visual encodings (position, length, area, color). They extract data correctly because they process the chart as pixels, not text. If you need structured output (CSV/JSON), use DePlot — it's trained specifically for chart-to-table conversion.
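If you prompt a general VLM for JSON instead of using DePlot, replies often wrap the object in prose or a markdown fence. A tolerant extractor (extract_json is a hypothetical helper; the only assumption is that the reply contains one top-level {...} object):

```python
import json

def extract_json(reply: str) -> dict:
    # Take the span from the first '{' to the last '}' and parse it,
    # ignoring surrounding prose or ``` fences.
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object in model reply")
    return json.loads(reply[start:end + 1])

reply = 'Sure! Here is the data:\n```json\n{"x_axis": "Quarter", "series": {"Q1": 120, "Q2": 180}}\n```'
print(extract_json(reply)["series"]["Q2"])  # 180
```

Pair this with an explicit prompt ("Reply with only a JSON object, keys x_axis, y_axis, series") to raise the hit rate.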
Local AI workloads have real hardware constraints that vary by task type. VRAM ceiling decides what model fits; bandwidth decides decode speed; compute decides prefill speed. Pick the GPU tier that fits your actual workload, not the spec sheet.
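The VRAM ceiling is the easiest constraint to estimate up front. As a back-of-envelope (an assumption-laden rule of thumb, not a benchmark): weight memory is roughly parameter count times bytes per parameter, plus overhead for KV cache and activations — here assumed at 15%:

```python
def vram_floor_gb(params_b: float, bits: int, overhead: float = 0.15) -> float:
    # Weights in GB = params (billions) * bits per weight / 8,
    # plus an assumed overhead fraction for KV cache and activations.
    return params_b * bits / 8 * (1 + overhead)

print(round(vram_floor_gb(8, 4), 1))   # 8B at 4-bit  -> ~4.6 GB
print(round(vram_floor_gb(72, 4), 1))  # 72B at 4-bit -> ~41.4 GB
```

This is why the 7B-8B chart models sit comfortably in a 12 GB card while 72B-class models demand aggressive quantization and a larger budget.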