Do MLLMs Really Understand the Charts?

📅 2025-08-27
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current multimodal large language models (MLLMs) rely heavily on visual recognition for unlabeled chart understanding and lack fundamental visual reasoning capabilities such as numerical estimation, leading to frequent hallucinations and poor generalization. To address this, we introduce CRBench, the first benchmark dedicated to visual reasoning over charts, and propose ChartReasoner, a framework that integrates visual grounding modeling, progressive numerical estimation, and instruction tuning to provide controllable, stepwise reasoning guidance. Using only lightweight 3B/7B models, ChartReasoner achieves substantial reasoning gains, significantly outperforming GPT-4o and Gemini-2.5-Flash on CRBench. It improves average performance across general chart understanding tasks by 12.7%, effectively mitigates hallucinations, and, for the first time, systematically identifies and bridges a core deficiency in MLLMs' chart-based visual reasoning.
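To make the numerical-estimation idea concrete: estimating an unlabeled bar's value amounts to calibrating the y-axis from two labeled ticks and linearly interpolating the bar's pixel position. The sketch below is a hypothetical toy illustration of this kind of stepwise estimation, not the paper's actual ChartReasoner pipeline; the function names and tick values are invented for the example.

```python
def calibrate_axis(tick_a, tick_b):
    """Build a pixel-to-value mapping from two labeled y-axis ticks.

    tick_a, tick_b: (pixel_y, value) pairs read off the axis.
    Returns a function that maps any pixel y-coordinate to an
    estimated data value by linear interpolation.
    """
    (py_a, v_a), (py_b, v_b) = tick_a, tick_b
    scale = (v_b - v_a) / (py_b - py_a)
    return lambda py: v_a + (py - py_a) * scale

# Hypothetical chart: the "0" tick sits at pixel row 400,
# the "100" tick at pixel row 100 (pixel y grows downward).
to_value = calibrate_axis((400, 0.0), (100, 100.0))

# A bar whose top edge is detected at pixel row 250 is estimated at 50.
print(to_value(250))  # 50.0
```

A model that grounds its answer in such intermediate steps (locate ticks, locate the bar top, interpolate) can be audited at each stage, which is the kind of controllable reasoning the summary describes.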

📝 Abstract
Although Multimodal Large Language Models (MLLMs) have demonstrated increasingly impressive performance in chart understanding, most of them exhibit alarming hallucinations and significant performance degradation when handling non-annotated charts. Therefore, a question arises: Do MLLMs really understand the charts? Since a human is capable of understanding charts and estimating the values by visual reasoning, we first carefully establish a comprehensive Chart Reasoning Benchmark CRBench to rigorously evaluate the visual reasoning abilities of MLLMs on non-annotated charts. We argue that MLLMs are primarily relying on recognition rather than reasoning to interpret the charts. To steer MLLMs toward reasonable chart understanding, we propose ChartReasoner, which mimics human behavior by grounding its estimation in chart understanding. Extensive results on the proposed CRBench show that ChartReasoner-3B/7B achieves superior performance in chart reasoning, even compared to GPT-4o and Gemini-2.5-Flash. More importantly, ChartReasoner also demonstrates visual reasoning abilities in general chart comprehension on public benchmarks, leading to significant performance gains and enabling MLLMs to rationally understand the charts. The code and dataset will be publicly available upon publication.
Problem

Research questions and friction points this paper is trying to address.

Evaluates MLLMs' visual reasoning in chart understanding
Addresses hallucinations in non-annotated chart interpretation
Proposes a benchmark and model for visual reasoning enhancement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introducing the CRBench benchmark for visual reasoning evaluation
Proposing ChartReasoner-3B/7B models with Visual Reasoning Reinforcement Finetuning
Demonstrating strong generalization across chart understanding benchmarks
Xiao Zhang
Beijing University of Posts and Telecommunications
Dongyuan Li
Beijing University of Posts and Telecommunications
Liuyu Xiang
Beijing University of Posts and Telecommunications
Computer Vision · Reinforcement Learning · LLM Agent
Yao Zhang
AAITC, CTO Organization, Lenovo
Cheng Zhong
AAITC, CTO Organization, Lenovo
Zhaofeng He
Beijing University of Posts and Telecommunications