🤖 AI Summary
Multimodal large language models (MLLMs) frequently generate hallucinations in chart understanding due to visual–textual misalignment. To address this, we propose the first chart-specific, fine-grained posterior visual attribution framework that verifies response credibility by localizing the underlying chart elements—such as axes, data points, and legend entries—that substantiate textual answers. Our method integrates chart instance segmentation, set-of-marks prompting, and synthetic data augmentation. We further introduce ChartVA-Eval, the first cross-domain benchmark featuring fine-grained, element-level human annotations for visual attribution evaluation. Experiments demonstrate that our framework improves fine-grained attribution accuracy by 26–66% on ChartVA-Eval, significantly enhancing the interpretability and reliability of chart understanding outputs. This work establishes a novel paradigm for hallucination detection and trustworthy reasoning in MLLMs applied to chart comprehension.
📝 Abstract
The growing capabilities of multimodal large language models (MLLMs) have advanced tasks like chart understanding. However, these models often suffer from hallucinations, where generated text sequences conflict with the provided visual data. To address this, we introduce Post-Hoc Visual Attribution for Charts, which identifies fine-grained chart elements that validate a given chart-associated response. We propose ChartLens, a novel chart attribution algorithm that uses segmentation-based techniques to identify chart objects and employs set-of-marks prompting with MLLMs for fine-grained visual attribution. Additionally, we present ChartVA-Eval, a benchmark with synthetic and real-world charts from diverse domains like finance, policy, and economics, featuring fine-grained attribution annotations. Our evaluations show that ChartLens improves fine-grained attribution accuracy by 26–66%.
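To make the set-of-marks attribution step concrete, here is a minimal sketch of the idea: assign a numeric mark to each segmented chart element, ask an MLLM which marks support a given answer, and map the returned mark IDs back to elements. This is an illustration only, not the authors' implementation; the element labels, prompt wording, and reply format are all assumptions, and the actual MLLM call is replaced by a hypothetical reply string.

```python
import re

def build_marked_prompt(elements, question, answer):
    """Assign a numeric mark to each segmented chart element and build a
    set-of-marks prompt asking which marks support the answer.
    `elements` is a list of (label, bbox) tuples from a segmentation step
    (labels and boxes here are made up for illustration)."""
    marks = {i + 1: label for i, (label, _bbox) in enumerate(elements)}
    listing = "\n".join(f"[{i}] {label}" for i, label in marks.items())
    prompt = (
        f"Chart elements (marked on the image):\n{listing}\n\n"
        f"Question: {question}\nAnswer: {answer}\n"
        "Which marks visually support this answer? Reply with mark IDs, e.g. [1, 3]."
    )
    return prompt, marks

def parse_attribution(response, marks):
    """Map mark IDs in the MLLM's reply back to chart-element labels,
    ignoring any IDs that do not correspond to a known mark."""
    ids = [int(m) for m in re.findall(r"\d+", response)]
    return [marks[i] for i in ids if i in marks]

# Hypothetical output of a chart instance-segmentation step.
elements = [
    ("x-axis: Year", (0, 380, 640, 400)),
    ("bar: 2021 revenue", (120, 150, 180, 380)),
    ("bar: 2022 revenue", (220, 90, 280, 380)),
    ("legend: Revenue ($M)", (480, 20, 630, 60)),
]
prompt, marks = build_marked_prompt(
    elements, "Which year had higher revenue?", "2022"
)
# A hypothetical MLLM reply in place of a real model call:
print(parse_attribution("[3, 1]", marks))
# → ['bar: 2022 revenue', 'x-axis: Year']
```

In the full pipeline, the numeric marks would also be drawn onto the chart image before prompting, so the model can ground each ID visually rather than from the text listing alone.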