InterChart: Benchmarking Visual Reasoning Across Decomposed and Distributed Chart Information

📅 2025-08-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current vision-language models (VLMs) lack the capability to perform joint reasoning across multiple semantically related charts.

Method: We introduce InterChart, the first diagnostic benchmark for multi-chart reasoning, comprising synthetic aligned chart sets and real-world chart pairs. It features a three-tiered task hierarchy, spanning entity inference, trend correlation, and multi-step abstract reasoning, to systematically evaluate VLMs' semantic integration across 2–3 topically or structurally related charts. We propose a hierarchical evaluation framework and a "decompose-distribute" chart information processing mechanism to explicitly model cross-chart reasoning paths.

Contribution/Results: Experiments reveal that state-of-the-art open- and closed-source VLMs suffer significant performance degradation as chart complexity increases, and that visual decomposition notably improves reasoning accuracy. InterChart is the first benchmark to uncover systematic limitations of VLMs in collaborative multi-chart understanding, providing an interpretable, scalable diagnostic tool for complex multimodal visual reasoning.

📝 Abstract
We introduce InterChart, a diagnostic benchmark that evaluates how well vision-language models (VLMs) reason across multiple related charts, a task central to real-world applications such as scientific reporting, financial analysis, and public policy dashboards. Unlike prior benchmarks that focus on isolated, visually uniform charts, InterChart challenges models with diverse question types, ranging from entity inference and trend correlation to numerical estimation and abstract multi-step reasoning, grounded in 2–3 thematically or structurally related charts. We organize the benchmark into three tiers of increasing difficulty: (1) factual reasoning over individual charts, (2) integrative analysis across synthetically aligned chart sets, and (3) semantic inference over visually complex, real-world chart pairs. Our evaluation of state-of-the-art open- and closed-source VLMs reveals consistent and steep accuracy declines as chart complexity increases. We find that models perform better when we decompose multi-entity charts into simpler visual units, underscoring their struggles with cross-chart integration. By exposing these systematic limitations, InterChart provides a rigorous framework for advancing multimodal reasoning in complex, multi-visual environments.
Problem

Research questions and friction points this paper is trying to address.

Evaluating VLMs' reasoning across multiple related charts
Benchmarking cross-chart integration in visual question answering
Diagnosing limitations in multimodal reasoning with complex charts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diagnostic benchmark for multi-chart reasoning
Three-tier difficulty with synthetic and real charts
Decomposing complex charts into simpler visual units
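The decomposition idea above can be sketched in miniature: split one multi-entity chart into single-entity units, each of which can then be rendered and questioned in isolation. This is a minimal illustrative sketch, not the paper's released code; the chart-spec structure, `decompose_chart` function, and sample data are all hypothetical.

```python
def decompose_chart(chart):
    """Split a multi-series chart spec into one single-series spec per entity.

    `chart` is a hypothetical dict-based spec: a title, shared x-axis values,
    and a mapping from series name to y-values.
    """
    return [
        {
            "title": f'{chart["title"]}: {name}',
            "x": chart["x"],
            "series": {name: ys},
        }
        for name, ys in chart["series"].items()
    ]


# Illustrative multi-entity chart with two series sharing an x-axis.
multi = {
    "title": "GDP growth",
    "x": [2020, 2021, 2022],
    "series": {"US": [1.2, 5.9, 2.1], "EU": [0.8, 5.3, 3.5]},
}

units = decompose_chart(multi)  # two single-series chart specs
```

Each unit in `units` keeps the shared axis and title context, so a downstream VLM can be shown simpler per-entity renderings while the question still references the original chart.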