🤖 AI Summary
This work addresses the low ecological validity of existing chart question answering (CQA) benchmarks by introducing VizNoteQA, the first CQA dataset derived from real-world visualization notebooks. Its construction jointly extracts multi-view charts and natural language questions anchored in analytical narratives, coupling visual presentation with complex, multi-step reasoning to raise both the authenticity and the difficulty of the task. Methodologically, the authors propose a notebook-structure-guided multimodal alignment strategy that integrates chart understanding and NLP techniques to establish fine-grained semantic correspondences between visual elements and narrative text. Evaluation on VizNoteQA reveals that state-of-the-art multimodal large language models (e.g., GPT-4.1) achieve only 69.3% accuracy, exposing systematic limitations in reasoning coherence, cross-view integration, and narrative-driven comprehension under realistic analytical settings. The result is a new, ecologically grounded benchmark and methodological paradigm for chart understanding research.
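The summary does not spell out how the notebook-guided alignment works, but as a rough illustration, one could pair each chart emitted by a code cell with its nearest preceding markdown narrative. The sketch below is a minimal Python version of that heuristic; the function name and the nearest-markdown pairing rule are our assumptions, not the authors' actual strategy.

```python
import json

def extract_chart_narrative_pairs(notebook_path):
    """Pair chart images in a Jupyter notebook with nearby narrative text.

    A crude stand-in for the paper's notebook-structure-guided alignment:
    the nearest-preceding-markdown heuristic is an assumption, not the
    authors' method.
    """
    with open(notebook_path, encoding="utf-8") as f:
        cells = json.load(f)["cells"]

    pairs = []
    last_markdown = None
    for cell in cells:
        if cell["cell_type"] == "markdown":
            # Remember the most recent narrative block.
            last_markdown = "".join(cell["source"])
        elif cell["cell_type"] == "code":
            for output in cell.get("outputs", []):
                # display_data / execute_result outputs carry a MIME bundle;
                # stream and error outputs do not, so .get() skips them.
                png = output.get("data", {}).get("image/png")
                if png and last_markdown:
                    pairs.append({
                        "narrative": last_markdown,      # candidate analytical context
                        "chart_png_base64": png,         # the rendered chart
                        "code": "".join(cell["source"]), # the code that drew it
                    })
    return pairs
```

A real pipeline would replace this positional heuristic with the fine-grained semantic matching between visual elements and narrative text that the summary describes.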
📝 Abstract
We present a new dataset for chart question answering (CQA) constructed from visualization notebooks. The dataset features real-world, multi-view charts paired with natural language questions grounded in analytical narratives. Unlike prior benchmarks, our data reflects ecologically valid reasoning workflows. Benchmarking state-of-the-art multimodal large language models reveals a substantial performance gap: GPT-4.1 achieves only 69.3% accuracy, underscoring the challenges posed by this more authentic CQA setting.