Multimodal Information Fusion for Chart Understanding: A Survey of MLLMs -- Evolution, Limitations, and Cognitive Enhancement

📅 2026-02-08
🤖 AI Summary
This study addresses the lack of systematic integration between visual and textual information in existing chart understanding approaches, as well as the prevalent limitations in perception and reasoning capabilities of multimodal large language models (MLLMs). It presents a comprehensive survey of MLLM advancements in chart understanding, establishing a unified taxonomy encompassing tasks, datasets, and methodologies. Furthermore, the work proposes a cognition-enhanced learning pathway grounded in alignment optimization and reinforcement learning. By synthesizing multimodal fusion strategies—from classical deep learning to cutting-edge MLLMs—across visual–language alignment, reasoning architectures, and cognitive modeling, this research offers a structured framework for the field, clarifies current limitations, and outlines promising directions for enhancing model robustness and reliability in chart understanding.

📝 Abstract
Chart understanding is a quintessential information fusion task, requiring the seamless integration of graphical and textual data to extract meaning. The advent of Multimodal Large Language Models (MLLMs) has revolutionized this domain, yet the landscape of MLLM-based chart analysis remains fragmented and lacks systematic organization. This survey provides a comprehensive roadmap of this nascent frontier by structuring the domain's core components. We begin by analyzing the fundamental challenges of fusing visual and linguistic information in charts. We then categorize downstream tasks and datasets, introducing a novel taxonomy of canonical and non-canonical benchmarks to highlight the field's expanding scope. Subsequently, we trace the evolution of methodologies, from classic deep learning techniques to state-of-the-art MLLM paradigms that leverage sophisticated fusion strategies. By critically examining the limitations of current models, particularly their perceptual and reasoning deficits, we identify promising future directions, including advanced alignment techniques and reinforcement learning for cognitive enhancement. This survey aims to equip researchers and practitioners with a structured understanding of how MLLMs are transforming chart information fusion and to catalyze progress toward more robust and reliable systems.
Problem


Multimodal Large Language Models
Chart Understanding
Information Fusion
Cognitive Enhancement
Perceptual and Reasoning Deficits
Innovation


Multimodal Large Language Models
Chart Understanding
Information Fusion
Cognitive Enhancement
Benchmark Taxonomy
Zhihang Yi
College of Computer Science, Sichuan University, Chengdu, 610065, Sichuan, P.R. China; Engineering Research Center of Machine Learning and Industry Intelligence, Ministry of Education, Chengdu, 610065, Sichuan, P.R. China
Jian Zhao
Zhongguancun Institute of Artificial Intelligence
Reinforcement Learning, Multi-Agent System
Jiancheng Lv
University of Science and Technology of China
Operations Management, Marketing
Tao Wang
Sichuan University
Medical imaging, CT reconstruction