Unmasking Deceptive Visuals: Benchmarking Multimodal Large Language Models on Misleading Chart Question Answering

📅 2025-03-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a critical limitation of multimodal large language models (MLLMs): their inability to detect and reason about misleading charts. We introduce Misleading ChartQA, the first large-scale benchmark (3,000+ samples) covering 21 types of deceptive visual techniques across 10 chart genres, enabling the first systematic evaluation of 16 state-of-the-art MLLMs on misleading chart comprehension. To tackle this challenge, we propose a novel multi-stage reasoning pipeline that jointly models chart code and CSV data, incorporates iterative MLLM self-verification, and integrates expert human annotation for precise localization of misleading elements. Experimental results reveal that current MLLMs exhibit severe deficiencies in misleading-chart detection; our pipeline achieves an average accuracy improvement of 27.4% over baseline methods. The Misleading ChartQA benchmark dataset and implementation code are publicly released to foster future research.
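The multi-stage pipeline described above can be sketched roughly as follows. This is an illustrative assumption, not the authors' released implementation: the function and variable names (`query_mllm`, `ChartSample`, `run_pipeline`, the misleader labels) are hypothetical, and the real pipeline's prompts, stages, and verification criteria may differ.

```python
# Hedged sketch of a multi-stage misleading-chart reasoning loop:
# (1) detect/localize a misleader from the chart code and CSV data,
# (2) answer the multiple-choice question given that detection,
# (3) ask the model to self-verify; repeat for a bounded number of rounds.
# All names here are illustrative, not the paper's actual API.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ChartSample:
    chart_code: str          # e.g. the plotting source used to render the chart
    csv_data: str            # the underlying data table
    question: str
    options: list = field(default_factory=list)

def run_pipeline(sample: ChartSample,
                 query_mllm: Callable[[str], str],
                 max_rounds: int = 2) -> str:
    """Jointly condition on chart code + CSV, then iterate answer/verify."""
    context = (f"Chart code:\n{sample.chart_code}\n"
               f"CSV data:\n{sample.csv_data}\n")
    answer = ""
    for _ in range(max_rounds):
        # Stage 1: detect and localize the misleading element.
        detection = query_mllm(
            context + "Which misleader (if any) does this chart use, and where?")
        # Stage 2: answer the question, conditioned on the detection.
        answer = query_mllm(
            context + f"Detected misleader: {detection}\n"
            f"Question: {sample.question}\nOptions: {sample.options}\n"
            "Answer with one option.")
        # Stage 3: self-verification; stop early if the model confirms.
        verdict = query_mllm(
            context + f"Proposed answer: {answer}\n"
            "Is this consistent with the data and the detected misleader? "
            "Reply CONFIRM or REVISE.")
        if verdict.strip().startswith("CONFIRM"):
            break
    return answer
```

In use, `query_mllm` would wrap a call to an actual multimodal model; the sketch only fixes the control flow, with the self-verification round capped by `max_rounds` so a perpetually unconfident model cannot loop forever.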

📝 Abstract
Misleading chart visualizations, which intentionally manipulate data representations to support specific claims, can distort perceptions and lead to incorrect conclusions. Despite decades of research, misleading visualizations remain a widespread and pressing issue. Recent advances in multimodal large language models (MLLMs) have demonstrated strong chart comprehension capabilities, yet no existing work has systematically evaluated their ability to detect and interpret misleading charts. This paper introduces the Misleading Chart Question Answering (Misleading ChartQA) Benchmark, a large-scale multimodal dataset designed to assess MLLMs in identifying and reasoning about misleading charts. It contains over 3,000 curated examples, covering 21 types of misleaders and 10 chart types. Each example includes standardized chart code, CSV data, and multiple-choice questions with labeled explanations, validated through multi-round MLLM checks and exhaustive expert human review. We benchmark 16 state-of-the-art MLLMs on our dataset, revealing their limitations in identifying visually deceptive practices. We also propose a novel pipeline that detects and localizes misleaders, enhancing MLLMs' accuracy in misleading chart interpretation. Our work establishes a foundation for advancing MLLM-driven misleading chart comprehension. We publicly release the sample dataset to support further research in this critical area.
Problem

Research questions and friction points this paper is trying to address.

Evaluating MLLMs' ability to detect misleading charts
Creating a benchmark for misleading chart interpretation
Improving MLLMs' accuracy in deceptive chart analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces Misleading ChartQA Benchmark dataset
Proposes pipeline detecting and localizing misleaders
Benchmarks 16 MLLMs on deceptive charts