Is this chart lying to me? Automating the detection of misleading visualizations

📅 2025-08-29
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Misleading data visualizations fuel the spread of misinformation, yet large, diverse, and publicly available benchmark datasets for AI-based detection have been scarce. To close this gap, the authors introduce Misviz, a benchmark of 2,604 real-world visualizations annotated with 12 types of misleaders (e.g., truncated axes, inappropriate scaling, deceptive annotations), alongside Misviz-synth, a synthetic counterpart of 81,814 Matplotlib-generated charts built from real-world data tables. They evaluate state-of-the-art multimodal large language models, rule-based systems, and fine-tuned classifiers on both datasets, finding that detecting misleading visualizations and identifying the specific design rules they violate remains highly challenging. The datasets and accompanying code are openly released, providing a reproducible benchmark and evaluation framework for visualization integrity assessment.

📝 Abstract
Misleading visualizations are a potent driver of misinformation on social media and the web. By violating chart design principles, they distort data and lead readers to draw inaccurate conclusions. Prior work has shown that both humans and multimodal large language models (MLLMs) are frequently deceived by such visualizations. Automatically detecting misleading visualizations and identifying the specific design rules they violate could help protect readers and reduce the spread of misinformation. However, the training and evaluation of AI models have been limited by the absence of large, diverse, and openly available datasets. In this work, we introduce Misviz, a benchmark of 2,604 real-world visualizations annotated with 12 types of misleaders. To support model training, we also release Misviz-synth, a synthetic dataset of 81,814 visualizations generated using Matplotlib and based on real-world data tables. We perform a comprehensive evaluation on both datasets using state-of-the-art MLLMs, rule-based systems, and fine-tuned classifiers. Our results reveal that the task remains highly challenging. We release Misviz, Misviz-synth, and the accompanying code.
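To make the "misleader" idea concrete, here is a minimal sketch, not taken from the paper's codebase, of how a synthetic axis-truncation misleader of the kind Misviz-synth covers could be generated with Matplotlib: the same data is rendered twice, once with a zero baseline and once with a truncated y-axis that exaggerates tiny differences. All names and values are illustrative.

```python
# Hypothetical illustration of an "axis truncation" misleader in Matplotlib.
import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt

labels = ["A", "B", "C"]
values = [97, 98, 100]  # nearly identical values

fig, (honest, misleading) = plt.subplots(1, 2, figsize=(8, 3))

honest.bar(labels, values)
honest.set_ylim(0, 110)          # zero baseline: differences look small
honest.set_title("Honest axis")

misleading.bar(labels, values)
misleading.set_ylim(96, 101)     # truncated axis: differences look dramatic
misleading.set_title("Truncated axis (misleader)")

fig.savefig("axis_truncation_example.png")
```

Pairing an honest and a distorted rendering of the same underlying table is one way a synthetic dataset can attach a ground-truth misleader label to each chart image.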
Problem

Research questions and friction points this paper is trying to address.

Detecting misleading visualizations on social media and the web
Identifying which chart design principles a visualization violates
Overcoming the lack of large, diverse, openly available datasets for model training and evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introducing Misviz, a benchmark of 2,604 real-world visualizations annotated with 12 misleader types
Releasing Misviz-synth, a synthetic training dataset of 81,814 Matplotlib-generated visualizations
Evaluating MLLMs, rule-based systems, and fine-tuned classifiers on both datasets