Do LLMs Have Visualization Literacy? An Evaluation on Modified Visualizations to Test Generalization in Data Interpretation

📅 2025-01-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study systematically evaluates the visualization literacy of GPT-4 and Gemini on chart-understanding tasks, assessing their capacity to stand in for humans in objective, vision-based interpretation. Method: a standardized benchmark derived from the 53-item Visualization Literacy Assessment Test (VLAT), augmented with human-crafted misleading chart variants, multi-turn prompt-controlled experiments, and direct comparison against human baselines. Contribution/Results: the work establishes the first quantitative, reproducible evaluation framework for visual reasoning in mainstream large language models (LLMs), and it reveals a critical deficiency: both models rely heavily on parametric prior knowledge rather than the actual visual input, with over 70% of errors stemming from ignoring visual cues or misapplying internal knowledge. VLAT scores for both models fall significantly below those of average human participants, exposing fundamental limitations in generalizable data-visualization reasoning. The study also introduces a diagnostic paradigm for AI-assisted visualization assessment.
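The evaluation loop the summary describes (pose each VLAT item to the model, compare against the answer key, aggregate accuracy) can be sketched as follows. This is a hypothetical illustration, not the authors' code: `ITEMS` is an invented three-item subset standing in for the 53-item VLAT, and `ask_model` is a stub where a real harness would send the chart image plus question to GPT-4 or Gemini.

```python
from collections import defaultdict

# Illustrative stand-in for VLAT-style items; the real VLAT has 53 items,
# each pairing a chart image with a multiple-choice question.
ITEMS = [
    {"chart": "line", "question": "What was the value in 2015?",
     "choices": ["$50", "$100", "$150"], "answer": "$100"},
    {"chart": "bar", "question": "Which city has the highest value?",
     "choices": ["Tokyo", "Paris", "Rome"], "answer": "Tokyo"},
    {"chart": "scatter", "question": "Is there a positive correlation?",
     "choices": ["Yes", "No"], "answer": "Yes"},
]

def ask_model(item):
    """Stub for a vision-LLM call. A real harness would attach the chart
    image and prompt the model to pick one of the choices; here we just
    return the first choice as a trivial baseline."""
    return item["choices"][0]

def evaluate(items, model_fn):
    """Score model answers against the key, overall and per chart type."""
    correct = 0
    by_chart = defaultdict(lambda: [0, 0])  # chart -> [correct, total]
    for item in items:
        hit = model_fn(item) == item["answer"]
        correct += hit
        by_chart[item["chart"]][0] += hit
        by_chart[item["chart"]][1] += 1
    return correct / len(items), dict(by_chart)

accuracy, breakdown = evaluate(ITEMS, ask_model)
```

The per-chart-type breakdown mirrors how VLAT results are typically reported, and makes it easy to compare a model's profile against the published human baselines chart by chart.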

📝 Abstract
In this paper, we assess the visualization literacy of two prominent Large Language Models (LLMs): OpenAI's Generative Pretrained Transformers (GPT), the backend of ChatGPT, and Google's Gemini, previously known as Bard, to establish benchmarks for assessing their visualization capabilities. While LLMs have shown promise in generating chart descriptions, captions, and design suggestions, their potential for evaluating visualizations remains under-explored. Collecting data from humans for evaluations has been a bottleneck for visualization research in terms of both time and money, and if LLMs were able to serve, even in some limited role, as evaluators, they could be a significant resource. To investigate the feasibility of using LLMs in the visualization evaluation process, we explore the extent to which LLMs possess visualization literacy -- a crucial factor for their effective utility in the field. We conducted a series of experiments using a modified 53-item Visualization Literacy Assessment Test (VLAT) for GPT-4 and Gemini. Our findings indicate that the LLMs we explored currently fail to achieve the same levels of visualization literacy when compared to data from the general public reported in VLAT, and LLMs heavily relied on their pre-existing knowledge to answer questions instead of utilizing the information provided by the visualization when answering questions.
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Chart Data Understanding
Human Role Replacement
Innovation

Methods, ideas, or system contributions that make the work stand out.

GPT-4
Gemini
Visual Information Processing