Do Large Language Models Understand Data Visualization Principles?

📅 2026-02-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study systematically evaluates the ability of large language models (LLMs) and vision-language models (VLMs) to understand and apply data visualization principles, exploring a novel paradigm that leverages natural language as an alternative to traditional symbolic rules for visualization validation. Using a controlled dataset of approximately 2,000 annotated Vega-Lite charts alongside over 300 real-world visualizations—and ground-truth validation labels generated via Answer Set Programming—the work presents the first comprehensive assessment of model performance in detecting and correcting visualization violations. The findings reveal that models exhibit significantly stronger capabilities in correcting errors than in detecting them, highlighting their potential as flexible visualization editors, yet they still underperform symbolic solvers on tasks requiring complex visual perception.

📝 Abstract
Data visualization principles, derived from decades of research in design and perception, ensure proper visual communication. While prior work has shown that large language models (LLMs) can generate charts or flag misleading figures, it remains unclear whether they and their vision-language counterparts (VLMs) can reason about and enforce visualization principles directly. Constraint-based systems encode these principles as logical rules for precise automated checks, but translating them into formal specifications demands expert knowledge. This motivates leveraging LLMs and VLMs as principle checkers that can reason about visual design directly, bypassing the need for symbolic rule specification. In this paper, we present the first systematic evaluation of both LLMs and VLMs on their ability to reason about visualization principles, using hard verification ground truth derived from Answer Set Programming (ASP). We compiled a set of visualization principles expressed as natural-language statements and generated a controlled dataset of approximately 2,000 Vega-Lite specifications annotated with explicit principle violations, complemented by over 300 real-world Vega-Lite charts. We evaluated both checking and fixing tasks, assessing how well models detect principle violations and correct flawed chart specifications. Our work highlights both the promise of large (vision-)language models as flexible validators and editors of visualization designs and the persistent gap with symbolic solvers on more nuanced aspects of visual perception. The results also reveal an interesting asymmetry: frontier models tend to be more effective at correcting violations than at detecting them reliably.
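To make the contrast with symbolic checking concrete, here is a minimal sketch of the kind of rule a constraint-based validator encodes and that the paper asks models to reason about in natural language. The spec structure follows Vega-Lite conventions (`mark`, `encoding`, `scale`); the checker itself and the zero-baseline principle chosen are illustrative assumptions, not taken from the paper's rule set.

```python
# One symbolic principle check, sketched in Python instead of ASP:
# "a bar chart's quantitative y-axis should start at zero."

def violates_zero_baseline(spec: dict) -> bool:
    """Return True if a bar chart's quantitative y-axis does not start at zero."""
    if spec.get("mark") != "bar":
        return False  # the principle only applies to bar marks
    y = spec.get("encoding", {}).get("y", {})
    if y.get("type") != "quantitative":
        return False
    scale = y.get("scale", {})
    # Vega-Lite defaults to a zero baseline for bar charts unless overridden.
    if scale.get("zero") is False:
        return True
    domain = scale.get("domain")
    return bool(domain) and domain[0] != 0

# A chart with a truncated axis, the kind of explicit violation the
# controlled dataset annotates.
truncated = {
    "mark": "bar",
    "encoding": {
        "x": {"field": "category", "type": "nominal"},
        "y": {"field": "value", "type": "quantitative",
              "scale": {"domain": [40, 100]}},
    },
}
print(violates_zero_baseline(truncated))  # → True
```

A full constraint system encodes dozens of such rules as logic programs; the paper's question is whether an LLM or VLM can apply the same principle from its natural-language statement alone, without this hand-written formalization.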
Problem

Research questions and friction points this paper is trying to address.

large language models
data visualization principles
vision-language models
principle violation detection
chart correction
Innovation

Methods, ideas, or system contributions that make the work stand out.

large language models
visualization principles
vision-language models
constraint checking
Vega-Lite