Do Large Language Models Understand Data Visualization Rules?

📅 2026-02-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates whether large language models (LLMs) can comprehend and apply design and perceptual rules from visualization research as a substitute for traditional symbolic constraint systems. We introduce the first evaluation framework grounded in Answer Set Programming (ASP)-based hard verification criteria, translating Draco's logical constraints into natural language and curating a dataset of 2,000 annotated Vega-Lite charts. Experiments show that state-of-the-art LLMs achieve an F1 score of 0.82 in detecting common rule violations and exceed 98% compliance in structured output generation; however, they perform poorly on subtle perceptual rules (F1 < 0.15). Notably, expressing constraints in natural language substantially boosts model performance, with gains of up to 150% for smaller models, highlighting the critical role of linguistic formulation in enabling LLMs to interpret visualization principles.

📝 Abstract
Data visualization rules, derived from decades of research in design and perception, ensure trustworthy chart communication. While prior work has shown that large language models (LLMs) can generate charts or flag misleading figures, it remains unclear whether they can reason about and enforce visualization rules directly. Constraint-based systems such as Draco encode these rules as logical constraints for precise automated checks, but maintaining symbolic encodings requires expert effort, motivating the use of LLMs as flexible rule validators. In this paper, we present the first systematic evaluation of LLMs against visualization rules using hard-verification ground truth derived from Answer Set Programming (ASP). We translated a subset of Draco's constraints into natural-language statements and generated a controlled dataset of 2,000 Vega-Lite specifications annotated with explicit rule violations. LLMs were evaluated on both accuracy in detecting violations and prompt adherence, which measures whether outputs follow the required structured format. Results show that frontier models achieve high adherence (Gemma 3 4B / 27B: 100%; GPT-oss 20B: 98%) and reliably detect common violations (F1 up to 0.82), yet performance drops for subtler perceptual rules (F1 < 0.15 for some categories) and for outputs generated from technical ASP formulations. Translating constraints into natural language improved performance by up to 150% for smaller models. These findings demonstrate the potential of LLMs as flexible, language-driven validators while highlighting their current limitations compared to symbolic solvers.
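To make the rule-violation task concrete, here is a minimal sketch of the kind of check the paper asks LLMs to perform on Vega-Lite specifications. The rule wording and the function below are illustrative assumptions, not Draco's actual ASP encoding; real Draco constraints are logic programs checked by a solver, and the paper's contribution is translating such constraints into natural language for LLM validators.

```python
# Illustrative rule (natural-language form, assumed for this sketch):
# "Bar charts should not truncate the quantitative axis baseline."
# A symbolic checker for this single rule over a Vega-Lite spec dict:

def violates_zero_baseline(spec: dict) -> bool:
    """Return True if a bar chart sets a non-zero minimum on a
    quantitative axis domain (a classic misleading-chart violation)."""
    if spec.get("mark") != "bar":
        return False
    for channel in ("x", "y"):
        enc = spec.get("encoding", {}).get(channel, {})
        if enc.get("type") == "quantitative":
            domain = enc.get("scale", {}).get("domain")
            if domain and domain[0] != 0:
                return True
    return False

# A truncated-baseline bar chart (hypothetical spec):
truncated = {
    "mark": "bar",
    "encoding": {
        "x": {"field": "category", "type": "nominal"},
        "y": {"field": "sales", "type": "quantitative",
              "scale": {"domain": [50, 100]}},
    },
}
print(violates_zero_baseline(truncated))  # True
```

In the paper's setup, the symbolic side (Draco/ASP) plays the role of this deterministic checker and supplies hard-verification ground truth; the LLM receives the spec plus the rule stated in natural language and must flag the same violation in a structured output.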
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Data Visualization Rules
Rule Violation Detection
Constraint-based Systems
Natural Language Understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models
Data Visualization Rules
Constraint Validation
Natural Language Translation
Systematic Evaluation