🤖 AI Summary
Large language models (LLMs) generate data visualizations of inconsistent quality, necessitating labor-intensive human correction. Method: This paper introduces a multimodal large language model (MLLM)-based framework specifically designed for visualization critique. It (1) constructs a high-quality, multimodal visualization critique dataset by collecting human-created visualization instances, synthesizing corresponding LLM-generated instances, and producing fine-grained critiques covering semantic correctness, visual effectiveness, and task alignment; and (2) trains a lightweight, open-source 7B-parameter MLLM on this dataset for holistic visualization assessment. Results: The model reaches performance on visualization critique comparable to much larger open- and even closed-source models, enabling efficient, automated diagnosis and refinement of LLM-generated charts. It establishes a scalable, reproducible evaluation pipeline for closing the visualization generation loop.
📝 Abstract
Data visualization generation using Large Language Models (LLMs) has shown promising results but often produces suboptimal visualizations that require human intervention to improve. In this work, we introduce VIS-Shepherd, a specialized Multimodal Large Language Model (MLLM)-based critic that evaluates and provides feedback on LLM-generated data visualizations. At the core of our approach is a framework for constructing a high-quality visualization critique dataset: we collect human-created visualization instances, synthesize corresponding LLM-generated instances, and construct high-quality critiques. We conduct both model-based automatic evaluation and human preference studies to assess the effectiveness of our approach. Our experiments show that even small (7B-parameter) open-source MLLMs achieve substantial performance gains by leveraging our high-quality visualization critique dataset, reaching levels comparable to much larger open-source or even proprietary models. Our work demonstrates significant potential for MLLM-based automated visualization critique and indicates promising directions for enhancing LLM-based data visualization generation. Our project page: https://github.com/bopan3/VIS-Shepherd.
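The dataset-construction framework described above (collect human-created instances, synthesize LLM-generated counterparts, attach critiques) can be sketched as a simple data pipeline. This is an illustrative sketch only: the type and function names (`CritiqueExample`, `build_dataset`, and the `synthesize`/`critique` callbacks) are hypothetical placeholders, not APIs from the VIS-Shepherd codebase.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

@dataclass
class CritiqueExample:
    reference_spec: str   # human-created visualization (e.g., a chart spec)
    generated_spec: str   # LLM-synthesized counterpart for the same data/task
    critique: str         # feedback on semantics, visual effectiveness, task fit

def build_dataset(
    human_instances: Iterable[str],
    synthesize: Callable[[str], str],
    critique: Callable[[str, str], str],
) -> List[CritiqueExample]:
    """Pair each human-created visualization with an LLM-generated one,
    then attach a critique grounded in the human reference."""
    dataset = []
    for ref in human_instances:
        gen = synthesize(ref)    # LLM produces a chart for the same task
        fb = critique(ref, gen)  # critique compares generation to reference
        dataset.append(CritiqueExample(ref, gen, fb))
    return dataset
```

In the paper's setting, `synthesize` would be an LLM generating a chart from the same underlying data and task, and `critique` would produce the fine-grained feedback used to fine-tune the 7B MLLM critic.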