🤖 AI Summary
This study systematically evaluates the chart comprehension capabilities of multimodal large language models (MLLMs), focusing on how visual elements—such as color, shape, and textual annotations—affect how legible a chart is to the model, with comparative analysis against human perception.
Method: We introduce VLAT-ex, a novel, rigorously constructed benchmark comprising 380 diverse data visualization instances. Using multivariate controlled experiments and statistical analysis, we quantitatively measure MLLM response accuracy and omission rates across systematic variations in chart type, title format, and color scheme.
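The per-factor accuracy and omission metrics described above could be computed with a sketch like the following. All names (`score_by_factor`, the record keys) and the toy data are illustrative assumptions, not the paper's actual evaluation pipeline:

```python
from collections import defaultdict

def score_by_factor(results, factor):
    """Aggregate accuracy and omission rate per level of one design factor.

    `results` is a list of dicts with hypothetical keys: the factor under
    study (e.g. "chart_type"), "answer" (the model's response, or None when
    the model omitted an answer), and "correct" (the ground-truth answer).
    """
    tallies = defaultdict(lambda: {"n": 0, "hits": 0, "omits": 0})
    for r in results:
        t = tallies[r[factor]]
        t["n"] += 1
        if r["answer"] is None:      # model declined / omitted
            t["omits"] += 1
        elif r["answer"] == r["correct"]:
            t["hits"] += 1
    return {
        level: {
            "accuracy": t["hits"] / t["n"],
            "omission_rate": t["omits"] / t["n"],
        }
        for level, t in tallies.items()
    }

# Toy illustration (fabricated values, not study data):
demo = [
    {"chart_type": "bar", "answer": "A",  "correct": "A"},
    {"chart_type": "bar", "answer": None, "correct": "B"},
    {"chart_type": "pie", "answer": "C",  "correct": "D"},
    {"chart_type": "pie", "answer": "D",  "correct": "D"},
]
scores = score_by_factor(demo, "chart_type")
# e.g. scores["bar"] -> {"accuracy": 0.5, "omission_rate": 0.5}
```

In the study's setup the same aggregation would be repeated for each factor (chart type, title format, color scheme) before statistical testing.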
Contribution/Results: Our analysis reveals that chart type (e.g., line and bar charts yield higher performance) and title presentation significantly impact MLLM performance, whereas color scheme exhibits negligible influence. Based on these findings, we propose empirically grounded visualization design principles tailored for MLLMs. To foster reproducibility and standardization, we publicly release both the VLAT-ex dataset and associated evaluation toolkit, advancing rigorous, comparable assessment of MLLM visualization understanding.
📝 Abstract
Multimodal Large Language Models (MLLMs) can interpret data visualizations, but what makes a visualization understandable to these models? Do factors like color, shape, and text influence legibility, and how does this compare to human perception? In this paper, we build on prior work to systematically assess which visualization characteristics impact MLLM interpretability. We expanded the Visualization Literacy Assessment Test (VLAT) from 12 to 380 visualizations by varying plot types, colors, and titles, which allowed us to statistically analyze how these features affect model performance. Our findings suggest that while color palettes have no significant impact on accuracy, plot type and title type significantly affect MLLM performance. We observe similar trends for model omissions. Based on these insights, we examine which plot types benefit MLLMs in different tasks and propose visualization design principles that enhance MLLM readability. Additionally, we make the extended VLAT test set, VLAT-ex, publicly available at https://osf.io/ermwx/ together with our supplemental material for future model testing and evaluation.