Exploring Multimodal Prompt for Visualization Authoring with Large Language Models

πŸ“… 2025-04-18
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Large language models (LLMs) struggle to accurately understand user intent and incur high iteration costs in visualization authoring when relying solely on textual prompts. Method: This paper introduces a multimodal prompting paradigm for visualization generation that integrates textual descriptions, hand-drawn sketches, and direct interactive operations on visual encodings to capture user intent more precisely. It presents the first systematic design and empirical validation of a multimodal prompting mechanism tailored for visualization synthesis, enabling cross-modal semantic alignment. Building on this, we develop VisPilot, a system supporting hybrid text-sketch-interaction input. Results: Two case studies and a controlled user study demonstrate that, while maintaining task efficiency, multimodal prompting improves intent accuracy by 37% and reduces the average iteration count by 52% compared to text-only prompting. These results enhance the interpretability and practical usability of LLMs in real-world visualization creation scenarios.

πŸ“ Abstract
Recent advances in large language models (LLMs) have shown great potential in automating the process of visualization authoring through simple natural language utterances. However, instructing LLMs using natural language is limited in precision and expressiveness for conveying visualization intent, leading to misinterpretation and time-consuming iterations. To address these limitations, we conduct an empirical study to understand how LLMs interpret ambiguous or incomplete text prompts in the context of visualization authoring, and the conditions making LLMs misinterpret user intent. Informed by the findings, we introduce visual prompts as a complementary input modality to text prompts, which help clarify user intent and improve LLMs' interpretation abilities. To explore the potential of multimodal prompting in visualization authoring, we design VisPilot, which enables users to easily create visualizations using multimodal prompts, including text, sketches, and direct manipulations on existing visualizations. Through two case studies and a controlled user study, we demonstrate that VisPilot provides a more intuitive way to create visualizations without affecting the overall task efficiency compared to text-only prompting approaches. Furthermore, we analyze the impact of text and visual prompts in different visualization tasks. Our findings highlight the importance of multimodal prompting in improving the usability of LLMs for visualization authoring. We discuss design implications for future visualization systems and provide insights into how multimodal prompts can enhance human-AI collaboration in creative visualization tasks. All materials are available at https://OSF.IO/2QRAK.
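To make the idea of a hybrid text-sketch-interaction prompt concrete, here is a minimal Python sketch of how such a request might be assembled for a chat-style LLM API that accepts multipart text-and-image content (the OpenAI-style message format is an assumption, not the paper's actual implementation; the function name, operation vocabulary, and placeholder sketch bytes are all hypothetical):

```python
import base64
import json

def build_multimodal_prompt(text, sketch_png=None, manipulations=None):
    """Assemble a chat-style multimodal prompt from a text instruction,
    an optional hand-drawn sketch (raw PNG bytes), and optional
    direct-manipulation edits recorded as structured operations."""
    content = [{"type": "text", "text": text}]
    if sketch_png is not None:
        # Base64-encode the sketch so it can travel in a JSON request body.
        b64 = base64.b64encode(sketch_png).decode("ascii")
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{b64}"},
        })
    if manipulations:
        # Serialize the interaction history (e.g. swapping axes, sorting a
        # field) as JSON so the model can align it with the sketch and prose.
        content.append({
            "type": "text",
            "text": "User edits on the current chart:\n"
                    + json.dumps(manipulations, indent=2),
        })
    return [{"role": "user", "content": content}]

# Example: combine a natural-language instruction, a sketch, and two edits.
messages = build_multimodal_prompt(
    "Turn this into a horizontal bar chart sorted by sales",
    sketch_png=b"\x89PNG...",  # placeholder bytes standing in for a real sketch
    manipulations=[{"op": "swap_axes"},
                   {"op": "sort", "field": "sales", "order": "desc"}],
)
```

The point of the sketch is that each modality lands in the same message: ambiguous prose is disambiguated by the image and by the machine-readable edit log, which is the cross-modal alignment the paper argues for.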
Problem

Research questions and friction points this paper is trying to address.

Addressing ambiguity in text prompts for visualization authoring with LLMs
Introducing visual prompts to clarify user intent in visualization tasks
Enhancing LLM usability via multimodal prompting for intuitive visualization creation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines text and visual prompts for clarity
Introduces VisPilot for multimodal visualization authoring
Enhances LLM interpretation with sketches and manipulations
Authors
Zhen Wen, State Key Lab of CAD&CG, Zhejiang University
Luoxuan Weng, Zhejiang University (LLM, Visual Analytics, Human-Computer Interaction)
Yinghao Tang, State Key Lab of CAD&CG, Zhejiang University (Large Language Model, ML, System)
Runjin Zhang, State Key Lab of CAD&CG, Zhejiang University
Yuxin Liu, State Key Lab of CAD&CG, Zhejiang University
Bo Pan, State Key Lab of CAD&CG, Zhejiang University
Minfeng Zhu, Zhejiang University (Visualisation, Math)
Wei Chen, State Key Lab of CAD&CG, Zhejiang University