🤖 AI Summary
This work addresses the unreliability of generating executable scientific visualization pipelines from natural language in web environments, where failures often stem from missing stages, misused operators, or incorrect sequencing. To mitigate these issues, the authors propose a structure-aware retrieval-augmented generation (RAG) approach that retrieves vtk.js code examples aligned with the target pipeline's structural schema and uses them as contextual guidance for large language models. This enables accurate module selection, parameter configuration, and execution ordering. The study introduces a novel metric, "correction cost," to quantify the degree of human intervention required, and implements an interactive analysis interface to facilitate human–AI collaborative evaluation. Experimental results demonstrate that the proposed method significantly enhances the executability and practical utility of generated pipelines while effectively reducing correction cost.
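The summary does not specify how structure-aware retrieval is implemented. One plausible sketch, under the assumption that each stored example carries a pipeline schema (an ordered list of stage types), is to rank examples by order-sensitive overlap between their schema and the target's; the function names, corpus entries, and scoring choice below are illustrative, not the paper's method:

```javascript
// Length of the longest common subsequence, used as an
// order-sensitive measure of shared pipeline stages.
function lcsLength(a, b) {
  const dp = Array.from({ length: a.length + 1 }, () => new Array(b.length + 1).fill(0));
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = a[i - 1] === b[j - 1]
        ? dp[i - 1][j - 1] + 1
        : Math.max(dp[i - 1][j], dp[i][j - 1]);
    }
  }
  return dp[a.length][b.length];
}

// Similarity in [0, 1]: shared ordered stages relative to the longer schema.
function schemaSimilarity(target, example) {
  return lcsLength(target, example) / Math.max(target.length, example.length);
}

// Return the top-k examples whose schemas best match the target schema.
function retrieveExamples(targetSchema, corpus, k = 2) {
  return corpus
    .map((ex) => ({ ...ex, score: schemaSimilarity(targetSchema, ex.schema) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Illustrative corpus; stage names are drawn from vtk.js class names.
const corpus = [
  { id: 'isosurface', schema: ['vtkHttpDataSetReader', 'vtkImageMarchingCubes', 'vtkMapper', 'vtkActor'] },
  { id: 'volume', schema: ['vtkHttpDataSetReader', 'vtkVolumeMapper', 'vtkVolume'] },
  { id: 'cone', schema: ['vtkConeSource', 'vtkMapper', 'vtkActor'] },
];

const target = ['vtkHttpDataSetReader', 'vtkImageMarchingCubes', 'vtkMapper', 'vtkActor'];
const top = retrieveExamples(target, corpus, 1);
console.log(top[0].id, top[0].score.toFixed(2)); // isosurface 1.00
```

The retrieved examples would then be placed in the LLM's context so the generated code follows a known-good stage ordering.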
📝 Abstract
Scientific visualization pipelines encode domain-specific procedural knowledge with strict execution dependencies, making their construction sensitive to missing stages, incorrect operator usage, or improper ordering. Thus, generating executable scientific visualization pipelines from natural-language descriptions remains challenging for large language models, particularly in web-based environments where visualization authoring relies on explicit code-level pipeline assembly. In this work, we investigate the reliability of LLM-based scientific visualization pipeline generation, focusing on vtk.js as a representative web-based visualization library. We propose a structure-aware retrieval-augmented generation workflow that provides pipeline-aligned vtk.js code examples as contextual guidance, supporting correct module selection, parameter configuration, and execution order. We evaluate the proposed workflow across multiple multi-stage scientific visualization tasks and LLMs, measuring reliability in terms of pipeline executability and human correction effort. To this end, we introduce correction cost, a metric quantifying the amount of manual intervention required to obtain a valid pipeline. Our results show that structured, domain-specific context substantially improves pipeline executability and reduces correction cost. We additionally provide an interactive analysis interface to support human-in-the-loop inspection and systematic evaluation of generated visualization pipelines.
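The abstract does not give an exact formula for correction cost. A natural instantiation, sketched here purely as an assumption, is the minimum number of stage-level edits (insertions, deletions, replacements) needed to turn a generated pipeline into a valid one, i.e., an edit distance over stage sequences:

```javascript
// Hypothetical instantiation of "correction cost": Levenshtein edit
// distance over pipeline stages, where each manual insertion, deletion,
// or replacement of a stage counts as one unit of correction effort.
function correctionCost(generated, reference) {
  const m = generated.length;
  const n = reference.length;
  const dp = Array.from({ length: m + 1 }, () => new Array(n + 1).fill(0));
  for (let i = 0; i <= m; i++) dp[i][0] = i; // delete all spurious stages
  for (let j = 0; j <= n; j++) dp[0][j] = j; // insert all missing stages
  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      const sub = generated[i - 1] === reference[j - 1] ? 0 : 1;
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,       // delete a spurious stage
        dp[i][j - 1] + 1,       // insert a missing stage
        dp[i - 1][j - 1] + sub  // replace a misused operator
      );
    }
  }
  return dp[m][n];
}

// Generated pipeline misuses the mapper and omits the actor stage.
const generated = ['vtkHttpDataSetReader', 'vtkImageMarchingCubes', 'vtkVolumeMapper'];
const reference = ['vtkHttpDataSetReader', 'vtkImageMarchingCubes', 'vtkMapper', 'vtkActor'];
console.log(correctionCost(generated, reference)); // 2: one replacement + one insertion
```

Under this reading, a fully executable and correct generation has cost 0, and the metric rises with the amount of human repair work, which matches the abstract's framing of reliability as executability plus correction effort.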