MoGraphGPT: Creating Interactive Scenes Using Modular LLM and Graphical Control

📅 2025-02-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Current LLMs exhibit high error rates, poor editability, and complex visual integration when generating code for interactive scenes, primarily due to linear dialogue structures and the absence of graphical control. To address this, we propose an element-level modular LLM architecture: scenes are decomposed into independent visual elements, each synthesized by a dedicated sub-model; a central coordination module then composes multi-element interaction logic. Integrated with a graphical user interface and natural language input, our system automatically generates tunable sliders and enables real-time visual preview. This work introduces the novel paradigm of *graphics-driven code generation*, enabling fine-grained editing, immediate visual feedback, and zero-code development. A user study demonstrates that our approach significantly outperforms Cursor Composer on multi-element complex 2D interactive scenes, with measurable improvements in usability, controllability, and precision of parameter adjustment.

Technology Category

Application Category

📝 Abstract
Creating interactive scenes often involves complex programming tasks. Although large language models (LLMs) like ChatGPT can generate code from natural language, their output is often error-prone, particularly when scripting interactions among multiple elements. The linear conversational structure limits the editing of individual elements, and lacking graphical and precise control complicates visual integration. To address these issues, we integrate an element-level modularization technique that processes textual descriptions for individual elements through separate LLM modules, with a central module managing interactions among elements. This modular approach allows for refining each element independently. We design a graphical user interface, MoGraphGPT , which combines modular LLMs with enhanced graphical control to generate codes for 2D interactive scenes. It enables direct integration of graphical information and offers quick, precise control through automatically generated sliders. Our comparative evaluation against an AI coding tool, Cursor Composer, as the baseline system and a usability study show MoGraphGPT significantly improves easiness, controllability, and refinement in creating complex 2D interactive scenes with multiple visual elements in a coding-free manner.
Problem

Research questions and friction points this paper is trying to address.

Enhances interactive scene creation with modular LLMs.
Improves graphical control for 2D interactive scenes.
Facilitates coding-free complex visual element interactions.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modular LLM for element processing
Graphical user interface integration
Automated slider for precise control
🔎 Similar Papers
No similar papers found.