🤖 AI Summary
To address novice users' difficulties with ideation and execution in mixed-reality spatial design, this paper proposes a progressive, human–AI collaborative design framework. Methodologically, it integrates a large vision-language model (LVLM) to parse speech commands at multiple levels of abstraction, support reversible interactive scene editing, and provide real-time visual updates. Its key contributions are: (i) a progressive AI pipeline tailored for spatial design, enabling controllable generation from coarse-grained layout to fine-grained refinement; and (ii) a reversible operation mechanism that preserves user agency and ensures creative continuity. User studies demonstrate that, compared to a no-AI baseline, the system significantly enhances the diversity of creative expression (+37%) and design efficiency (reducing task completion time by 42%).
📝 Abstract
Mixed reality platforms allow users to create virtual environments, yet novice users struggle with both ideation and execution in spatial design. While existing AI models can automatically generate scenes from user prompts, the lack of interactive control limits users' ability to iteratively steer the output. In this paper, we present EchoLadder, a novel human–AI collaboration pipeline that leverages a large vision-language model (LVLM) to support interactive scene modification in virtual reality. EchoLadder accepts users' verbal instructions at varied levels of abstraction and spatial specificity, and generates concrete design suggestions throughout a progressive design process. The suggestions can be applied, regenerated, and retracted via users' toggle control. Our ablation study showed the effectiveness of our pipeline components. Our user study found that, compared to a baseline that did not show suggestions, EchoLadder better supports user creativity in spatial design. It also contributes insights on users' progressive design strategies under AI assistance, providing design implications for future systems.
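The abstract's apply/regenerate/retract toggle implies a reversible edit mechanism. The paper's actual implementation is not given here; the sketch below is a minimal illustrative model of such semantics, where each suggested edit carries its own inverse so it can be applied and later retracted without losing prior work. All names (`SceneOp`, `ReversibleEditor`) are hypothetical, not from the EchoLadder codebase.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class SceneOp:
    """One reversible scene edit: apply() makes the change, undo() reverses it.
    (Hypothetical structure, for illustration only.)"""
    apply: Callable[[dict], None]
    undo: Callable[[dict], None]

class ReversibleEditor:
    """Toy editor: applied ops sit on a stack and can be retracted in order."""
    def __init__(self, scene: dict):
        self.scene = scene
        self.applied: list[SceneOp] = []

    def apply(self, op: SceneOp) -> None:
        op.apply(self.scene)
        self.applied.append(op)

    def retract(self) -> Optional[SceneOp]:
        """Undo the most recent edit; returns it so it could be re-applied."""
        if not self.applied:
            return None
        op = self.applied.pop()
        op.undo(self.scene)
        return op

# Usage: apply an AI-suggested edit ("add a lamp"), then retract it.
scene = {"objects": []}
editor = ReversibleEditor(scene)
add_lamp = SceneOp(
    apply=lambda s: s["objects"].append("lamp"),
    undo=lambda s: s["objects"].remove("lamp"),
)
editor.apply(add_lamp)
editor.retract()  # scene returns to its pre-suggestion state
```

"Regenerate" in this model would simply be a retract followed by applying a newly generated `SceneOp`, which is one way the pipeline could keep generation controllable without destroying the user's existing layout.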