Dynamic Worlds, Dynamic Humans: Generating Virtual Human-Scene Interaction Motion in Dynamic Scenes

📅 2026-01-27
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses the limitation of existing virtual human–scene interaction methods, which typically assume static environments and thus struggle in real-world dynamic settings. The authors propose Dyn-HSI, the first cognitive architecture for human–scene interaction generation tailored to dynamic scenes, integrating visual perception, memory mechanisms, and action control to enable continuous environmental awareness, experience reuse, and high-quality motion synthesis. Key innovations include dynamic scene-aware navigation, a hierarchical experience memory module, and a multimodal conditional diffusion model. The authors also introduce Dyn-Scenes, the first benchmark dataset for dynamic human–scene interactions. Experiments demonstrate that the approach significantly outperforms current methods in both static and dynamic scenarios, generating motions with high fidelity and strong contextual awareness, thereby validating its generalization capability and motion quality.

📝 Abstract
Scenes are continuously undergoing dynamic changes in the real world. However, existing human-scene interaction generation methods typically treat the scene as static, which deviates from reality. Inspired by world models, we introduce Dyn-HSI, the first cognitive architecture for dynamic human-scene interaction, which endows virtual humans with three humanoid components. (1) Vision (human eyes): we equip the virtual human with a Dynamic Scene-Aware Navigation module, which continuously perceives changes in the surrounding environment and adaptively predicts the next waypoint. (2) Memory (human brain): we equip the virtual human with a Hierarchical Experience Memory, which stores and updates experiential data accumulated during training. This allows the model to leverage prior knowledge during inference for context-aware motion priming, thereby enhancing both motion quality and generalization. (3) Control (human body): we equip the virtual human with a Human-Scene Interaction Diffusion Model, which generates high-fidelity interaction motions conditioned on multimodal inputs. To evaluate performance in dynamic scenes, we extend existing static human-scene interaction datasets to construct a dynamic benchmark, Dyn-Scenes. We conduct extensive qualitative and quantitative experiments to validate Dyn-HSI, showing that our method consistently outperforms existing approaches and generates high-quality human-scene interaction motions in both static and dynamic settings.
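The abstract describes a perceive–remember–act loop: vision summarizes the changing scene, memory retrieves a prior experience to prime motion generation, and a control module synthesizes the motion. The following is a minimal toy sketch of that loop; all class names, the scalar scene context, and the string stand-in for a generated motion are illustrative assumptions, not the paper's actual implementation (which uses scene-aware navigation and a conditional diffusion model).

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a Dyn-HSI-style perceive-remember-act loop.
# Names and logic are illustrative placeholders, not from the paper.

@dataclass
class HierarchicalMemory:
    """Stores (scene-context, motion) experiences; retrieves the nearest match."""
    experiences: list = field(default_factory=list)

    def store(self, context: float, motion: str) -> None:
        self.experiences.append((context, motion))

    def retrieve(self, context: float):
        # Nearest-neighbour lookup by context distance (toy metric).
        if not self.experiences:
            return None
        return min(self.experiences, key=lambda e: abs(e[0] - context))[1]

class DynHSIAgent:
    def __init__(self) -> None:
        self.memory = HierarchicalMemory()

    def perceive(self, scene_state: list) -> float:
        # "Vision": reduce the dynamic scene to a context scalar
        # (stand-in for scene-aware navigation predicting the next waypoint).
        return sum(scene_state) / len(scene_state)

    def act(self, scene_state: list):
        context = self.perceive(scene_state)
        prior = self.memory.retrieve(context)   # "Memory": context-aware motion priming
        motion = f"motion@{context:.1f}"        # "Control": stand-in for the diffusion model
        self.memory.store(context, motion)
        return prior, motion
```

On a second visit to a similar scene, `retrieve` returns the earlier motion as a prior, which is the role the paper's Hierarchical Experience Memory plays in conditioning generation.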
Problem

Research questions and friction points this paper is trying to address.

dynamic scenes
human-scene interaction
virtual human
motion generation
scene dynamics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic Human-Scene Interaction
World Models
Diffusion Motion Generation
Hierarchical Memory
Scene-Aware Navigation