AnchorWorld: Embodied Egocentric World Simulation with View-based Evolution Customization

📅 2026-06-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limited flexibility and controllability of interactive world modeling with AnchorWorld, a framework that uses 3D human motion as the primary interaction modality. It strengthens the spatial grounding of egocentric simulation with auxiliary training supervision from exogenous (third-person) viewpoints. It also lets users customize how the world evolves by defining anchor views in a unified world coordinate system and pairing them with textual descriptions of local scene dynamics. The method outperforms state-of-the-art baselines, and its customization scheme maintains spatio-temporal geometric consistency while following the text-prescribed evolution, improving both the integrity and the controllability of interactive world modeling.
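The auxiliary exogenous supervision can be pictured as an extra loss term that is active only when a third-person frame is available during training. The sketch below is a hypothetical illustration, not the paper's objective: the function names `mse` and `training_loss`, the mean-squared-error form, and the default `exo_weight=0.5` are all assumptions.

```python
def mse(pred, target):
    """Mean squared error between two equal-length, non-empty sequences of floats."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equal in length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def training_loss(ego_pred, ego_target, exo_pred=None, exo_target=None, exo_weight=0.5):
    """Hypothetical objective: egocentric loss plus a weighted exogenous term.

    The exogenous (third-person) term is added only when such frames are
    supplied, reflecting that it serves as auxiliary training supervision.
    exo_weight is an assumed hyperparameter, not a value from the paper.
    """
    loss = mse(ego_pred, ego_target)
    if exo_pred is not None and exo_target is not None:
        loss += exo_weight * mse(exo_pred, exo_target)
    return loss


# Ego error is (0 + 4) / 2 = 2.0; the exo error 4.0 weighted by 0.5 adds 2.0.
print(training_loss([1.0, 2.0], [1.0, 0.0]))                # 2.0
print(training_loss([1.0, 2.0], [1.0, 0.0], [3.0], [1.0]))  # 4.0
```

Keeping the exogenous term optional in this sketch means the same training loop can consume clips with or without a paired third-person view.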

📝 Abstract

Despite being a pivotal frontier, interactive world modeling remains underexplored in terms of the versatile controllability that practical scenarios require. To bridge this gap, we present AnchorWorld, a framework that advances egocentric simulation through enhanced interaction integrity and a flexible mechanism for world customization. First, we use 3D human motion as the primary interaction modality. To compensate for body parts that are out of view or truncated in egocentric frames, we introduce an auxiliary training supervision that incorporates exogenous viewpoints decoupled from the agent's first-person sensorium. This supervision lets the model observe the agent's full-body position relative to the environment, yielding more robust spatial grounding of human-world interactions. Second, we propose a simple yet effective mechanism for customizing self-evolving worlds: anchor views are defined within a unified world coordinate system and paired with textual descriptions that dictate the dynamic evolution of local scenes. Experimental results show that AnchorWorld significantly outperforms state-of-the-art baselines, and ablation studies validate the effectiveness of our key designs. Notably, our customization scheme exhibits promising spatio-temporal geometric consistency and adheres strictly to the prescribed evolutionary dynamics.
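To make the anchor-view idea concrete, the following sketch assumes that an anchor view is a named position in the shared world frame paired with a text prompt, and that an anchor's prompt is passed to the simulator only while the anchor lies inside the agent's horizontal field of view and within a maximum distance. The `AnchorView` class, `visible_anchor_prompts`, the yaw convention (0 radians faces +z), and the default `fov_deg` and `max_dist` values are hypothetical; the paper's actual conditioning scheme may differ.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AnchorView:
    """Hypothetical anchor: a world-frame position paired with an evolution prompt."""
    name: str
    position: tuple  # (x, y, z) in the unified world coordinate system
    prompt: str      # text describing how the local scene should evolve


def visible_anchor_prompts(anchors, cam_pos, cam_yaw, fov_deg=90.0, max_dist=5.0):
    """Return (name, prompt) pairs for anchors inside the camera's horizontal FOV.

    Assumed conventions: y is up, yaw is in radians around the y axis, and
    yaw 0 faces +z. Results are sorted nearest first.
    """
    half_fov = math.radians(fov_deg) / 2
    hits = []
    for anchor in anchors:
        dist = math.dist(anchor.position, cam_pos)
        if dist > max_dist:
            continue  # too far away to influence the current view
        dx = anchor.position[0] - cam_pos[0]
        dz = anchor.position[2] - cam_pos[2]
        bearing = math.atan2(dx, dz)
        # Wrap the angular offset into [-pi, pi) before comparing with the FOV.
        offset = (bearing - cam_yaw + math.pi) % (2 * math.pi) - math.pi
        if abs(offset) <= half_fov:
            hits.append((dist, anchor.name, anchor.prompt))
    hits.sort()
    return [(name, prompt) for _, name, prompt in hits]


anchors = [
    AnchorView("stove", (0.0, 0.0, 3.0), "the pot on the stove begins to boil"),
    AnchorView("window", (0.0, 0.0, -3.0), "rain starts falling outside the window"),
]
print(visible_anchor_prompts(anchors, (0.0, 0.0, 0.0), 0.0))
# [('stove', 'the pot on the stove begins to boil')]
print(visible_anchor_prompts(anchors, (0.0, 0.0, 0.0), math.pi))
# [('window', 'rain starts falling outside the window')]
```

Because the anchors in this sketch are stored in world coordinates rather than relative to the camera, the same prompt applies whenever the agent turns back toward that region, which is one way to read the role of the unified coordinate system.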

Problem

Research questions and friction points this paper is trying to address.

interactive world modeling

egocentric simulation

world customization

spatial grounding

dynamic evolution

Innovation

Methods, ideas, or system contributions that make the work stand out.

egocentric simulation

view-based customization

anchor views