WATCH: World-aware Allied Trajectory and pose reconstruction for Camera and Human

📅 2025-09-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address depth ambiguity, motion ambiguity, and camera–human motion coupling in global human motion reconstruction from in-the-wild monocular video, this paper proposes an end-to-end differentiable framework that jointly optimizes human pose and camera motion. Methodologically, it integrates geometric priors with sequential modeling to ensure physical plausibility and temporal coherence. Key contributions include: (1) an analytical heading-angle decomposition technique that explicitly decouples human orientation from camera rotation; and (2) a world-model-inspired camera trajectory fusion mechanism, the first of its kind, that incorporates camera translation estimates efficiently and with geometric consistency. Evaluated on multiple in-the-wild benchmarks, the method achieves state-of-the-art end-to-end trajectory reconstruction, significantly improving the accuracy, physical plausibility, and generalizability of both 3D poses and camera trajectories in the world coordinate system.
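The paper does not publish its exact formulation here, but the core idea of an analytical heading-angle decomposition can be illustrated as follows: compose the camera-frame human orientation with the camera rotation, then split the resulting world-frame orientation into a yaw (heading) about the gravity axis plus a residual tilt. The axis conventions (y-up, z-forward) and function names below are illustrative assumptions, not the paper's actual design.

```python
import numpy as np

def extract_heading(R, up=np.array([0.0, 1.0, 0.0])):
    """Analytically extract the heading (yaw about the gravity axis) by
    projecting the rotated forward axis onto the ground plane.
    Assumes a y-up, z-forward convention (an illustrative choice)."""
    fwd = R[:, 2]                          # rotated forward (z) axis
    fwd_flat = fwd - np.dot(fwd, up) * up  # drop the vertical component
    return np.arctan2(fwd_flat[0], fwd_flat[2])

def decompose_world_orientation(R_world_cam, R_cam_human):
    """Map the human root orientation from camera to world coordinates,
    then decouple it into a heading angle and a residual tilt rotation."""
    R_world_human = R_world_cam @ R_cam_human
    heading = extract_heading(R_world_human)
    c, s = np.cos(heading), np.sin(heading)
    # Rotation by `heading` about the y (up) axis
    R_heading = np.array([[  c, 0.0,   s],
                          [0.0, 1.0, 0.0],
                          [ -s, 0.0,   c]])
    R_tilt = R_heading.T @ R_world_human   # what remains after removing heading
    return heading, R_tilt
```

Because the heading is extracted in closed form rather than optimized, the decomposition is cheap per frame and easy to extend with extra terms, which is the kind of efficiency and extensibility advantage the abstract claims over iterative geometric methods.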

📝 Abstract
Global human motion reconstruction from in-the-wild monocular videos is increasingly demanded across VR, graphics, and robotics applications, yet requires accurate mapping of human poses from camera to world coordinates, a task challenged by depth ambiguity, motion ambiguity, and the entanglement between camera and human movements. While human-motion-centric approaches excel in preserving motion details and physical plausibility, they suffer from two critical limitations: insufficient exploitation of camera orientation information and ineffective integration of camera translation cues. We present WATCH (World-aware Allied Trajectory and pose reconstruction for Camera and Human), a unified framework addressing both challenges. Our approach introduces an analytical heading angle decomposition technique that offers superior efficiency and extensibility compared to existing geometric methods. Additionally, we design a camera trajectory integration mechanism inspired by world models, providing an effective pathway for leveraging camera translation information beyond naive hard-decoding approaches. Through experiments on in-the-wild benchmarks, WATCH achieves state-of-the-art performance in end-to-end trajectory reconstruction. Our work demonstrates the effectiveness of jointly modeling camera-human motion relationships and offers new insights for addressing the long-standing challenge of camera translation integration in global human motion reconstruction. The code will be available publicly.
Problem

Research questions and friction points this paper is trying to address.

Reconstructing global human motion from monocular videos
Addressing depth and motion ambiguity in pose estimation
Integrating camera translation with human movement reconstruction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analytical heading angle decomposition technique
Camera trajectory integration mechanism inspired by world models
Jointly modeling camera-human motion relationships
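The abstract contrasts the world-model-inspired trajectory integration with "naive hard-decoding" of camera translation. One way to picture the difference is soft, confidence-weighted fusion of per-frame camera translation estimates with translations implied by the human motion model, instead of taking either source at face value. The function below is a hypothetical stand-in for that idea, not the paper's actual mechanism; all names and the weighting scheme are assumptions.

```python
import numpy as np

def fuse_camera_trajectory(t_est, t_pred, conf):
    """Blend per-frame camera translations from a visual estimator (t_est)
    with translations predicted from the human-motion side (t_pred),
    weighted by a per-frame confidence in [0, 1].
    conf == 1 trusts the estimator fully; conf == 0 falls back to the
    motion-model prediction (hypothetical illustration)."""
    t_est, t_pred = np.asarray(t_est, float), np.asarray(t_pred, float)
    w = np.clip(np.asarray(conf, float), 0.0, 1.0)[:, None]  # (T, 1) weights
    return w * t_est + (1.0 - w) * t_pred
```

A hard-decoding baseline would correspond to `conf` fixed at 1.0, i.e. copying the estimator's translation verbatim regardless of how consistent it is with the reconstructed human motion.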