Physics-based Human Pose Estimation from a Single Moving RGB Camera

📅 2025-07-23
🤖 AI Summary
Existing physics-based monocular human pose tracking methods suffer from artifacts when the ground is not flat or the camera moves. They are also often evaluated on in-the-wild videos without ground truth or on synthetic data that fails to model real-world light transport, camera motion, and pose-induced appearance and geometry changes. To address these limitations, we introduce MoviCam, the first non-synthetic dataset with ground-truth trajectories of a dynamically moving monocular RGB camera, scene geometry, and 3D human motion with human–scene contact labels. We further propose PhysDynPose, which combines a kinematic pose estimator, a robust SLAM method, and a scene-aware physics optimizer: the SLAM trajectory places the estimated pose in the world frame, and the optimizer then refines it under physical constraints. On the new benchmark, even state-of-the-art methods struggle with moving cameras and non-planar environments, while PhysDynPose robustly estimates both human and camera poses in world coordinates.

📝 Abstract
Most monocular, physics-based human pose tracking methods, while achieving state-of-the-art results, suffer from artifacts when the scene does not have a strictly flat ground plane or when the camera is moving. Moreover, these methods are often evaluated on in-the-wild real-world videos without ground-truth data or on synthetic datasets, which fail to model real-world light transport, camera motion, and pose-induced appearance and geometry changes. To tackle these two problems, we introduce MoviCam, the first non-synthetic dataset containing ground-truth camera trajectories of a dynamically moving monocular RGB camera, scene geometry, and 3D human motion with human-scene contact labels. Additionally, we propose PhysDynPose, a physics-based method that incorporates scene geometry and physical constraints for more accurate human motion tracking under camera motion and in non-flat scenes. More precisely, we use a state-of-the-art kinematics estimator to obtain the human pose and a robust SLAM method to capture the dynamic camera trajectory, enabling the recovery of the human pose in the world frame. We then refine the kinematic pose estimate using our scene-aware physics optimizer. On our new benchmark, we find that even state-of-the-art methods struggle with this inherently challenging setting, i.e., a moving camera and non-planar environments, while our method robustly estimates both human and camera poses in world coordinates.
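
The world-frame recovery step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that SLAM provides a per-frame camera-to-world rotation `R_wc` and translation `t_wc`, and that the kinematics estimator returns 3D joints in the camera frame; all names and values below are hypothetical and are not taken from the PhysDynPose implementation.

```python
import math

def rot_y(theta):
    """Rotation matrix about the y-axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def camera_to_world(R_wc, t_wc, joints_cam):
    """Map camera-frame joints to the world frame: x_w = R_wc * x_c + t_wc."""
    joints_world = []
    for x in joints_cam:
        joints_world.append(tuple(
            sum(R_wc[i][k] * x[k] for k in range(3)) + t_wc[i]
            for i in range(3)
        ))
    return joints_world

# Hypothetical inputs for one frame (assumed, not from the paper):
R_wc = rot_y(math.pi / 2)                         # camera orientation from SLAM
t_wc = (1.0, 0.0, 2.0)                            # camera position from SLAM
joints_cam = [(0.0, 0.0, 3.0), (0.0, 1.0, 3.0)]   # joints from the kinematics estimator

joints_world = camera_to_world(R_wc, t_wc, joints_cam)
```

Applying the same transform with each frame's own camera pose expresses the whole motion in one fixed world frame, which is what lets scene geometry and physical constraints be applied consistently even while the camera moves.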

Problem

Research questions and friction points this paper is trying to address.

Address artifacts in human pose tracking with moving cameras
Overcome limitations of synthetic datasets for real-world scenarios
Improve accuracy in non-flat scenes using physics-based constraints

Innovation

Methods, ideas, or system contributions that make the work stand out.

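Introduces MoviCam, a non-synthetic dataset with ground-truth camera trajectories, scene geometry, and 3D human motion with human-scene contact labels
Uses a robust SLAM method to recover the dynamic camera trajectory and place the human pose in the world frame
Refines the kinematic pose estimate with a scene-aware physics optimizer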
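
To make the idea of a scene-aware physical constraint concrete, here is a minimal sketch of one possible term: a penalty on joints that sink below a non-flat ground surface. This is an illustrative assumption, not the optimizer described in the paper; the z-up convention, the height function `ramp`, and the function names are all hypothetical.

```python
def penetration_penalty(joints_world, height_fn):
    """Sum of squared penetration depths below the ground surface (z is up)."""
    total = 0.0
    for x, y, z in joints_world:
        depth = height_fn(x, y) - z   # positive when the joint is under the surface
        if depth > 0.0:
            total += depth * depth
    return total

def ramp(x, y):
    """Hypothetical non-flat ground: a 10% slope rising along x."""
    return 0.1 * x

def flat(x, y):
    """Flat ground plane at z = 0, as many prior methods assume."""
    return 0.0

# Hypothetical world-frame foot positions (assumed values):
feet = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.1), (4.0, 0.0, 0.5)]

penalty_ramp = penetration_penalty(feet, ramp)
penalty_flat = penetration_penalty(feet, flat)
```

With the flat-ground assumption the penalty is zero, while the sloped surface flags the second foot as being about 0.1 below the ground. A mismatch of this kind between the assumed and the actual ground geometry is one way the artifacts described above can arise.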