🤖 AI Summary
Front-facing monocular camera setups on head-mounted devices (HMDs) suffer from self-occlusion and limited field-of-view coverage because they capture no information about the back of the body. This paper presents the first systematic validation of rear-mounted cameras for egocentric full-body 3D pose estimation. The authors propose a transformer-based framework that, instead of relying on independent per-view 2D joint detectors in the conventional "2D detection + 3D lifting" pipeline, jointly models front and rear heatmaps and fuses them in an uncertainty-aware manner. Extensive experiments on two new large-scale datasets, the synthetic Ego4View-Syn and the real-world Ego4View-RW, show that the method improves MPJPE by over 10% compared with state-of-the-art methods, with particularly notable gains on challenging poses involving upward head movement. The source code, trained models, and datasets will be publicly released.
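The core idea of combining front and rear heatmaps by their uncertainty can be illustrated with a minimal sketch. This is not the paper's transformer architecture; it is a hypothetical stand-in that uses heatmap entropy as an uncertainty proxy and down-weights uncertain views, which captures the intuition behind uncertainty-aware multi-view fusion:

```python
import numpy as np

def heatmap_entropy(h):
    """Shannon entropy of a 2D joint heatmap (higher = more uncertain)."""
    p = h.flatten() / h.sum()
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def fuse_heatmaps(views):
    """Uncertainty-weighted fusion of per-view heatmaps for one joint.

    views: list of (H, W) non-negative arrays, one per camera
    (e.g. front and rear). A confident (low-entropy) view receives a
    larger fusion weight. Illustrative only -- the paper fuses views
    with a transformer, not this closed-form weighting.
    """
    ent = np.array([heatmap_entropy(h) for h in views])
    w = np.exp(-ent)                     # low entropy -> large weight
    w /= w.sum()
    fused = sum(wi * (h / h.sum()) for wi, h in zip(w, views))
    return fused / fused.sum(), w

# Example: a sharply peaked front view dominates a flat (uninformative)
# rear view, so the fused peak follows the confident view.
front = np.zeros((8, 8)); front[2, 3] = 1.0   # confident detection
rear = np.ones((8, 8))                        # no usable signal
fused, weights = fuse_heatmaps([front, rear])
```

In the paper's actual pipeline, this weighting is learned: the transformer attends across views and refines each 2D heatmap with multi-view context before 3D pose estimation.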
📝 Abstract
Egocentric 3D human pose estimation has been actively studied using cameras installed in front of a head-mounted device (HMD). While frontal placement is optimal, and the only option, for some tasks such as hand tracking, it remains unclear whether the same holds for full-body tracking due to self-occlusion and limited field-of-view coverage. Notably, even state-of-the-art methods often fail to estimate accurate 3D poses in many scenarios, such as when HMD users tilt their heads upward (a common motion in human activities). A key limitation of existing HMD designs is their neglect of the back of the body, despite its potential to provide crucial 3D reconstruction cues. Hence, this paper investigates the usefulness of rear cameras in the HMD design for full-body tracking. We also show that simply adding rear views to the frontal inputs is not optimal for existing methods due to their dependence on individual 2D joint detectors without effective multi-view integration. To address this issue, we propose a new transformer-based method that refines 2D joint heatmap estimation with multi-view information and heatmap uncertainty, thereby improving 3D pose tracking. Moreover, we introduce two new large-scale datasets, Ego4View-Syn and Ego4View-RW, for rear-view evaluation. Our experiments show that camera configurations with back views provide superior support for 3D pose tracking compared to frontal-only placements. The proposed method achieves significant improvement over the current state of the art (>10% on MPJPE). We will release the source code, trained models, and new datasets on our project page https://4dqv.mpi-inf.mpg.de/EgoRear/.