🤖 AI Summary
Existing unsupervised multi-view 3D human pose estimation methods suffer from poor generalization and sensitivity to occlusion. This paper proposes the first ground-truth-free framework based on 3D Gaussian splatting: the human body is modeled as a differentiable point cloud with one Gaussian per joint, each optimized independently via a one-hot encoding scheme, and differentiable rendering is combined with multi-view geometric constraints to reconstruct pose across views without any 3D annotations. To the authors' knowledge, this is the first work to introduce Gaussian splatting into skeletal pose estimation; the approach naturally supports arbitrary camera configurations and substantially improves cross-dataset generalization. Evaluated on Human3.6M and CMU Panoptic, the method reduces cross-domain error by up to 47.8% compared to prior unsupervised approaches while remaining accurate under severe occlusion.
📝 Abstract
Accurate 3D human pose estimation is fundamental for applications such as augmented reality and human-robot interaction. State-of-the-art multi-view methods learn to fuse predictions across views by training on large annotated datasets, leading to poor generalization when the test scenario differs. To overcome these limitations, we propose SkelSplat, a novel framework for multi-view 3D human pose estimation based on differentiable Gaussian rendering. Human pose is modeled as a skeleton of 3D Gaussians, one per joint, optimized via differentiable rendering to enable seamless fusion of arbitrary camera views without 3D ground-truth supervision. Since Gaussian splatting was originally designed for dense scene reconstruction, we propose a novel one-hot encoding scheme that enables independent optimization of human joints. SkelSplat outperforms approaches that do not rely on 3D ground truth on Human3.6M and CMU Panoptic, while reducing cross-dataset error by up to 47.8% compared to learning-based methods. Experiments on Human3.6M-Occ and Occlusion-Person demonstrate robustness to occlusions without scenario-specific fine-tuning. Our project page is available here: https://skelsplat.github.io.
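To make the joint-per-Gaussian idea concrete, here is a minimal toy sketch (not the paper's implementation; all camera parameters, image sizes, and function names are illustrative assumptions): each joint is a 3D Gaussian that projects into a view and renders into its own output channel, which is what a one-hot encoding achieves — a loss on channel *j* only affects joint *j*.

```python
import numpy as np

J, H, W = 3, 32, 32  # toy values: number of joints, render resolution

def project(points, K, R, t):
    """Pinhole projection of Jx3 world points to Jx2 pixel coordinates."""
    cam = points @ R.T + t            # world -> camera frame
    uv = cam[:, :2] / cam[:, 2:3]     # perspective divide
    return uv @ K[:2, :2].T + K[:2, 2]

def render_onehot(pix, sigma=1.5):
    """Splat each joint as an isotropic 2D Gaussian into its OWN channel,
    so per-joint losses yield independent gradients (the one-hot idea)."""
    ys, xs = np.mgrid[0:H, 0:W]
    img = np.zeros((J, H, W))
    for j, (u, v) in enumerate(pix):
        img[j] = np.exp(-((xs - u) ** 2 + (ys - v) ** 2) / (2 * sigma ** 2))
    return img

# Toy camera: identity rotation, translated along z (illustrative only)
K = np.array([[20.0, 0.0, W / 2], [0.0, 20.0, H / 2], [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 4.0])

joints3d = np.array([[0.0, 0.0, 0.0], [0.3, 0.1, 0.2], [-0.2, 0.4, -0.1]])
pix = project(joints3d, K, R, t)
heat = render_onehot(pix)             # shape (J, H, W): one channel per joint
```

In the actual framework the rendering is differentiable, so comparing each rendered channel against per-view 2D evidence (e.g. detected keypoint heatmaps) across all cameras drives the 3D Gaussian means toward a consistent pose without any 3D ground truth.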