SkelSplat: Robust Multi-view 3D Human Pose Estimation with Differentiable Gaussian Rendering

📅 2025-11-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing unsupervised multi-view 3D human pose estimation methods suffer from poor generalization and sensitivity to occlusion. This paper proposes the first ground-truth-free framework based on 3D Gaussian splatting: the human body is modeled as a differentiable, joint-level Gaussian point cloud in which each Gaussian is optimized independently via a one-hot encoding scheme, and differentiable rendering is combined with multi-view geometric constraints to reconstruct pose across views without any 3D annotations. To the authors' knowledge, this is the first work to introduce Gaussian splatting into skeletal pose estimation; the formulation naturally supports arbitrary camera configurations and substantially improves cross-dataset generalization. Evaluated on Human3.6M and CMU Panoptic, the method reduces cross-dataset error by up to 47.8% compared to learning-based methods, while remaining accurate under severe occlusion.
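Because each joint is rendered into its own channel, the rendering loss decouples across joints. The following is a minimal 2D sketch of that one-hot idea, not the paper's actual 3D Gaussian splatting renderer; the grid size, sigma, and joint positions are made up for illustration:

```python
import numpy as np

def splat(center, sigma, size=32):
    """Render one isotropic 2D Gaussian blob on a size x size grid."""
    ys, xs = np.mgrid[0:size, 0:size]
    d2 = (xs - center[0]) ** 2 + (ys - center[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Two joints, each splatted into its own (one-hot) channel.
joints = np.array([[10.0, 12.0], [22.0, 8.0]])
rendered = np.stack([splat(j, sigma=2.0) for j in joints])  # shape (2, 32, 32)

# Per-channel MSE against target heatmaps: channel k depends only on joint k,
# so gradients from one joint's loss never interfere with another joint.
target_joints = [[11.0, 12.0], [22.0, 8.0]]  # joint 0 is off by one pixel
targets = np.stack([splat(c, sigma=2.0) for c in target_joints])
loss_per_joint = ((rendered - targets) ** 2).mean(axis=(1, 2))
```

Here `loss_per_joint[1]` is exactly zero (joint 1 already matches its target) while `loss_per_joint[0]` is not, showing that the one-hot channels let each Gaussian be optimized independently.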

📝 Abstract
Accurate 3D human pose estimation is fundamental for applications such as augmented reality and human-robot interaction. State-of-the-art multi-view methods learn to fuse predictions across views by training on large annotated datasets, leading to poor generalization when the test scenario differs. To overcome these limitations, we propose SkelSplat, a novel framework for multi-view 3D human pose estimation based on differentiable Gaussian rendering. Human pose is modeled as a skeleton of 3D Gaussians, one per joint, optimized via differentiable rendering to enable seamless fusion of arbitrary camera views without 3D ground-truth supervision. Since Gaussian Splatting was originally designed for dense scene reconstruction, we propose a novel one-hot encoding scheme that enables independent optimization of human joints. SkelSplat outperforms approaches that do not rely on 3D ground truth on Human3.6M and CMU, while reducing the cross-dataset error by up to 47.8% compared to learning-based methods. Experiments on Human3.6M-Occ and Occlusion-Person demonstrate robustness to occlusions, without scenario-specific fine-tuning. Our project page is available here: https://skelsplat.github.io.
Problem

Research questions and friction points this paper is trying to address.

Overcoming poor generalization in multi-view 3D pose estimation
Enabling occlusion-robust 3D human pose reconstruction without 3D supervision
Achieving cross-dataset generalization without scenario-specific fine-tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Differentiable Gaussian rendering for pose estimation
One-hot encoding for independent joint optimization
No 3D ground-truth supervision required
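Setting the rendering machinery aside, the core fusion objective behind these contributions is to recover a 3D joint purely from its 2D observations across views, with no 3D ground truth. The sketch below minimizes multi-view reprojection error with Gauss-Newton as a simplified stand-in for the paper's differentiable-rendering optimization; the two-camera rig and all numeric values are hypothetical:

```python
import numpy as np

def project(P, X):
    """Pinhole projection of a 3D point X with a 3x4 camera matrix P."""
    h = P @ np.append(X, 1.0)
    return h[:2] / h[2]

def residual_and_jacobian(P, X, target):
    """Reprojection residual and its 2x3 Jacobian w.r.t. X."""
    h = P @ np.append(X, 1.0)
    a, b, c = h
    r = np.array([a / c, b / c]) - target
    # d(a/c)/dX = (c * dA/dX - a * dc/dX) / c^2, with dA/dX = P[0, :3] etc.
    J = np.vstack([(c * P[0, :3] - a * P[2, :3]) / c ** 2,
                   (c * P[1, :3] - b * P[2, :3]) / c ** 2])
    return r, J

# Toy two-camera rig (hypothetical intrinsics/extrinsics, not from the paper).
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])

gt = np.array([0.1, -0.2, 3.0])            # used only to synthesize 2D targets
targets = [project(P1, gt), project(P2, gt)]

# Gauss-Newton refinement of the 3D joint from 2D observations alone.
X = np.array([0.0, 0.0, 2.0])              # rough initial guess
for _ in range(20):
    rs, Js = zip(*(residual_and_jacobian(P, X, t)
                   for P, t in zip([P1, P2], targets)))
    r, J = np.concatenate(rs), np.vstack(Js)
    X -= np.linalg.lstsq(J, r, rcond=None)[0]
# X converges to the point whose projections match both views.
```

In SkelSplat this per-joint objective is expressed through differentiable Gaussian rendering rather than explicit reprojection residuals, which is what allows arbitrary numbers and placements of cameras to be fused seamlessly.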