RoGSplat: Learning Robust Generalizable Human Gaussian Splatting from Sparse Multi-View Images

📅 2025-03-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses two bottlenecks in novel-view synthesis of human subjects from sparse multi-view images: geometric reconstruction that is not robust, and the need for per-subject optimization. We propose a generalizable, subject-agnostic framework that eliminates explicit per-instance optimization. Our core innovations are: (i) the first mapping of SMPL vertices to image-aligned dense 3D prior points, coupled with a prior point calibration mechanism driven by voxel and pixel features; and (ii) a coarse-to-fine Gaussian parameter regression strategy that combines multi-scale Gaussian splatting with cross-view feature aggregation, in which voxel-based guidance reconstructs coarse geometry and depth-map-assisted refinement corrects fine detail. The entire pipeline supports end-to-end differentiable rendering. Our method achieves state-of-the-art performance across multiple benchmarks, generating high-fidelity novel views from only 3–5 sparse input views, while demonstrating strong cross-dataset generalization.

📝 Abstract
This paper presents RoGSplat, a novel approach for synthesizing high-fidelity novel views of unseen humans from sparse multi-view images without cumbersome per-subject optimization. Previous methods typically struggle with sparse views that overlap little and are less effective at reconstructing complex human geometry; the proposed method enables robust reconstruction under these challenging conditions. Our key idea is to lift SMPL vertices to dense and reliable 3D prior points representing accurate human body geometry, and then regress human Gaussian parameters based on these points. To account for possible misalignment between the SMPL model and the images, we propose to predict image-aligned 3D prior points by leveraging both pixel-level and voxel-level features, and we regress the coarse Gaussians from these points. To better capture high-frequency details, we further render depth maps from the coarse 3D Gaussians to help regress fine-grained pixel-wise Gaussians. Experiments on several benchmark datasets demonstrate that our method outperforms state-of-the-art methods in novel view synthesis and cross-dataset generalization. Our code is available at https://github.com/iSEE-Laboratory/RoGSplat.
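The abstract's last step, using a rendered depth map to place pixel-wise Gaussians, relies on lifting each pixel to a 3D point. The following is a minimal sketch of that unprojection and not the authors' implementation: the function name `unproject_depth`, the pinhole camera model, and the camera-to-world pose convention are all assumptions made for illustration.

```python
import numpy as np


def unproject_depth(depth, K, cam_to_world):
    """Lift every pixel of a depth map to a 3D point in world space.

    Hypothetical helper, assuming a pinhole camera.
    depth:        (H, W) depth along the camera z-axis
    K:            (3, 3) intrinsics matrix
    cam_to_world: (4, 4) camera-to-world transform
    returns:      (H, W, 3) world-space points, one per pixel
    """
    h, w = depth.shape
    # u indexes columns, v indexes rows; both have shape (H, W).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    # Back-project homogeneous pixels to camera-space rays with z = 1.
    rays = pix @ np.linalg.inv(K).T
    # Scale each ray by its depth to get camera-space points.
    pts_cam = rays * depth[..., None]
    # Rotate and translate into world space.
    return pts_cam @ cam_to_world[:3, :3].T + cam_to_world[:3, 3]


# Toy example: a flat depth map seen by a camera at the world origin.
K = np.array([[100.0, 0.0, 2.0],
              [0.0, 100.0, 1.0],
              [0.0, 0.0, 1.0]])
depth = np.full((3, 5), 2.0)
points = unproject_depth(depth, K, np.eye(4))
```

In a pipeline like the one described, each of these points would serve as a candidate Gaussian center, with the remaining Gaussian parameters regressed per pixel by a network.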
Problem

Research questions and friction points this paper is trying to address.

Synthesizing high-fidelity novel views of unseen humans from sparse multi-view images without per-subject optimization.
Reconstructing complex human geometry robustly when the input views overlap little.
Capturing high-frequency details that coarse Gaussians alone miss.
Innovation

Methods, ideas, or system contributions that make the work stand out.

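Lifts SMPL vertices to dense, reliable 3D prior points that represent human body geometry.
Predicts image-aligned 3D prior points from pixel-level and voxel-level features to correct SMPL misalignment, then regresses coarse Gaussians from them.
Renders depth maps from the coarse Gaussians to help regress fine-grained pixel-wise Gaussians.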
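Pixel-level features for a 3D prior point are commonly obtained by projecting the point into an input view and sampling the image feature map at that location. The sketch below shows this for a single view with bilinear interpolation. It is an illustration under assumptions, not the paper's code: `project_points` and `sample_bilinear` are hypothetical names, the camera is a pinhole model, and all points are assumed to lie in front of the camera.

```python
import numpy as np


def project_points(points, K, world_to_cam):
    """Project (N, 3) world points to pixel coordinates.

    Returns (N, 2) pixel coords (u = column, v = row) and (N,) depths.
    Assumes every point has positive depth in the camera frame.
    """
    pts_cam = points @ world_to_cam[:3, :3].T + world_to_cam[:3, 3]
    uvw = pts_cam @ K.T
    uv = uvw[:, :2] / uvw[:, 2:3]
    return uv, pts_cam[:, 2]


def sample_bilinear(feat, uv):
    """Bilinearly sample an (H, W, C) feature map at (N, 2) pixel coords."""
    h, w, _ = feat.shape
    # Clamp so points projecting outside the image reuse border features.
    u = np.clip(uv[:, 0], 0, w - 1)
    v = np.clip(uv[:, 1], 0, h - 1)
    u0 = np.floor(u).astype(int)
    v0 = np.floor(v).astype(int)
    u1 = np.minimum(u0 + 1, w - 1)
    v1 = np.minimum(v0 + 1, h - 1)
    au = (u - u0)[:, None]  # horizontal interpolation weight
    av = (v - v0)[:, None]  # vertical interpolation weight
    top = feat[v0, u0] * (1 - au) + feat[v0, u1] * au
    bottom = feat[v1, u0] * (1 - au) + feat[v1, u1] * au
    return top * (1 - av) + bottom * av


# Toy example: two prior points and a one-channel 3x4 feature map.
K = np.array([[100.0, 0.0, 2.0],
              [0.0, 100.0, 1.0],
              [0.0, 0.0, 1.0]])
prior_points = np.array([[0.0, 0.0, 2.0],
                         [0.01, 0.0, 1.0]])
feat = np.arange(12, dtype=np.float64).reshape(3, 4, 1)
uv, z = project_points(prior_points, K, np.eye(4))
point_features = sample_bilinear(feat, uv)
```

With several input views, the per-view samples for each point would then be aggregated. How RoGSplat performs that aggregation is not specified here.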
Junjin Xiao
School of Computer Science and Engineering, Sun Yat-sen University, China; Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, China
Qing Zhang
School of Computer Science and Engineering, Sun Yat-sen University, China; Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, China
Yongwei Nie
South China University of Technology
Lei Zhu
Hong Kong University of Science and Technology (Guangzhou)
Weihua Zheng
A*STAR
Multilingual LLM, Cultural LLM