RoGSplat: Learning Robust Generalizable Human Gaussian Splatting from Sparse Multi-View Images

📅 2025-03-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two bottlenecks in novel-view synthesis of human subjects from sparse multi-view images: geometric reconstruction that is not robust, and the need for per-subject optimization. It proposes a generalizable, subject-agnostic framework that requires no per-instance optimization. The core contributions are: (i) lifting SMPL vertices to dense, image-aligned 3D prior points, with a calibration mechanism driven by both voxel-level and pixel-level features; and (ii) a coarse-to-fine Gaussian parameter regression strategy, in which voxel-based guidance supports coarse geometry reconstruction, depth maps rendered from the coarse Gaussians support fine-grained correction, and features are aggregated across views. The entire pipeline supports end-to-end differentiable rendering. The method reports state-of-the-art performance on multiple benchmarks, generating high-fidelity novel views from only 3–5 sparse input views and generalizing well across datasets.

📝 Abstract

This paper presents RoGSplat, a novel approach for synthesizing high-fidelity novel views of unseen humans from sparse multi-view images, while requiring no cumbersome per-subject optimization. Unlike previous methods, which typically struggle with sparse views that have little overlap and are less effective at reconstructing complex human geometry, the proposed method enables robust reconstruction in such challenging conditions. Our key idea is to lift SMPL vertices to dense and reliable 3D prior points representing accurate human body geometry, and then regress human Gaussian parameters based on these points. To account for possible misalignment between the SMPL model and the images, we propose to predict image-aligned 3D prior points by leveraging both pixel-level features and voxel-level features, from which we regress the coarse Gaussians. To enhance the ability to capture high-frequency details, we further render depth maps from the coarse 3D Gaussians to help regress fine-grained pixel-wise Gaussians. Experiments on several benchmark datasets demonstrate that our method outperforms state-of-the-art methods in novel view synthesis and cross-dataset generalization. Our code is available at https://github.com/iSEE-Laboratory/RoGSplat.
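The abstract mentions pixel-level features gathered for 3D prior points. One common way to obtain such features is to project each point into every input view and sample the 2D feature map at the projected location. The sketch below illustrates that idea with NumPy only; it is not the authors' implementation. The function names (`project_points`, `sample_bilinear`, `aggregate_views`), the pinhole camera convention (`x_cam = R @ x_world + t`), and the plain mean across views are all assumptions made for illustration; the paper uses learned feature extractors and its own aggregation scheme.

```python
import numpy as np

def project_points(points, K, R, t):
    """Project (N, 3) world points into a pinhole camera.

    Assumed convention: x_cam = R @ x_world + t.
    Returns (N, 2) pixel coordinates (x, y) and (N,) depths.
    """
    cam = points @ R.T + t              # world -> camera frame
    depth = cam[:, 2]
    uv_h = cam @ K.T                    # homogeneous pixel coordinates
    uv = uv_h[:, :2] / uv_h[:, 2:3]     # perspective divide
    return uv, depth

def sample_bilinear(feat, uv):
    """Bilinearly sample a (H, W, C) feature map at (N, 2) pixel coords."""
    H, W, _ = feat.shape
    x = np.clip(uv[:, 0], 0, W - 1)     # clamp to the image bounds
    y = np.clip(uv[:, 1], 0, H - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx = (x - x0)[:, None]
    wy = (y - y0)[:, None]
    top = feat[y0, x0] * (1 - wx) + feat[y0, x1] * wx
    bot = feat[y1, x0] * (1 - wx) + feat[y1, x1] * wx
    return top * (1 - wy) + bot * wy

def aggregate_views(points, feats, Ks, Rs, ts):
    """Average pixel-aligned features over all views (a placeholder
    for a learned cross-view aggregation)."""
    per_view = [
        sample_bilinear(f, project_points(points, K, R, t)[0])
        for f, K, R, t in zip(feats, Ks, Rs, ts)
    ]
    return np.mean(per_view, axis=0)
```

A simple mean ignores occlusion and view-dependent visibility, so it is best read as a placeholder for whatever learned aggregation a real system uses.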

Problem

Research questions and friction points this paper is trying to address.

Synthesizes high-fidelity views from sparse multi-view images.

Reconstructs complex human geometry with minimal overlapping views.

Enhances detail capture using depth maps from coarse Gaussians.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Lifts SMPL vertices to 3D prior points

Predicts image-aligned 3D prior points

Renders depth maps for fine-grained Gaussians
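To make the last point concrete: once a depth map is available for a view, every pixel can be lifted back to a 3D position, which is one natural way to place pixel-wise Gaussians. The sketch below shows only this geometric unprojection step in NumPy. It assumes a pinhole camera with `x_cam = R @ x_world + t` and depth measured along the camera z-axis; the function name `unproject_depth` is hypothetical and the code is not taken from the RoGSplat repository.

```python
import numpy as np

def unproject_depth(depth, K, R, t):
    """Lift a (H, W) depth map to (H, W, 3) world-space points.

    Assumed convention: x_cam = R @ x_world + t, depth = z in camera frame.
    """
    H, W = depth.shape
    # Pixel grid: u runs along columns (x), v along rows (y).
    u, v = np.meshgrid(np.arange(W, dtype=float), np.arange(H, dtype=float))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    rays = pix @ np.linalg.inv(K).T      # K^{-1} [u, v, 1]^T for each pixel
    cam = rays * depth.reshape(-1, 1)    # scale each ray by its depth
    world = (cam - t) @ R                # R^T (x_cam - t) in row-vector form
    return world.reshape(H, W, 3)
```

In a full pipeline these positions would serve only as the centers of the fine Gaussians; the remaining parameters (scale, rotation, opacity, color) would still have to be regressed by a network.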

🔎 Similar Papers

GST: Precise 3D Human Body from a Single Image with Gaussian Splatting Transformers