Intrinsic Geometry-Appearance Consistency Optimization for Sparse-View Gaussian Splatting

📅 2026-03-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenges of single-image high-fidelity 3D human reconstruction—such as structural distortions, over-smoothing, and limited generalization—by proposing the MVD-HuGaS framework. The method leverages a multi-view human diffusion model to generate geometrically consistent multi-view images and jointly optimizes a 3D Gaussian representation with camera poses. It incorporates human structural priors and introduces an alignment module to enable cooperative optimization between Gaussians and poses. Additionally, a depth-guided mechanism is designed to mitigate facial distortions. Evaluated on the Thuman2.0 and 2K2K datasets, the approach achieves state-of-the-art performance in single-view 3D human reconstruction, significantly improving rendering fidelity and generalization to real-world scenes.
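The joint optimization of 3D Gaussians and camera poses described in the summary can be sketched in miniature. The toy below is not the paper's code: the 2D scene, the 1D "renders", the learning rates, and all names are illustrative stand-ins for the real pipeline (Gaussian centers stand in for full Gaussians, a single rotation angle stands in for a camera pose), but it shows the core idea of descending one photometric loss with respect to both the scene and the per-view poses.

```python
# Toy sketch of joint scene/pose optimization (NOT the paper's code; the
# 2D setup and all names here are illustrative). Scene = a set of 2D point
# centers; camera = one rotation angle per view; "rendering" a view = the
# x-coordinates of the rotated points.
import numpy as np

def project(points, angle):
    """Render one view: rotate the scene by the camera angle, keep x."""
    c, s = np.cos(angle), np.sin(angle)
    return points @ np.array([c, -s])  # x-component of R(angle) @ p

def loss(points, angles, targets):
    """Photometric loss summed over all views."""
    return sum(float(np.sum((project(points, a) - t) ** 2))
               for a, t in zip(angles, targets))

def num_grad(f, x, eps=1e-5):
    """Central-difference gradient of f() w.r.t. array x (edited in place)."""
    g = np.zeros_like(x)
    flat, gflat = x.ravel(), g.ravel()
    for i in range(flat.size):
        old = flat[i]
        flat[i] = old + eps; hi = f()
        flat[i] = old - eps; lo = f()
        flat[i] = old
        gflat[i] = (hi - lo) / (2 * eps)
    return g

rng = np.random.default_rng(0)
gt_points = rng.normal(size=(5, 2))
gt_angles = np.array([0.0, 0.5])
targets = [project(gt_points, a) for a in gt_angles]

# Start from perturbed points and miscalibrated poses; descend jointly,
# alternating a step on the scene with a step on the camera angles.
points = gt_points + 0.1 * rng.normal(size=gt_points.shape)
angles = gt_angles + np.array([0.1, -0.1])
initial = loss(points, angles, targets)
for _ in range(400):
    f = lambda: loss(points, angles, targets)
    points -= 0.05 * num_grad(f, points)
    angles -= 0.01 * num_grad(f, angles)
final = loss(points, angles, targets)
```

As in the paper's alignment module, neither the scene nor the poses are trusted as fixed: both are free variables of the same objective, so pose errors from the generated views can be absorbed during reconstruction.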

📝 Abstract
3D human reconstruction from a single image is a challenging problem and has been extensively studied in the literature. Recently, some methods have resorted to diffusion models for guidance, optimizing a 3D representation via Score Distillation Sampling (SDS) or generating a back-view image to facilitate reconstruction. However, these methods tend to produce unsatisfactory artifacts (e.g., flattened human structure or over-smoothed results caused by inconsistent priors from multiple views) and struggle to generalize to real-world images in the wild. In this work, we present MVD-HuGaS, which enables free-view 3D human rendering from a single image via a multi-view human diffusion model. We first generate multi-view images from the single reference image with an enhanced multi-view diffusion model, fine-tuned on high-quality 3D human datasets to incorporate 3D geometry priors and human structure priors. To infer accurate camera poses from the sparse generated multi-view images for reconstruction, an alignment module is introduced to facilitate joint optimization of 3D Gaussians and camera poses. Furthermore, we propose a depth-based Facial Distortion Mitigation module to refine the generated facial regions, thereby improving the overall fidelity of the reconstruction. Finally, leveraging the refined multi-view images along with their accurate camera poses, MVD-HuGaS optimizes the 3D Gaussians of the target human for high-fidelity free-view renderings. Extensive experiments on the Thuman2.0 and 2K2K datasets show that the proposed MVD-HuGaS achieves state-of-the-art performance on single-view 3D human rendering.
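For reference, the SDS guidance mentioned in the abstract is the score-distillation gradient of DreamFusion, written here in its standard form (this notation is supplied here, not taken from the paper): with a differentiable renderer x = g(θ), timestep weighting w(t), and the diffusion model's noise prediction ε̂_φ conditioned on prompt y,

```latex
\nabla_{\theta}\,\mathcal{L}_{\mathrm{SDS}}\big(\phi,\, x = g(\theta)\big)
  = \mathbb{E}_{t,\epsilon}\!\left[\, w(t)\,
    \big(\hat{\epsilon}_{\phi}(x_t;\, y,\, t) - \epsilon\big)\,
    \frac{\partial x}{\partial \theta} \,\right]
```

Inconsistent per-view scores in this expectation are one source of the flattened or over-smoothed geometry the abstract attributes to SDS-based methods.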
Problem

Research questions and friction points this paper is trying to address.

3D human reconstruction
single-view reconstruction
multi-view consistency
real-world generalization
artifact reduction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Gaussian Splatting
Multi-view Diffusion
Single-view 3D Reconstruction
Camera Pose Optimization
Facial Distortion Mitigation
Kaiqiang Xiong
Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Shenzhen Graduate School, Peking University; Peng Cheng Laboratory
Rui Peng
Alibaba Group
Jiahao Wu
The Chinese University of Hong Kong
Medical Robots · Robot-assisted Microsurgery · Motion Planning
Zhanke Wang
Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Shenzhen Graduate School, Peking University
Jie Liang
Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Shenzhen Graduate School, Peking University; Peng Cheng Laboratory
Xiaoyun Zheng
Peng Cheng Laboratory
Feng Gao
Professor, Department of Physics, School of Science, Tianjin University
Bioinformatics · Microbial Genomics · Computational Biology
Ronggang Wang
Shenzhen Graduate School, Peking University
Immersive Video Coding and Processing