Intrinsic Geometry-Appearance Consistency Optimization for Sparse-View Gaussian Splatting

📅 2026-03-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenges of single-image high-fidelity 3D human reconstruction—such as structural distortions, over-smoothing, and limited generalization—by proposing the MVD-HuGaS framework. The method leverages a multi-view human diffusion model to generate geometrically consistent multi-view images and jointly optimizes a 3D Gaussian representation with camera poses. It incorporates human structural priors and introduces an alignment module to enable cooperative optimization between Gaussians and poses. Additionally, a depth-guided mechanism is designed to mitigate facial distortions. Evaluated on the Thuman2.0 and 2K2K datasets, the approach achieves state-of-the-art performance in single-view 3D human reconstruction, significantly improving rendering fidelity and generalization to real-world scenes.
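The joint optimization of 3D Gaussians and camera poses described in the summary can be sketched in miniature. The toy below is not the paper's code: the 2D scene, the 1D "renders", the learning rates, and all names are illustrative stand-ins for the real pipeline (Gaussian centers stand in for full Gaussians, a single rotation angle stands in for a camera pose), but it shows the core idea of descending one photometric loss with respect to both the scene and the per-view poses.

```python
# Toy sketch of joint scene/pose optimization (NOT the paper's code; the
# 2D setup and all names here are illustrative). Scene = a set of 2D point
# centers; camera = one rotation angle per view; "rendering" a view = the
# x-coordinates of the rotated points.
import numpy as np

def project(points, angle):
    """Render one view: rotate the scene by the camera angle, keep x."""
    c, s = np.cos(angle), np.sin(angle)
    return points @ np.array([c, -s])  # x-component of R(angle) @ p

def loss(points, angles, targets):
    """Photometric loss summed over all views."""
    return sum(float(np.sum((project(points, a) - t) ** 2))
               for a, t in zip(angles, targets))

def num_grad(f, x, eps=1e-5):
    """Central-difference gradient of f() w.r.t. array x (edited in place)."""
    g = np.zeros_like(x)
    flat, gflat = x.ravel(), g.ravel()
    for i in range(flat.size):
        old = flat[i]
        flat[i] = old + eps; hi = f()
        flat[i] = old - eps; lo = f()
        flat[i] = old
        gflat[i] = (hi - lo) / (2 * eps)
    return g

rng = np.random.default_rng(0)
gt_points = rng.normal(size=(5, 2))
gt_angles = np.array([0.0, 0.5])
targets = [project(gt_points, a) for a in gt_angles]

# Start from perturbed points and miscalibrated poses; descend jointly,
# alternating a step on the scene with a step on the camera angles.
points = gt_points + 0.1 * rng.normal(size=gt_points.shape)
angles = gt_angles + np.array([0.1, -0.1])
initial = loss(points, angles, targets)
for _ in range(400):
    f = lambda: loss(points, angles, targets)
    points -= 0.05 * num_grad(f, points)
    angles -= 0.01 * num_grad(f, angles)
final = loss(points, angles, targets)
```

As in the paper's alignment module, neither the scene nor the poses are trusted as fixed: both are free variables of the same objective, so pose errors from the generated views can be absorbed during reconstruction.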

📝 Abstract
3D human reconstruction from a single image is a challenging problem and has been extensively studied in the literature. Recently, some methods have resorted to diffusion models for guidance, optimizing a 3D representation via Score Distillation Sampling (SDS) or generating a back-view image to facilitate reconstruction. However, these methods tend to produce unsatisfactory artifacts (e.g., flattened human structure or over-smoothed results caused by inconsistent priors from multiple views) and struggle to generalize to real-world images in the wild. In this work, we present MVD-HuGaS, which enables free-view 3D human rendering from a single image via a multi-view human diffusion model. We first generate multi-view images from the single reference image with an enhanced multi-view diffusion model, fine-tuned on high-quality 3D human datasets to incorporate 3D geometry priors and human structure priors. To infer accurate camera poses from the sparse generated multi-view images for reconstruction, an alignment module is introduced to facilitate joint optimization of 3D Gaussians and camera poses. Furthermore, we propose a depth-based Facial Distortion Mitigation module to refine the generated facial regions, thereby improving the overall fidelity of the reconstruction. Finally, leveraging the refined multi-view images along with their accurate camera poses, MVD-HuGaS optimizes the 3D Gaussians of the target human for high-fidelity free-view renderings. Extensive experiments on the Thuman2.0 and 2K2K datasets show that the proposed MVD-HuGaS achieves state-of-the-art performance on single-view 3D human rendering.
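For reference, the SDS guidance mentioned in the abstract is the score-distillation gradient of DreamFusion, written here in its standard form (this notation is supplied here, not taken from the paper): with a differentiable renderer x = g(θ), timestep weighting w(t), and the diffusion model's noise prediction ε̂_φ conditioned on prompt y,

```latex
\nabla_{\theta}\,\mathcal{L}_{\mathrm{SDS}}\big(\phi,\, x = g(\theta)\big)
  = \mathbb{E}_{t,\epsilon}\!\left[\, w(t)\,
    \big(\hat{\epsilon}_{\phi}(x_t;\, y,\, t) - \epsilon\big)\,
    \frac{\partial x}{\partial \theta} \,\right]
```

Inconsistent per-view scores in this expectation are one source of the flattened or over-smoothed geometry the abstract attributes to SDS-based methods.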
Problem

Research questions and friction points this paper is trying to address.

3D human reconstruction
single-view reconstruction
multi-view consistency
real-world generalization
artifact reduction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Gaussian Splatting
Multi-view Diffusion
Single-view 3D Reconstruction
Camera Pose Optimization
Facial Distortion Mitigation
Kaiqiang Xiong
Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Shenzhen Graduate School, Peking University; Peng Cheng Laboratory
Rui Peng
Alibaba Group
Jiahao Wu
The Chinese University of Hong Kong
Medical Robots · Robot-assisted Microsurgery · Motion Planning
Zhanke Wang
Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Shenzhen Graduate School, Peking University
Jie Liang
Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Shenzhen Graduate School, Peking University; Peng Cheng Laboratory
Xiaoyun Zheng
Peng Cheng Laboratory
Feng Gao
Professor, Department of Physics, School of Science, Tianjin University
Bioinformatics · Microbial Genomics · Computational Biology
Ronggang Wang
Shenzhen Graduate School, Peking University
Immersive Video Coding and Processing