LM-Gaussian: Boost Sparse-view 3D Gaussian Splatting with Large Model Priors

📅 2024-09-05
🏛️ arXiv.org
📈 Citations: 6
Influential: 0
🤖 AI Summary
To address the challenge of 3D scene reconstruction from sparse views (only 3–5 input images), this paper proposes the first 3D Gaussian Splatting framework integrated with large-model priors. Methodologically: (1) robust geometric initialization is achieved via stereo vision; (2) diffusion models guide iterative optimization of Gaussian parameters, mitigating overfitting and detail loss; (3) video-diffusion priors enhance novel-view synthesis quality. The approach synergistically combines 3D Gaussian Splatting, multi-view stereo matching, Stable Diffusion–style diffusion models, differentiable rendering, and joint optimization. Extensive evaluations on standard benchmarks demonstrate significant improvements over state-of-the-art sparse-view methods—achieving high-fidelity, full 360° reconstructions using merely 3–5 input images, compared to hundreds required previously—while attaining new state-of-the-art visual quality.
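The diffusion-guided optimization described in step (2) can be sketched as a simple loop: render the current Gaussians, let an image-diffusion prior propose a refined version of the render, and pull the Gaussian parameters toward that guidance. The toy below is a heavily simplified illustration of that loop, not the paper's implementation: `render` and `diffusion_refine` are hypothetical stand-ins (the real method uses differentiable 3DGS rasterization and a Stable Diffusion–style model), and per-Gaussian colors stand in for the full Gaussian parameter set.

```python
import numpy as np

rng = np.random.default_rng(0)

def render(colors):
    """Hypothetical stand-in for differentiable rendering: identity map
    from per-Gaussian colors to 'pixels'."""
    return colors

def diffusion_refine(image, target, strength=0.5):
    """Stub for an image-diffusion prior: nudge the render toward a
    detailed target image (the role the prior plays in the paper)."""
    return image + strength * (target - image)

def refine_gaussians(colors, target, n_iters=50, lr=0.5):
    """Iterative refinement: use the prior's refined image as guidance
    for updating the Gaussian parameters (here, just colors)."""
    for _ in range(n_iters):
        rendered = render(colors)
        guidance = diffusion_refine(rendered, target)
        colors = colors + lr * (guidance - rendered)  # toy update rule
    return colors

target = rng.random((8, 3))    # proxy for a detailed reference image
colors = np.zeros((8, 3))      # poorly initialized Gaussians
refined = refine_gaussians(colors, target)
print(float(np.abs(refined - target).mean()))
```

Each iteration closes a constant fraction of the gap between render and guidance, so the reconstruction error shrinks geometrically; the real method instead backpropagates a diffusion-derived loss through the rasterizer.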

📝 Abstract
We aim to address sparse-view reconstruction of a 3D scene by leveraging priors from large-scale vision models. While recent advancements such as 3D Gaussian Splatting (3DGS) have demonstrated remarkable successes in 3D reconstruction, these methods typically necessitate hundreds of input images that densely capture the underlying scene, making them time-consuming and impractical for real-world applications. Sparse-view reconstruction, however, is inherently ill-posed and under-constrained, often producing inferior and incomplete results due to failed initialization, overfitting on input images, and a lack of detail. To mitigate these challenges, we introduce LM-Gaussian, a method capable of generating high-quality reconstructions from a limited number of images. Specifically, we propose a robust initialization module that leverages stereo priors to aid in the recovery of camera poses and reliable point clouds. Additionally, a diffusion-based refinement is iteratively applied to incorporate image diffusion priors into the Gaussian optimization process to preserve intricate scene details. Finally, we utilize video diffusion priors to further enhance the rendered images for realistic visual effects. Overall, our approach significantly reduces the data acquisition requirements compared to previous 3DGS methods. We validate the effectiveness of our framework through experiments on various public datasets, demonstrating its potential for high-quality 360-degree scene reconstruction. Visual results are on our website.
Problem

Research questions and friction points this paper is trying to address.

Sparse-view 3D scene reconstruction using large model priors
Overcoming ill-posed sparse-view reconstruction limitations
Reducing input image requirements for 3D Gaussian Splatting
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages large-scale vision model priors
Uses robust initialization with stereo priors
Applies diffusion-based refinement iteratively
Hanyang Yu
The Hong Kong University of Science and Technology
Xiaoxiao Long
The Hong Kong University of Science and Technology
Ping Tan
Hong Kong University of Science and Technology (HKUST)
Computer Vision · Computer Graphics