🤖 AI Summary
To address the challenge of 3D scene reconstruction from sparse views (only 3–5 input images), this paper proposes the first 3D Gaussian Splatting framework integrated with large-model priors. Methodologically: (1) robust geometric initialization is achieved via stereo priors; (2) diffusion models guide iterative optimization of the Gaussian parameters, mitigating overfitting and detail loss; (3) video-diffusion priors further enhance novel-view synthesis quality. The approach combines 3D Gaussian Splatting, multi-view stereo matching, image and video diffusion models, differentiable rendering, and joint optimization. Evaluations on public benchmarks show substantial gains over prior sparse-view methods: high-fidelity, full 360° reconstructions from merely 3–5 input images, where earlier 3DGS pipelines required hundreds, along with state-of-the-art visual quality.
📝 Abstract
We aim to address sparse-view reconstruction of a 3D scene by leveraging priors from large-scale vision models. While recent advances such as 3D Gaussian Splatting (3DGS) have demonstrated remarkable success in 3D reconstruction, these methods typically require hundreds of input images that densely capture the underlying scene, making them time-consuming and impractical for real-world applications. Sparse-view reconstruction, however, is inherently ill-posed and under-constrained, often yielding inferior and incomplete results due to failed initialization, overfitting to the input images, and a lack of detail. To mitigate these challenges, we introduce LM-Gaussian, a method capable of generating high-quality reconstructions from a limited number of images. Specifically, we propose a robust initialization module that leverages stereo priors to aid the recovery of camera poses and reliable point clouds. Additionally, a diffusion-based refinement is applied iteratively to incorporate image diffusion priors into the Gaussian optimization process, preserving intricate scene details. Finally, we utilize video diffusion priors to further enhance the rendered images for realistic visual effects. Overall, our approach significantly reduces the data-acquisition requirements of previous 3DGS methods. We validate the effectiveness of our framework through experiments on various public datasets, demonstrating its potential for high-quality 360-degree scene reconstruction. Visual results are available on our website.
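The core idea the abstract describes, using priors to regularize an otherwise under-constrained optimization, can be illustrated with a deliberately tiny sketch. Everything below is hypothetical: the function `fit`, the two-parameter toy "scene", and the quadratic prior term are our own stand-ins, not the LM-Gaussian implementation. Two parameters are observed only through their sum, so the data term alone cannot pin them down; an added prior term makes the solution unique, mirroring how diffusion priors constrain Gaussian parameters that sparse views cannot determine by themselves.

```python
# Toy sketch: prior-regularized gradient descent on an ill-posed fit.
# Illustrative only; NOT the LM-Gaussian method or its actual losses.

def fit(observations, prior_mean, prior_weight, steps=500, lr=0.05):
    # Two parameters, but each observation sees only their sum (a + b),
    # so the data term alone leaves one degree of freedom unconstrained.
    a, b = 0.0, 0.0
    for _ in range(steps):
        # Data term: mean residual between the "rendered" value and observations.
        residual = sum((a + b) - y for y in observations) / len(observations)
        # Prior term: pulls each parameter toward a prior belief, standing in
        # for the large-model prior guidance described in the abstract.
        grad_a = residual + prior_weight * (a - prior_mean)
        grad_b = residual + prior_weight * (b - prior_mean)
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

a, b = fit(observations=[2.0, 2.0, 2.0], prior_mean=1.0, prior_weight=0.5)
```

With `prior_weight=0`, every pair with a + b = 2 is a fixed point of this descent, so the recovered parameters depend entirely on initialization; the prior term selects a single solution (here a = b = 1) while still fitting the observations.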