🤖 AI Summary
Existing 360° scene reconstruction methods struggle with sparse, uncalibrated 2D images lacking camera poses. To address this, we propose the first end-to-end reconstruction framework requiring no pose priors. Our method introduces a depth-augmented diffusion prior to jointly guide novel view synthesis and depth estimation; employs a FiLM-based modulation mechanism to unify geometric and contextual feature representation; designs a Gaussian point cloud confidence metric to detect artifacts; and establishes a Gaussian-SLAM-style progressive multi-view fusion pipeline. Leveraging 3D Gaussian splatting and confidence-weighted fusion, our approach significantly outperforms prior pose-free methods on MipNeRF360 and DL3DV-10K, achieving reconstruction completeness and multi-view consistency on par with state-of-the-art pose-aware approaches.
📄 Abstract
In this work, we introduce a generative approach for pose-free (without camera parameters) reconstruction of 360° scenes from a sparse set of 2D images. Reconstruction from such incomplete, pose-free observations is usually regularized with depth estimation or 3D foundational priors. While recent advances have enabled sparse-view reconstruction of large, complex scenes (with a high degree of foreground and background detail) with known camera poses using view-conditioned generative priors, these methods cannot be directly adapted to the pose-free setting, where ground-truth poses are not available during evaluation. To address this, we propose an image-to-image generative model designed to inpaint missing details and remove artifacts in novel view renders and depth maps of a 3D scene. We introduce context and geometry conditioning using Feature-wise Linear Modulation (FiLM) layers as a lightweight alternative to cross-attention, and also propose a novel confidence measure for 3D Gaussian splat representations to allow for better detection of these artifacts. By progressively integrating these novel views in a Gaussian-SLAM-inspired process, we achieve a multi-view-consistent 3D representation. Evaluations on the MipNeRF360 and DL3DV-10K benchmark datasets demonstrate that our method surpasses existing pose-free techniques and performs competitively with state-of-the-art posed (precomputed camera parameters are given) reconstruction methods in complex 360° scenes.
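The FiLM conditioning mentioned above amounts to predicting a per-channel scale (gamma) and shift (beta) from a conditioning vector and applying them to a feature map, which is much cheaper than cross-attention. Below is a minimal NumPy sketch of this operation; the function name, shapes, and the single linear projection are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def film_modulate(features, cond, w, b):
    """FiLM sketch: a linear map from the conditioning vector `cond`
    predicts per-channel scale (gamma) and shift (beta), which then
    modulate the feature map channel-wise."""
    # features: (B, C, H, W); cond: (B, D); w: (D, 2C); b: (2C,)
    gamma_beta = cond @ w + b                     # (B, 2C)
    C = features.shape[1]
    gamma, beta = gamma_beta[:, :C], gamma_beta[:, C:]
    # Broadcast gamma/beta over the spatial dimensions.
    return gamma[:, :, None, None] * features + beta[:, :, None, None]

B, C, H, W, D = 2, 4, 3, 3, 8
features = rng.standard_normal((B, C, H, W))
cond = rng.standard_normal((B, D))      # e.g. context/geometry embedding
w = rng.standard_normal((D, 2 * C))
b = np.zeros(2 * C)
out = film_modulate(features, cond, w, b)
print(out.shape)  # (2, 4, 3, 3)
```

Because gamma and beta are shared across spatial locations, the cost is one small matrix multiply per conditioning vector, independent of the feature map's resolution.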