Scene Coordinate Reconstruction Priors

📅 2025-10-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Scene coordinate regression (SCR) models degenerate when the training images provide insufficient multi-view constraints, which hurts downstream 3D vision tasks such as visual relocalization and structure-from-motion (SfM). The paper presents a probabilistic reinterpretation of SCR training that allows high-level reconstruction priors to be infused. The priors investigated range from simple priors over the distribution of reconstructed depth values to a learned prior over plausible scene coordinate configurations, implemented as a 3D point cloud diffusion model trained on a large corpus of indoor scans. At each training step, the priors push the predicted 3D scene points towards plausible geometry. On three indoor datasets, this yields more coherent scene point clouds, higher registration rates, and better camera poses, with a positive effect on novel view synthesis and camera relocalization.

📝 Abstract

Scene coordinate regression (SCR) models have proven to be powerful implicit scene representations for 3D vision, enabling visual relocalization and structure-from-motion. SCR models are trained specifically for one scene. If the training images imply insufficient multi-view constraints, SCR models degenerate. We present a probabilistic reinterpretation of training SCR models, which allows us to infuse high-level reconstruction priors. We investigate multiple such priors, ranging from simple priors over the distribution of reconstructed depth values to learned priors over plausible scene coordinate configurations. For the latter, we train a 3D point cloud diffusion model on a large corpus of indoor scans. Our priors push predicted 3D scene points towards plausible geometry at each training step to increase their likelihood. On three indoor datasets, our priors help learn better scene representations, resulting in more coherent scene point clouds, higher registration rates, and better camera poses, with a positive effect on downstream tasks such as novel view synthesis and camera relocalization.
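
One generic way to read a "probabilistic reinterpretation" that admits priors is as maximum a posteriori (MAP) estimation. The notation below is an assumption made for illustration and is not taken from the paper: $\mathcal{I}$ stands for the image observations, $\mathcal{Y}_\theta$ for the scene coordinates predicted by a network with weights $\theta$, and $p(\mathcal{Y})$ for a reconstruction prior.

```latex
\begin{align}
p(\mathcal{Y} \mid \mathcal{I}) &\propto p(\mathcal{I} \mid \mathcal{Y})\, p(\mathcal{Y}) \\
\mathcal{L}(\theta) &= -\log p(\mathcal{I} \mid \mathcal{Y}_\theta) \;-\; \log p(\mathcal{Y}_\theta)
\end{align}
```

Under this reading, the first term is a data term (for example, a reprojection likelihood) and the second term is where a depth prior or a learned point cloud prior would enter. Minimizing $\mathcal{L}$ raises the posterior probability of the predicted coordinates.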

Problem

Research questions and friction points this paper is trying to address.

Addressing SCR model degeneration with insufficient multi-view constraints

Infusing reconstruction priors for plausible scene geometry

Improving scene representations for downstream vision tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Probabilistic reinterpretation of scene coordinate regression training

Infusing reconstruction priors over depth distributions

Using a 3D point cloud diffusion model as a learned prior for plausible geometry
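
The idea of adding a depth prior to SCR training can be illustrated with a small numerical sketch. It is not the paper's implementation: the Gaussian reprojection likelihood, the Laplace depth prior, and every name and parameter (`map_loss`, `mean_depth`, `scale`, `prior_weight`) are hypothetical choices made for this example. It shows why a single view leaves depth unconstrained and how a prior term breaks the tie.

```python
import numpy as np

def project(points_world, R, t, K):
    """Project world-space points to pixels; also return camera-space depth."""
    cam = points_world @ R.T + t              # world -> camera frame
    depth = cam[:, 2]
    uv = (cam @ K.T)[:, :2] / depth[:, None]  # perspective division
    return uv, depth

def reprojection_nll(uv_pred, uv_obs, sigma=1.0):
    """Gaussian negative log-likelihood of the reprojection error (up to a constant)."""
    sq_err = np.sum((uv_pred - uv_obs) ** 2, axis=1)
    return np.mean(sq_err) / (2.0 * sigma ** 2)

def depth_prior_nll(depth, mean_depth=2.0, scale=1.0):
    """Negative log of a Laplace prior on depth (up to a constant). Assumed form."""
    return np.mean(np.abs(depth - mean_depth)) / scale

def map_loss(points_world, uv_obs, R, t, K, prior_weight=0.1):
    """Data term plus weighted prior term, i.e. a MAP-style objective."""
    uv_pred, depth = project(points_world, R, t, K)
    return reprojection_nll(uv_pred, uv_obs) + prior_weight * depth_prior_nll(depth)

# One camera at the origin with made-up intrinsics.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)

# Two candidate reconstructions that lie on the same viewing rays.
rays = np.array([[0.1, -0.05, 1.0],
                 [-0.2, 0.1, 1.0]])
near = rays * 2.0    # depth of 2 units, matches the assumed prior mean
far = rays * 50.0    # depth of 50 units, implausible under the assumed prior

uv_obs, _ = project(near, R, t, K)
loss_near = map_loss(near, uv_obs, R, t, K)
loss_far = map_loss(far, uv_obs, R, t, K)
```

Both candidates reproject to the same pixels, so the data term alone cannot distinguish them. With the prior term added, `loss_far` exceeds `loss_near`, and gradient descent on such an objective would move predictions towards the more plausible depth. A learned prior, such as a diffusion model over point clouds, would play the role of `depth_prior_nll` in a comparable objective.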

🔎 Similar Papers

GaussianBlock: Building Part-Aware Compositional and Editable 3D Scene by Primitives and Gaussians