🤖 AI Summary
Insufficient multi-view constraints cause scene coordinate regression (SCR) models to degenerate, which hurts downstream 3D vision tasks such as visual relocalization and structure-from-motion (SfM). To address this, we propose a probabilistic reinterpretation of SCR training that makes it possible to incorporate high-level reconstruction priors. These priors range from simple distributions over reconstructed depth values to a learned prior over plausible scene coordinate configurations: a 3D point cloud diffusion model trained on a large corpus of indoor scans. At each training step, the priors push predicted scene coordinates toward plausible geometry. On three indoor datasets, the priors lead to more coherent scene point clouds, higher registration rates, and better camera poses, which in turn benefit downstream novel view synthesis and camera relocalization.
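The probabilistic reinterpretation summarized above can be illustrated as maximum a posteriori (MAP) estimation. The usual reprojection loss is read as a negative log-likelihood, and a prior contributes an extra negative log-prior term. The NumPy sketch below is illustrative only. It assumes a Laplace likelihood on reprojection error, which yields an L1 loss, and a Gaussian prior over camera-space depth. The parameters `depth_mu`, `depth_sigma` and `weight` are hypothetical, and `project` and `map_loss` are illustrative names, not the authors' code. The example shows why single-view constraints are insufficient. Sliding points along their viewing rays leaves the reprojection term unchanged, so only the prior separates the two configurations.

```python
import numpy as np

def project(points_world, R, t, K):
    """Pinhole projection; returns pixel coordinates (N, 2) and camera-space depth (N,)."""
    cam = points_world @ R.T + t                 # world -> camera coordinates
    depth = cam[:, 2]
    uv = (cam @ K.T)[:, :2] / depth[:, None]     # perspective division
    return uv, depth

def map_loss(points_world, pixels, R, t, K, depth_mu=2.0, depth_sigma=1.0, weight=0.1):
    """Negative log-posterior (up to constants) of predicted scene coordinates.

    Likelihood: Laplace on reprojection error, giving an L1 loss (an assumption).
    Prior: Gaussian over camera-space depth with hypothetical depth_mu / depth_sigma.
    """
    uv, depth = project(points_world, R, t, K)
    nll_reproj = np.abs(uv - pixels).sum(axis=1).mean()
    nll_depth = (0.5 * ((depth - depth_mu) / depth_sigma) ** 2).mean()
    return nll_reproj + weight * nll_depth

# Single view: both point sets lie on the same viewing rays, so the
# reprojection term is zero for both and only the depth prior differs.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
near = np.array([[0.0, 0.0, 2.0], [0.5, -0.2, 2.0]])   # depth 2
far = near * 2.5                                        # same rays, depth 5
pixels, _ = project(near, R, t, K)
loss_near = map_loss(near, pixels, R, t, K)
loss_far = map_loss(far, pixels, R, t, K)
print(loss_near, loss_far)
```

A Gaussian depth prior is only one possible choice. Any log-density over depth values can replace the `nll_depth` term without changing the rest of the objective.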
📝 Abstract
Scene coordinate regression (SCR) models have proven to be powerful implicit scene representations for 3D vision, enabling visual relocalization and structure-from-motion. SCR models are trained specifically for one scene. If the training images provide insufficient multi-view constraints, SCR models degenerate. We present a probabilistic reinterpretation of training SCR models, which allows us to infuse high-level reconstruction priors. We investigate multiple such priors, ranging from simple priors over the distribution of reconstructed depth values to learned priors over plausible scene coordinate configurations. For the latter, we train a 3D point cloud diffusion model on a large corpus of indoor scans. Our priors push predicted 3D scene points towards plausible geometry at each training step to increase their likelihood. On three indoor datasets, our priors help learn better scene representations, resulting in more coherent scene point clouds, higher registration rates and better camera poses, with a positive effect on downstream tasks such as novel view synthesis and camera relocalization.
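The abstract describes the learned prior as pushing predicted 3D points toward plausible geometry at each training step. One common way to turn a diffusion model into such a push is a score-distillation-style (SDS) update, which is used here as a stand-in technique. The authors' exact formulation may differ. The sketch makes three assumptions. First, a trained point cloud diffusion model cannot be included, so a hypothetical toy prior replaces it: an axis-aligned Gaussian that favors a thin, flat "floor" at z = 0, for which the optimal noise predictor has a closed form. Second, the timestep weighting w(t) is set to 1. Third, the points are updated directly rather than through network parameters. `PRIOR_MEAN`, `PRIOR_STD`, `predict_noise`, `diffusion_prior_grad` and `train_step` are all illustrative names.

```python
import numpy as np

# Hypothetical toy prior standing in for a learned point cloud diffusion model:
# each point ~ N(PRIOR_MEAN, diag(PRIOR_STD**2)), i.e. a wide, flat floor at z = 0.
PRIOR_MEAN = np.array([0.0, 0.0, 0.0])
PRIOR_STD = np.array([10.0, 10.0, 0.01])   # wide in x/y, thin in z

def predict_noise(x_t, alpha_bar):
    """Closed-form E[eps | x_t] for the Gaussian toy prior (DDPM parameterization).

    With x_t = sqrt(a) * x0 + sqrt(1 - a) * eps and x0 ~ N(m, s^2), the optimal
    noise prediction is sqrt(1 - a) * (x_t - sqrt(a) * m) / (a * s^2 + 1 - a).
    """
    var_t = alpha_bar * PRIOR_STD**2 + (1.0 - alpha_bar)    # marginal variance of x_t
    return np.sqrt(1.0 - alpha_bar) * (x_t - np.sqrt(alpha_bar) * PRIOR_MEAN) / var_t

def diffusion_prior_grad(points, rng):
    """SDS-style gradient of the prior term w.r.t. the predicted scene coordinates."""
    alpha_bar = rng.uniform(0.2, 0.8)                        # random diffusion timestep
    eps = rng.standard_normal(points.shape)
    x_t = np.sqrt(alpha_bar) * points + np.sqrt(1.0 - alpha_bar) * eps   # forward noising
    return predict_noise(x_t, alpha_bar) - eps               # w(t) = 1 (assumption)

def train_step(points, rng, lr=0.1):
    """One prior-only update. In real SCR training this gradient would be added to
    the reprojection-loss gradient and backpropagated into the network."""
    return points - lr * diffusion_prior_grad(points, rng)

rng = np.random.default_rng(0)
points = np.column_stack([
    rng.uniform(-2.0, 2.0, 500),     # x
    rng.uniform(-2.0, 2.0, 500),     # y
    rng.normal(0.0, 0.5, 500),       # z: noisy, implausibly "thick" floor
])
before = np.abs(points[:, 2]).mean()
for _ in range(100):
    points = train_step(points, rng)
after = np.abs(points[:, 2]).mean()
print(before, after)   # the z-spread shrinks as points are pulled onto the floor
```

In expectation, the SDS update is proportional to the displacement from the prior mean, so implausible configurations are pulled toward high-density regions. Along weakly constrained directions (x and y here), the update is mostly zero-mean noise, a known source of variance in SDS-style estimators.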