Empowering Feed-Forward Reconstruction Models with Metric Scale via Satellite Images

📅 2026-06-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the inherent scale ambiguity in feed-forward 3D reconstruction models, which hinders their ability to recover metrically accurate geometry and limits applicability in scenarios requiring metric understanding. To resolve this, the authors propose the first end-to-end trainable feed-forward framework that leverages readily available unlabeled satellite imagery as a global metric prior. By integrating satellite image retrieval, bidirectional cross-view feature interaction, and scale consistency constraints, the model recovers absolute scene scale, refines geometric structure, and estimates camera poses in a metric coordinate system, even with only coarse initial pose estimates. Evaluated on KITTI, nuScenes, and Oxford RobotCar, the method significantly improves metric depth estimation, multi-view point cloud reconstruction, and cross-view localization accuracy, while demonstrating strong generalization across datasets and geographic regions.

📝 Abstract

Feed-forward 3D reconstruction models have recently shown strong generalization across diverse scenes, yet most of them recover geometry only up to an unknown global scale. This scale ambiguity limits their use in applications that require metric understanding of the environment. Existing metric reconstruction methods commonly rely on large-scale metric annotations or accurate camera calibration, both of which are costly or unreliable in many real-world settings. We propose a satellite-guided framework for resolving scale ambiguity in feed-forward 3D reconstruction. The key idea is to use readily available satellite imagery as a global metric reference. Given a coarse camera pose, our method retrieves a local satellite patch and integrates it with a feed-forward reconstruction backbone through bidirectional cross-view interaction. By enforcing consistency between the reconstructed scene and the satellite reference, the model infers absolute scale, refines scene geometry, and estimates camera pose in a metric coordinate frame. Experiments on KITTI, nuScenes, and Oxford RobotCar show consistent improvements in metric depth estimation, multi-view point-cloud reconstruction, and cross-view camera localization, while preserving strong generalization across datasets and geographic regions.
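To make the idea of a "global metric reference" concrete, the following is a minimal sketch and not the paper's method, which learns scale through cross-view interaction inside the network. It assumes the satellite patch comes from a Web Mercator tile pyramid with 256-pixel tiles (the abstract does not name a tile scheme), so every pixel has a known ground size, and it recovers a single global scale by closed-form least squares. The function names, the latitude and zoom level, and the sample distances are all hypothetical.

```python
import math


def ground_resolution_m_per_px(lat_deg: float, zoom: int, tile_size: int = 256) -> float:
    """Metres per pixel of a Web Mercator tile at a given latitude and zoom.

    Assumption: the satellite imagery uses the standard Web Mercator tile
    pyramid; the source does not specify this.
    """
    earth_circumference_m = 2 * math.pi * 6378137.0  # WGS84 equatorial radius
    return earth_circumference_m * math.cos(math.radians(lat_deg)) / (tile_size * 2 ** zoom)


def least_squares_scale(pred, metric):
    """Scalar s that minimises sum((s * pred_i - metric_i) ** 2)."""
    num = sum(p * m for p, m in zip(pred, metric))
    den = sum(p * p for p in pred)
    if den == 0:
        raise ValueError("predicted distances are all zero")
    return num / den


# Hypothetical data: distances between the same landmark pairs, measured
# in satellite pixels and in the reconstruction's arbitrary units.
satellite_px = [120.0, 85.0, 40.0]
reconstruction_units = [3.1, 2.2, 1.0]

# Convert pixel distances to metres, then solve for the global scale.
res = ground_resolution_m_per_px(lat_deg=49.0, zoom=18)
metric_m = [d * res for d in satellite_px]
scale = least_squares_scale(reconstruction_units, metric_m)

# Multiplying any up-to-scale distance by `scale` gives an estimate in metres.
metric_estimate = [scale * d for d in reconstruction_units]
```

The single scalar is the only unknown here because the reconstruction is assumed correct up to a global scale; a learned model such as the one described above can also refine geometry and pose, which this sketch does not attempt.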

Problem

Research questions and friction points this paper is trying to address.

scale ambiguity

metric reconstruction

3D reconstruction

feed-forward models

satellite imagery

Innovation

Methods, ideas, or system contributions that make the work stand out.

metric scale

satellite-guided reconstruction

feed-forward 3D reconstruction