🤖 AI Summary
This work addresses the substantial domain gap between camera observations and floorplans in visual localization by proposing a zero-shot floorplan-based localization method that requires no environment-specific training data or retraining. The approach builds a bird's-eye view (BEV) from a monocular 3D reconstruction, extracts dominant geometric primitives (lines and circles) from that view, and matches them to the floorplan using dedicated minimal solvers within a robust estimation framework. Because it relies on geometric structures that are ubiquitous in human-made environments rather than on appearance-based features, the method generalizes across scenes. Experiments on both simulated and real-world datasets show that it outperforms state-of-the-art learning-based methods on unseen environments while using a single fixed set of hyperparameters throughout.
📝 Abstract
Visual localization -- estimating a camera pose within a pre-existing map -- is a fundamental problem in computer vision.
Floorplans are an attractive map representation: they are readily available for most buildings, compact, and inherently invariant to visual appearance changes.
However, bridging the severe domain gap between camera observations and floorplan geometry remains challenging.
Existing methods address this gap through data-driven learning, yet they require large-scale training data and environment-specific retraining, limiting their practical deployment.
We propose a zero-shot floorplan localization method that generalizes to novel environments without any retraining.
Our key insight is that dominant geometric primitives -- lines and circles -- are ubiquitous in human-made environments and provide appearance-invariant structural constraints.
We extract these primitives from a bird's-eye-view (BEV) projection of monocular 3D reconstructions and match them to the floorplan via dedicated minimal solvers within a robust estimation framework.
Experiments on both simulated and real-world datasets show that our approach outperforms state-of-the-art learning-based methods on unseen environments, while using a single fixed set of hyperparameters across all experiments.
The source code will be made publicly available.
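To make the idea of a minimal solver inside a robust estimation loop concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simplified setting that the abstract does not specify: lines only (no circles), a known metric scale, consistently oriented unit normals, and a plain RANSAC loop. Each line is stored as `(nx, ny, d)` with `n · x = d`; the function names, the tolerance `tol`, and the iteration count are all hypothetical choices for illustration.

```python
import numpy as np


def solve_pose_from_two_lines(obs, ref):
    """Hypothetical minimal solver: 2D rigid pose from two line matches.

    obs, ref: arrays of shape (2, 3), rows are (nx, ny, d) with n . x = d.
    Returns (theta, t) such that x_ref = R(theta) @ x_obs + t, or None
    when the two reference lines are (nearly) parallel.
    """
    obs = np.asarray(obs, dtype=float)
    ref = np.asarray(ref, dtype=float)
    # Rotation: angle between the first pair of normals (assumes signed normals).
    theta = np.arctan2(ref[0, 1], ref[0, 0]) - np.arctan2(obs[0, 1], obs[0, 0])
    # Translation: each match gives one linear constraint m_i . t = e_i - d_i.
    A = ref[:, :2]
    if abs(np.linalg.det(A)) < 1e-6:
        return None
    t = np.linalg.solve(A, ref[:, 2] - obs[:, 2])
    return theta, t


def transform_lines(lines, theta, t):
    """Map lines (nx, ny, d) through x -> R(theta) @ x + t."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    normals = lines[:, :2] @ R.T
    offsets = lines[:, 2] + normals @ t
    return np.column_stack([normals, offsets])


def ransac_pose(obs_lines, ref_lines, iters=500, tol=0.05, seed=0):
    """Plain RANSAC over random line pairings; returns ((theta, t), inliers)."""
    rng = np.random.default_rng(seed)
    obs = np.asarray(obs_lines, dtype=float)
    ref = np.asarray(ref_lines, dtype=float)
    best_model, best_inliers = None, -1
    for _ in range(iters):
        i = rng.choice(len(obs), size=2, replace=False)
        j = rng.choice(len(ref), size=2, replace=False)
        model = solve_pose_from_two_lines(obs[i], ref[j])
        if model is None:
            continue
        moved = transform_lines(obs, *model)
        # Crude score: an observed line is an inlier if it lands close to some
        # floorplan line in (nx, ny, d) parameter space (mixes units; sketch only).
        dists = np.linalg.norm(moved[:, None, :] - ref[None, :, :], axis=2)
        inliers = int((dists.min(axis=1) < tol).sum())
        if inliers > best_inliers:
            best_model, best_inliers = model, inliers
    return best_model, best_inliers
```

Two non-parallel line matches fix the three degrees of freedom of a planar rigid pose, which is why the sample size is two. A scale-ambiguous monocular reconstruction would add an unknown scale and require a larger sample, and unsigned normals would require enumerating sign flips; both are left out of this sketch.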