🤖 AI Summary
To address global localization for autonomous vehicles under GNSS-denied conditions (e.g., urban canyons, tunnels), this paper proposes InterKey, a cross-modal localization method that matches onboard LiDAR point clouds against OpenStreetMap (OSM) vector data. The method treats road intersections as structured landmarks and describes each one with a lightweight binary descriptor. Key contributions include: (i) a discrepancy-mitigation mechanism that bridges the abstraction-level and geometric-precision gaps between OSM and point clouds; (ii) orientation-adaptive intersection detection; and (iii) area-balanced sampling for robust feature alignment. The pipeline comprises intersection detection, encoding of road and building imprints, compact descriptor generation, and cross-modal descriptor matching. Evaluated on the KITTI dataset, the method achieves state-of-the-art localization accuracy and outperforms recent baselines by a large margin. It also applies to other sensors that can produce dense structural point clouds, which makes it a scalable and cost-effective option for vehicle localization.
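As an illustration of what area-balanced sampling can mean, the sketch below splits a circular region around an intersection into concentric rings of equal area, so that far-away rings do not cover more ground than near ones. This is a generic construction and not the paper's implementation. The function names, the 40 m radius, and the ring count are assumptions chosen for the example.

```python
import math


def equal_area_ring_edges(max_radius: float, num_rings: int) -> list[float]:
    """Return ring boundaries r_0..r_n so that every annulus has the same area.

    Hypothetical helper: the area of ring k is pi * (r_{k+1}^2 - r_k^2),
    so equal areas imply r_k = R * sqrt(k / n).
    """
    if max_radius <= 0 or num_rings <= 0:
        raise ValueError("max_radius and num_rings must be positive")
    return [max_radius * math.sqrt(k / num_rings) for k in range(num_rings + 1)]


def ring_index(x: float, y: float, max_radius: float, num_rings: int):
    """Map a point (relative to the intersection center) to its ring, or None if outside."""
    r = math.hypot(x, y)
    if r >= max_radius:
        return None
    # Inverting r_k = R * sqrt(k / n) gives k = n * (r / R)^2.
    return min(int(num_rings * (r / max_radius) ** 2), num_rings - 1)


edges = equal_area_ring_edges(40.0, 4)   # assumed 40 m radius, 4 rings
areas = [math.pi * (b * b - a * a) for a, b in zip(edges, edges[1:])]
```

With uniform radial spacing the outermost ring would cover several times the area of the innermost one. With the square-root spacing above, every entry of `areas` is the same, so each ring contributes comparably to a descriptor.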
📝 Abstract
Reliable global localization is critical for autonomous vehicles, especially in environments where GNSS is degraded or unavailable, such as urban canyons and tunnels. Although high-definition (HD) maps provide accurate priors, the cost of data collection, map construction, and maintenance limits scalability. OpenStreetMap (OSM) offers a free and globally available alternative, but its coarse abstraction poses challenges for matching with sensor data. We propose InterKey, a cross-modal framework that leverages road intersections as distinctive landmarks for global localization. Our method constructs compact binary descriptors by jointly encoding road and building imprints from point clouds and OSM. To bridge modality gaps, we introduce discrepancy mitigation, orientation determination, and area-equalized sampling strategies, enabling robust cross-modal matching. Experiments on the KITTI dataset demonstrate that InterKey achieves state-of-the-art accuracy, outperforming recent baselines by a large margin. The framework generalizes to sensors that can produce dense structural point clouds, offering a scalable and cost-effective solution for robust vehicle localization.
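To make the idea of matching compact binary descriptors concrete, here is a minimal sketch that packs an occupancy pattern into an integer and retrieves the closest candidate by Hamming distance. The abstract does not specify the descriptor layout or the matching metric, so the 8-cell layout, the Hamming metric, and all names below are assumptions for illustration only.

```python
def occupancy_to_bits(cells) -> int:
    """Pack a sequence of occupied/free flags into one integer, first cell as the top bit."""
    bits = 0
    for occupied in cells:
        bits = (bits << 1) | int(bool(occupied))
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two descriptors differ."""
    return bin(a ^ b).count("1")


def best_match(query: int, candidates: dict) -> tuple:
    """Return (candidate_id, distance) for the candidate closest to the query."""
    best_id = min(candidates, key=lambda cid: hamming(query, candidates[cid]))
    return best_id, hamming(query, candidates[best_id])


# Hypothetical descriptor from a LiDAR scan (1 = cell contains road or building structure).
query = occupancy_to_bits([1, 0, 1, 1, 0, 0, 1, 0])

# Hypothetical descriptors precomputed from OSM for two intersections.
osm_descriptors = {
    "intersection_A": 0b10110011,
    "intersection_B": 0b01001101,
}

match_id, distance = best_match(query, osm_descriptors)
```

Binary descriptors keep the map-side database small and make comparison a matter of XOR and bit counting, which is one reason they suit global localization over large areas.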