🤖 AI Summary
This work addresses insufficient geospatial alignment accuracy between aerial imagery and vehicle-mounted sensor data in learning-based localization systems for autonomous driving. It first demonstrates that the quality of global geo-alignment at training time critically affects overall localization performance. It then compares two factor-graph-based cross-view alignment methods that integrate multimodal feature matching with deep-learning-based localization models, and uses ablation studies to quantify the effect of closely aligned ground truth. Evaluated on a 1600 km real-world driving dataset, the approach achieves a mean positioning error of 0.28 m and a heading error of 0.47°, which the authors describe as sufficient for autonomous vehicle applications. The core contribution is the coupling of learned localization with geometry-driven global alignment of aerial and vehicle sensor data.
📝 Abstract
Recently there has been growing interest in the use of aerial and satellite map data for autonomous vehicles, primarily due to its potential for significant cost reduction and enhanced scalability. Despite these advantages, aerial data also comes with challenges such as a sensor-modality gap and a viewpoint gap. Learned localization methods have shown promise for overcoming these challenges to provide precise metric localization for autonomous vehicles. Most learned localization methods rely on coarsely aligned ground truth or on implicit consistency-based methods to learn the localization task. In this paper, however, we find that improving the alignment between aerial data and autonomous vehicle sensor data at training time is critical to the performance of a learning-based localization system. We compare two data alignment methods using a factor graph framework and, using these methods, evaluate the effects of closely aligned ground truth on learned localization accuracy through ablation studies. Finally, we evaluate a learned localization system using the data alignment methods on a comprehensive (1600 km) autonomous vehicle dataset and demonstrate localization error below 0.3 m and 0.5$^{\circ}$, sufficient for autonomous vehicle applications.
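To make the idea of factor-graph data alignment concrete, the following is a minimal sketch and not the authors' method. It assumes a translation-only 2D model in which each vehicle pose is a point, consecutive poses are linked by odometry factors, and a few poses carry absolute position factors standing in for matches against aerial imagery. The function name `align_trajectory`, its parameters, and the noise values are all hypothetical. Under these assumptions the problem is linear, so the least-squares estimate is obtained by solving a tridiagonal system per axis.

```python
def align_trajectory(odometry, anchors, sigma_odo=0.05, sigma_anchor=0.5):
    """Least-squares 2D positions on a chain-shaped factor graph (illustrative only).

    odometry: list of (dx, dy) displacements between consecutive poses.
    anchors:  dict {pose index: (x, y)} of absolute position measurements,
              a stand-in for aerial-to-vehicle matches (an assumption).
    Returns a list of (x, y) tuples, one per pose.
    """
    if not anchors:
        raise ValueError("at least one anchor is needed to fix the global frame")
    n = len(odometry) + 1
    w_odo = 1.0 / sigma_odo ** 2      # information (inverse variance) of odometry
    w_anc = 1.0 / sigma_anchor ** 2   # information of absolute measurements

    per_axis = []
    for axis in (0, 1):  # x and y decouple in this translation-only model
        diag = [0.0] * n        # main diagonal of the information matrix
        off = [0.0] * (n - 1)   # first off-diagonal (matrix is symmetric)
        rhs = [0.0] * n

        # Odometry factor: residual x[i+1] - x[i] - u
        for i, u in enumerate(odometry):
            diag[i] += w_odo
            diag[i + 1] += w_odo
            off[i] -= w_odo
            rhs[i] -= w_odo * u[axis]
            rhs[i + 1] += w_odo * u[axis]

        # Anchor factor: residual x[k] - z
        for k, z in anchors.items():
            diag[k] += w_anc
            rhs[k] += w_anc * z[axis]

        # Solve the tridiagonal normal equations (Thomas algorithm).
        for i in range(1, n):
            m = off[i - 1] / diag[i - 1]
            diag[i] -= m * off[i - 1]
            rhs[i] -= m * rhs[i - 1]
        x = [0.0] * n
        x[-1] = rhs[-1] / diag[-1]
        for i in range(n - 2, -1, -1):
            x[i] = (rhs[i] - off[i] * x[i + 1]) / diag[i]
        per_axis.append(x)

    return list(zip(per_axis[0], per_axis[1]))


# Usage: three 1 m steps along x, with the first and last pose anchored.
poses = align_trajectory([(1.0, 0.0)] * 3, {0: (0.0, 0.0), 3: (3.0, 0.0)})
```

A real alignment pipeline would estimate full poses including heading, which makes the problem nonlinear and calls for an iterative solver; the linear chain here only illustrates how relative and absolute measurements are balanced by their assumed uncertainties.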