🤖 AI Summary
High-precision visual localization remains a critical challenge for high-level autonomous driving. Conventional map-matching approaches are sensitive to perception noise and depend on costly hyper-parameter tuning. This paper proposes an end-to-end neural localization framework that estimates the vehicle pose directly from surround-view images, without explicitly matching perception results against an HD map. Its core contribution is a decoupled Bird's-Eye-View (BEV) neural matching pose solver: poses are estimated in a differentiable sampling-based matching module, and the feature representation affected by each pose degree of freedom (DoF) is handled separately, which greatly shrinks the sampling space while keeping the solver efficient and interpretable. The reported experiments show decimeter-level accuracy, with mean absolute errors of 0.19 m (longitudinal), 0.13 m (lateral), and 0.39° (yaw), together with a 68.8% reduction in inference memory usage.
📝 Abstract
Accurate localization plays an important role in high-level autonomous driving systems. Conventional map-matching-based localization methods solve for poses by explicitly matching map elements with sensor observations; they are generally sensitive to perception noise and therefore require costly hyper-parameter tuning. In this paper, we propose an end-to-end localization neural network that directly estimates vehicle poses from surround-view images, without explicitly matching perception results with HD maps. To ensure efficiency and interpretability, we propose a decoupled BEV neural matching-based pose solver, which estimates poses in a differentiable sampling-based matching module. Moreover, the sampling space is greatly reduced by decoupling the feature representations affected by each DoF of the pose. Experimental results demonstrate that the proposed network achieves decimeter-level localization, with mean absolute errors of 0.19 m, 0.13 m, and 0.39° in longitudinal position, lateral position, and yaw angle, respectively, while exhibiting a 68.8% reduction in inference memory usage.
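To make the effect of decoupling concrete, the following is a minimal illustrative sketch, not the paper's implementation. It assumes three pose DoFs (longitudinal, lateral, yaw), hypothetical grid sizes of 21 samples per DoF, and a softmax-weighted expectation (soft-argmax) as one common way to keep a sampling-based estimate differentiable; the abstract does not specify any of these choices. The function names `joint_candidates`, `decoupled_candidates`, and `soft_argmax` are invented for illustration.

```python
import math


def joint_candidates(n_x, n_y, n_yaw):
    """Pose hypotheses when all three DoFs are sampled on one joint grid."""
    return n_x * n_y * n_yaw


def decoupled_candidates(n_x, n_y, n_yaw):
    """Pose hypotheses when each DoF is sampled on its own 1-D grid."""
    return n_x + n_y + n_yaw


def soft_argmax(candidates, scores, temperature=1.0):
    """Softmax-weighted mean of 1-D candidate offsets.

    Unlike a hard argmax, this expectation varies smoothly with the
    matching scores, so gradients can flow through it during training.
    """
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp((s - top) / temperature) for s in scores]
    total = sum(weights)
    return sum(c * w for c, w in zip(candidates, weights)) / total


# Hypothetical grid sizes (assumption, not taken from the paper).
n_x, n_y, n_yaw = 21, 21, 21
print(joint_candidates(n_x, n_y, n_yaw))      # 9261 joint hypotheses
print(decoupled_candidates(n_x, n_y, n_yaw))  # 63 decoupled hypotheses

# Made-up matching scores for five lateral offsets (metres).
lateral_offsets = [-0.2, -0.1, 0.0, 0.1, 0.2]
scores = [0.0, 1.0, 4.0, 1.0, 0.0]
print(soft_argmax(lateral_offsets, scores))   # close to 0.0 by symmetry
```

The candidate counts only illustrate why sampling each DoF independently scales additively rather than multiplicatively; they are not meant to reproduce the reported 68.8% memory reduction, which depends on the full network.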