🤖 AI Summary
To address high-precision real-time localization in GPS-denied off-road environments, this paper proposes a particle filter framework based on cross-modal feature matching. It first learns bird’s-eye-view (BEV) features from onboard RGB-D data, then matches these features against local descriptors extracted from retrieved aerial imagery to compute an observation likelihood for each particle pose hypothesis. Crucially, this work is the first to integrate learnable BEV representations with local aerial-image matching into a particle filter, significantly improving robustness under challenging conditions such as dense canopy cover and strong shadows, and maintaining accuracy on both seen and unseen trajectories. Evaluated on two real-world off-road datasets, the method achieves 7.5× lower absolute trajectory error (ATE) on seen routes and 7.0× lower ATE on unseen routes than a retrieval-based baseline, while running in real time at 10 Hz on an NVIDIA Tesla T4 GPU.
📝 Abstract
We propose BEV-Patch-PF, a GPS-free sequential geo-localization system that integrates a particle filter with learned bird's-eye-view (BEV) and aerial feature maps. From onboard RGB and depth images, we construct a BEV feature map. For each 3-DoF particle pose hypothesis, we crop the corresponding patch from an aerial feature map computed from a local aerial image queried around the approximate location. BEV-Patch-PF computes a per-particle log-likelihood by matching the BEV feature to the aerial patch feature. On two real-world off-road datasets, our method achieves 7.5× lower absolute trajectory error (ATE) on seen routes and 7.0× lower ATE on unseen routes than a retrieval-based baseline, while maintaining accuracy under dense canopy and shadow. The system runs in real time at 10 Hz on an NVIDIA Tesla T4, enabling practical robot deployment.
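As a rough illustration of the per-particle observation update described above, the sketch below crops a feature patch at each 3-DoF particle pose and scores it against a BEV feature. Everything here is an assumption for illustration: the crop is axis-aligned and ignores the heading component (the paper's cropping would be pose-aware), and cosine similarity stands in for the learned matching score. All function and variable names are hypothetical, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def crop_patch(aerial_feat, pose, patch=8):
    # Hypothetical axis-aligned crop around (x, y); the actual method's
    # crop would also rotate by the particle's heading theta.
    H, W, _ = aerial_feat.shape
    x = int(np.clip(pose[0], patch // 2, H - patch // 2 - 1))
    y = int(np.clip(pose[1], patch // 2, W - patch // 2 - 1))
    return aerial_feat[x - patch // 2:x + patch // 2,
                       y - patch // 2:y + patch // 2]

def log_likelihood(bev_feat, patch_feat):
    # Cosine similarity as a stand-in matching score (an assumption;
    # the paper learns its matching function end to end).
    a, b = bev_feat.ravel(), patch_feat.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

# Toy inputs: a C-channel aerial feature map and a BEV feature patch.
C, patch = 16, 8
aerial_feat = rng.normal(size=(64, 64, C))
bev_feat = rng.normal(size=(patch, patch, C))

# Particles are 3-DoF pose hypotheses (x, y, theta).
N = 100
particles = np.column_stack([
    rng.uniform(8, 56, N),             # x (pixels)
    rng.uniform(8, 56, N),             # y (pixels)
    rng.uniform(-np.pi, np.pi, N),     # theta (unused by this toy crop)
])

# Per-particle log-likelihoods from BEV-to-aerial-patch matching.
log_w = np.array([
    log_likelihood(bev_feat, crop_patch(aerial_feat, p, patch))
    for p in particles
])

# Normalize in log space for stability; weighted mean as the pose estimate.
w = np.exp(log_w - log_w.max())
w /= w.sum()
pose_est = (w[:, None] * particles).sum(axis=0)
print(pose_est.shape)  # (3,)
```

In a full filter this update would alternate with a motion-model prediction step and resampling; the snippet only shows how patch cropping turns each pose hypothesis into a matchable feature.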