🤖 AI Summary
To address poor localization robustness caused by inaccurate initial pose estimates in GNSS-denied environments, this paper proposes APR-Transformer, an end-to-end absolute pose regression method. The Transformer-based architecture regresses 3D position and 3D orientation (represented as a quaternion) from either image or LiDAR input. A lightweight feature encoder and a jointly optimized loss function are designed for efficient deployment on embedded automotive platforms. The method achieves state-of-the-art performance on three challenging benchmarks: the custom, high-difficulty APR-BeIntelli dataset, Oxford Radar RobotCar, and DeepLoc, with mean localization errors of <0.8 m and <1.2°. Real-vehicle experiments confirm real-time inference capability and strong generalization across diverse urban and suburban scenarios. The source code is publicly available.
📝 Abstract
Precise initialization plays a critical role in the performance of localization algorithms, especially in robotics, autonomous driving, and computer vision. Poor localization accuracy is often a consequence of inaccurate initial poses, which is particularly noticeable in GNSS-denied environments, because initialization typically relies on GPS signals. Recent advances in deep neural networks for pose regression have led to significant improvements in both accuracy and robustness, especially in estimating complex spatial relationships and orientations. In this paper, we introduce APR-Transformer, a model architecture inspired by state-of-the-art methods, which predicts absolute pose (3D position and 3D orientation) using either image or LiDAR data. We demonstrate that the proposed method achieves state-of-the-art performance on established benchmark datasets such as Oxford Radar RobotCar and DeepLoc. We further extend our experiments to our custom, more complex APR-BeIntelli dataset. Finally, we validate the reliability of our approach in GNSS-denied environments by deploying the model in real time on an autonomous test vehicle, which demonstrates its practical feasibility and effectiveness. The source code is available at: https://github.com/GT-ARC/APR-Transformer.
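Absolute pose regression results such as those above are usually reported as a position error in meters and an orientation error in degrees. The sketch below shows one common way to compute both for a single prediction, assuming the orientation is a quaternion in (w, x, y, z) order. The function names and the quaternion ordering are illustrative assumptions and are not taken from the APR-Transformer code base.

```python
import math


def position_error(p_pred, p_true):
    """Euclidean distance between predicted and ground-truth 3D positions."""
    return math.dist(p_pred, p_true)


def _normalize(q):
    """Scale a quaternion to unit length (regressed quaternions rarely are)."""
    norm = math.sqrt(sum(c * c for c in q))
    if norm == 0.0:
        raise ValueError("zero-length quaternion")
    return tuple(c / norm for c in q)


def orientation_error_deg(q_pred, q_true):
    """Smallest rotation angle, in degrees, between two orientations.

    Assumes (w, x, y, z) ordering (an assumption of this sketch).
    The absolute value handles the fact that q and -q are the same rotation.
    """
    qp, qt = _normalize(q_pred), _normalize(q_true)
    dot = abs(sum(a * b for a, b in zip(qp, qt)))
    dot = min(1.0, dot)  # guard against rounding slightly above 1
    return math.degrees(2.0 * math.acos(dot))


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(position_error((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)))
    half = math.sqrt(0.5)
    print(orientation_error_deg((1.0, 0.0, 0.0, 0.0), (half, 0.0, 0.0, half)))
```

Averaging these two quantities over a test set gives mean errors of the kind quoted in the summary; some papers report medians instead, so the aggregation should be checked against the paper itself.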