APR-Transformer: Initial Pose Estimation for Localization in Complex Environments through Absolute Pose Regression

📅 2025-05-14
🤖 AI Summary
To address the poor localization robustness caused by inaccurate initial pose estimates in GNSS-denied environments, this paper proposes APR-Transformer, an end-to-end absolute pose regression method. The Transformer-based architecture regresses 3D position and 3D orientation (represented as quaternions) from either image or LiDAR input. A lightweight feature encoder and a jointly optimized loss function are designed for efficient deployment on embedded automotive platforms. The method is reported to achieve state-of-the-art performance on three challenging benchmarks: the custom, high-difficulty APR-BeIntelli dataset, Oxford Radar RobotCar, and DeepLoc, with mean localization errors below 0.8 m and 1.2°. Real-vehicle experiments confirm real-time inference and generalization across urban and suburban scenarios. The source code is publicly available.
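A common way to train a regressor that outputs position and orientation together is to sum a position term and a weighted orientation term, in the style popularized by PoseNet. The sketch below illustrates that general idea only; it is an assumption, not the loss used in APR-Transformer. The names `pose_loss` and `beta` are hypothetical, and quaternions are assumed to be ordered (w, x, y, z).

```python
import math


def normalize(q):
    """Scale a quaternion (w, x, y, z) to unit length."""
    norm = math.sqrt(sum(c * c for c in q))
    if norm == 0.0:
        raise ValueError("zero-length quaternion")
    return tuple(c / norm for c in q)


def pose_loss(pred_xyz, pred_quat, true_xyz, true_quat, beta=1.0):
    """Hypothetical joint loss: position error plus beta times orientation error."""
    # Position term: Euclidean distance between predicted and true position.
    position_term = math.dist(pred_xyz, true_xyz)

    # Orientation term: distance between unit quaternions. q and -q encode the
    # same rotation, so take the smaller of the two candidate distances.
    p = normalize(pred_quat)
    t = normalize(true_quat)
    flipped = tuple(-c for c in t)
    orientation_term = min(math.dist(p, t), math.dist(p, flipped))

    # beta balances metres against the unitless quaternion distance.
    return position_term + beta * orientation_term
```

In this form `beta` has to be tuned per dataset, because the position term scales with the size of the scene while the quaternion term does not.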

📝 Abstract
Precise initialization plays a critical role in the performance of localization algorithms, especially in robotics, autonomous driving, and computer vision. Poor localization accuracy is often a consequence of inaccurate initial poses. This is particularly noticeable in GNSS-denied environments, because initialization usually relies on GPS signals. Recent advances in deep neural networks for pose regression have led to significant improvements in both accuracy and robustness, especially in estimating complex spatial relationships and orientations. In this paper, we introduce APR-Transformer, a model architecture inspired by state-of-the-art methods, which predicts absolute pose (3D position and 3D orientation) using either image or LiDAR data. We demonstrate that our proposed method achieves state-of-the-art performance on established benchmark datasets such as the Oxford Radar RobotCar and DeepLoc datasets. Furthermore, we extend our experiments to include our custom, complex APR-BeIntelli dataset. Additionally, we validate the reliability of our approach in GNSS-denied environments by deploying the model in real time on an autonomous test vehicle. This showcases the practical feasibility and effectiveness of our approach. The source code is available at: https://github.com/GT-ARC/APR-Transformer.
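Absolute pose regression results are commonly reported as a position error in metres and an orientation error in degrees. The sketch below shows one standard way to compute both for a single prediction. The function names are hypothetical, quaternions are assumed to be ordered (w, x, y, z), and nothing here is taken from the APR-Transformer code base.

```python
import math


def translation_error_m(pred_xyz, true_xyz):
    """Euclidean distance between predicted and true position, in metres."""
    return math.dist(pred_xyz, true_xyz)


def rotation_error_deg(pred_quat, true_quat):
    """Angle of the rotation separating two quaternions, in degrees."""
    norm_pred = math.sqrt(sum(c * c for c in pred_quat))
    norm_true = math.sqrt(sum(c * c for c in true_quat))
    if norm_pred == 0.0 or norm_true == 0.0:
        raise ValueError("zero-length quaternion")

    # abs() treats q and -q as the same rotation; dividing by the norms makes
    # the result independent of quaternion scale.
    dot = abs(sum(a * b for a, b in zip(pred_quat, true_quat)))
    cos_half_angle = min(1.0, dot / (norm_pred * norm_true))

    # The dot product of unit quaternions is the cosine of half the angle.
    return math.degrees(2.0 * math.acos(cos_half_angle))


if __name__ == "__main__":
    identity = (1.0, 0.0, 0.0, 0.0)
    # Quaternion for a 90 degree rotation about the z axis.
    quarter_turn = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
    print(translation_error_m((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)))
    print(rotation_error_deg(identity, quarter_turn))
```

Averaging or taking the median of these two values over a test trajectory gives the kind of summary figure quoted for pose regression benchmarks.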
Problem

Research questions and friction points this paper is trying to address.

Inaccurate initial pose estimates degrade localization accuracy
GNSS-denied environments require initialization from image or LiDAR data instead of GPS
Complex spatial relationships and orientations are hard to estimate robustly
Innovation

Methods, ideas, or system contributions that make the work stand out.

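APR-Transformer predicts absolute pose (3D position and 3D orientation) from either image or LiDAR data
Achieves state-of-the-art performance on the Oxford Radar RobotCar and DeepLoc benchmark datasets
Validated in GNSS-denied environments through real-time deployment on an autonomous test vehicle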
Srinivas Ravuri
Faculty of Electrical Engineering and Computer Science, Chair of Agent Technology, Technische Universität Berlin, Straße des 17. Juni 135, 10623 Berlin, Germany
Yuan Xu
Faculty of Electrical Engineering and Computer Science, Chair of Agent Technology, Technische Universität Berlin, Straße des 17. Juni 135, 10623 Berlin, Germany
Martin Ludwig Zehetner
Faculty of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
Ketan Motlag
Faculty of Electrical Engineering and Computer Science, Chair of Agent Technology, Technische Universität Berlin, Straße des 17. Juni 135, 10623 Berlin, Germany
Sahin Albayrak
Professur für Informatik, Technische Universität Berlin
Distributed AI, Agents Technologies, Semantic Web, Distributed System