GSPR: Multimodal Place Recognition Using 3D Gaussian Splatting for Autonomous Driving

📅 2024-10-01
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To make place recognition for autonomous driving robust in GPS-denied conditions, this paper proposes GSPR, a multimodal place recognition network built on 3D Gaussian Splatting. Its Multimodal Gaussian Splatting step combines multi-view RGB images and LiDAR point clouds into a single explicit, spatio-temporally unified scene representation. Fusing the modalities in an explicit scene, rather than at the feature or descriptor level, is intended to make the cross-modal correspondence more interpretable. A network built from 3D graph convolution and a transformer then extracts spatio-temporal features and global descriptors from the Gaussian scenes for place matching. On three datasets, the authors report state-of-the-art place recognition performance and solid generalization. They state that the code will be released at https://github.com/QiZS-BIT/GSPR.

📝 Abstract
Place recognition is a crucial component that enables autonomous vehicles to obtain localization results in GPS-denied environments. In recent years, multimodal place recognition methods have gained increasing attention. They overcome the weaknesses of unimodal sensor systems by leveraging complementary information from different modalities. However, most existing methods explore cross-modality correlations through feature-level or descriptor-level fusion, suffering from a lack of interpretability. Conversely, the recently proposed 3D Gaussian Splatting provides a new perspective on multimodal fusion by harmonizing different modalities into an explicit scene representation. In this paper, we propose a 3D Gaussian Splatting-based multimodal place recognition network dubbed GSPR. It explicitly combines multi-view RGB images and LiDAR point clouds into a spatio-temporally unified scene representation with the proposed Multimodal Gaussian Splatting. A network composed of 3D graph convolution and transformer is designed to extract spatio-temporal features and global descriptors from the Gaussian scenes for place recognition. Extensive evaluations on three datasets demonstrate that our method can effectively leverage complementary strengths of both multi-view cameras and LiDAR, achieving SOTA place recognition performance while maintaining solid generalization ability. Our open-source code will be released at https://github.com/QiZS-BIT/GSPR.
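The abstract says the network produces global descriptors that are used for place recognition. A minimal sketch of how such descriptors are typically matched is shown below, assuming cosine-similarity nearest-neighbor retrieval against a database of previously visited places. The function names, the three-dimensional toy descriptors, and the place labels are hypothetical and are not taken from the paper or its code.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two descriptors of equal length."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def retrieve_place(query_descriptor, database):
    """Return (place_id, similarity) for the most similar database entry.

    `database` maps a place label to its global descriptor. This linear scan
    is an illustrative assumption, not the retrieval scheme used by GSPR.
    """
    best_id, best_sim = None, -math.inf
    for place_id, descriptor in database.items():
        sim = cosine_similarity(query_descriptor, descriptor)
        if sim > best_sim:
            best_id, best_sim = place_id, sim
    return best_id, best_sim


# Toy descriptors (hypothetical); real global descriptors are much longer.
database = {
    "intersection_a": [0.9, 0.1, 0.0],
    "tunnel_b": [0.0, 1.0, 0.2],
    "bridge_c": [0.1, 0.0, 1.0],
}
query = [0.8, 0.2, 0.1]
match_id, match_sim = retrieve_place(query, database)
print(match_id, round(match_sim, 3))
```

In this framing, the quality of the recognition result depends entirely on how well the descriptor separates distinct places, which is the part the paper's network is designed to learn.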
Problem

Research questions and friction points this paper is trying to address.

Localizing autonomous vehicles in GPS-denied environments through place recognition.
Combining multi-view RGB images and LiDAR point clouds into one scene representation.
Overcoming the lack of interpretability in feature-level and descriptor-level multimodal fusion.
Innovation

Methods, ideas, or system contributions that make the work stand out.

3D Gaussian Splatting for multimodal fusion
Combines RGB images and LiDAR point clouds
Uses 3D graph convolution and transformer networks
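The three points above can be pictured with a small data-structure sketch. It assumes, for illustration only, that Gaussians are seeded at LiDAR point positions and that a k-nearest-neighbor graph over Gaussian centers supplies the neighborhoods a graph convolution would aggregate over. The paper summary does not specify either detail, and all names and default values here are hypothetical. The attribute set (position, scale, rotation, opacity, color) follows the standard 3D Gaussian Splatting parameterization.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian:
    position: tuple  # (x, y, z) center of the Gaussian
    scale: tuple     # per-axis extent
    rotation: tuple  # unit quaternion (w, x, y, z)
    opacity: float   # in [0, 1]
    color: tuple     # RGB in [0, 1]; would be optimized against camera images


def seed_gaussians(lidar_points, initial_scale=0.1, default_color=(0.5, 0.5, 0.5)):
    """Create one Gaussian per LiDAR point (assumed initialization scheme)."""
    return [
        Gaussian(
            position=tuple(p),
            scale=(initial_scale,) * 3,
            rotation=(1.0, 0.0, 0.0, 0.0),  # identity rotation
            opacity=0.5,
            color=default_color,
        )
        for p in lidar_points
    ]


def knn_graph(gaussians, k):
    """Map each Gaussian index to the indices of its k nearest neighbors."""
    edges = {}
    for i, g in enumerate(gaussians):
        dists = sorted(
            (math.dist(g.position, h.position), j)
            for j, h in enumerate(gaussians)
            if j != i
        )
        edges[i] = [j for _, j in dists[:k]]
    return edges


# Four toy LiDAR points (hypothetical data).
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (5.0, 5.0, 0.0)]
scene = seed_gaussians(points)
graph = knn_graph(scene, k=2)
print(graph)
```

The brute-force neighbor search is quadratic in the number of Gaussians, so a real scene would need a spatial index; it is kept here only to make the graph structure easy to read.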
Zhangshuo Qi
Beijing Institute of Technology
Robotics, Intelligent Vehicles, Place Recognition
Junyi Ma
Shanghai Jiao Tong University, Shanghai, 200240, China
Jingyi Xu
Shanghai Jiao Tong University, Shanghai, 200240, China
Zijie Zhou
Beijing Institute of Technology, Beijing, 100081, China
Luqi Cheng
Beijing Institute of Technology, Beijing, 100081, China
Guangming Xiong
Beijing Institute of Technology, Beijing, 100081, China