🤖 AI Summary
Visual loop closure detection suffers from high false-positive rates and computationally expensive RANSAC-based geometric verification, which limit both the accuracy and real-time performance of online SLAM systems. To address these challenges, the authors propose LoopGNN, a graph neural network approach that estimates loop closure consensus over cliques of visually similar keyframes retrieved through place recognition. Rather than operating on pairs of keyframes, LoopGNN considers multi-keyframe neighborhoods and propagates deep feature encodings among the nodes of each clique to reach a robust consensus estimate. Evaluated on the TartanDrive 2.0 and NCLT datasets, LoopGNN outperforms traditional baselines, achieving high precision and recall, remains robust across different keypoint extractors, and is more computationally efficient than classical RANSAC-based geometric verification. The code and keyframe data are publicly available.
📝 Abstract
Visual loop closure detection traditionally relies on place recognition methods to retrieve candidate loops that are validated using computationally expensive RANSAC-based geometric verification. As false-positive loop closures significantly degrade downstream pose graph estimates, verifying a large number of candidates in online simultaneous localization and mapping (SLAM) scenarios is constrained by limited time and compute resources. While most deep loop closure detection approaches only operate on pairs of keyframes, we relax this constraint by considering neighborhoods of multiple keyframes when detecting loops. In this work, we introduce LoopGNN, a graph neural network architecture that estimates loop closure consensus by leveraging cliques of visually similar keyframes retrieved through place recognition. By propagating deep feature encodings among the nodes of the clique, our method yields high-precision estimates while maintaining high recall. Extensive experimental evaluations on the TartanDrive 2.0 and NCLT datasets demonstrate that LoopGNN outperforms traditional baselines. Additionally, an ablation study across various keypoint extractors demonstrates that our method is robust regardless of the type of deep feature encodings used, and exhibits higher computational efficiency compared to classical geometric verification baselines. We release our code, supplementary material, and keyframe data at https://loopgnn.cs.uni-freiburg.de.
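The core idea of propagating feature encodings among the nodes of a clique can be sketched in a few lines. The snippet below is a hypothetical toy illustration, not the authors' architecture: it performs one round of mean-aggregation message passing over a fully connected clique of keyframe descriptors and then scores a candidate loop pair by cosine similarity of the updated features. The function names, descriptor dimensionality, and the mixing weight `alpha` are all illustrative assumptions.

```python
# Toy sketch of clique-based feature consensus (NOT the LoopGNN implementation).
# Each keyframe in a retrieved clique holds a descriptor; one message-passing
# step blends every node with the mean of its neighbors, and a candidate loop
# pair is scored by cosine similarity of the smoothed descriptors.

def propagate(features, edges, alpha=0.5):
    """One message-passing step: mix each node with its neighborhood mean."""
    neighbors = {i: [] for i in range(len(features))}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    updated = []
    for i, f in enumerate(features):
        if neighbors[i]:
            mean = [sum(features[j][k] for j in neighbors[i]) / len(neighbors[i])
                    for k in range(len(f))]
            updated.append([alpha * x + (1 - alpha) * m for x, m in zip(f, mean)])
        else:
            updated.append(list(f))  # isolated node: keep descriptor unchanged
    return updated

def cosine(u, v):
    """Cosine similarity between two descriptor vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

# A clique of three visually similar keyframes with toy 2-D descriptors.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
edges = [(0, 1), (1, 2), (0, 2)]  # fully connected clique
out = propagate(feats, edges)
score = cosine(out[0], out[1])    # consensus score for candidate pair (0, 1)
```

In a real GNN the hand-written mean aggregation would be replaced by learned message and update functions, but the data flow (neighborhood aggregation followed by a pairwise readout) is the same.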