AI Summary
This work addresses the challenges of loop closure detection in visual SLAM, where appearance changes and perceptual aliasing degrade accuracy and deep learning-based approaches often fail to meet real-time requirements. To overcome these limitations, we propose an efficient and robust loop closure detection framework that combines NetVLAD for fine-grained visual place description with Faiss-accelerated nearest-neighbor search. This combination achieves higher detection accuracy than DBoW while maintaining real-time performance. We introduce fine-grained Top-K precision-recall curves for a more complete evaluation and validate the approach on the KITTI dataset, demonstrating that NetVLAD is a high-accuracy, real-time alternative to DBoW with greater robustness and practical deployability.
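The retrieval step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each keyframe is assumed to already have a global descriptor (for example a NetVLAD vector), and candidates are restricted to frames at least `min_gap` frames in the past so that temporally adjacent frames are not returned as trivial loop closures. The function names and the `min_gap` parameter are hypothetical. Descriptors are L2-normalized so that the inner product equals cosine similarity. The exact search is written in NumPy to keep the example dependency-free; a Faiss index such as `faiss.IndexFlatIP` (filled with `index.add` and queried with `index.search`) performs the same exact inner-product search with faster batched kernels.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Row-wise L2 normalization so that inner product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def loop_closure_candidates(descriptors: np.ndarray, k: int = 5, min_gap: int = 50):
    """For each frame i, return the top-k most similar frames j with j <= i - min_gap.

    Hypothetical helper. descriptors: (N, D) array of global image descriptors
    in temporal order. Returns one (indices, scores) pair per frame, sorted by
    descending similarity; frames with no eligible history get empty arrays.
    """
    db = l2_normalize(descriptors.astype(np.float32))
    results = []
    for i in range(len(db)):
        end = i - min_gap + 1  # frames 0 .. i - min_gap are eligible
        if end <= 0:
            results.append((np.empty(0, dtype=np.int64), np.empty(0, dtype=np.float32)))
            continue
        sims = db[:end] @ db[i]  # exact inner-product (cosine) search
        kk = min(k, end)
        top = np.argpartition(-sims, kk - 1)[:kk]  # unordered top-kk
        top = top[np.argsort(-sims[top])]  # order by descending similarity
        results.append((top, sims[top]))
    return results
```

The per-frame slice `db[:end]` mimics an online SLAM database that grows as frames age past `min_gap`; with Faiss, one would instead add frame `i - min_gap` to the index as frame `i` arrives and query only the new frame.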
Abstract
Loop closure detection (LCD) is a core component of simultaneous localization and mapping (SLAM): it identifies revisited places and enables pose-graph constraints that correct accumulated drift. Classic bag-of-words approaches such as DBoW are efficient but often degrade under appearance change and perceptual aliasing. In parallel, deep learning-based visual place recognition (VPR) descriptors (e.g., NetVLAD and Transformer-based models) offer stronger robustness, but their computational cost is often viewed as a barrier to real-time SLAM. In this paper, we empirically evaluate NetVLAD as an LCD module and compare it against DBoW on the KITTI dataset. We introduce a Fine-Grained Top-K precision-recall curve that better reflects LCD settings where a query may have zero or multiple valid matches. With Faiss-accelerated nearest-neighbor search, NetVLAD achieves real-time query speed while improving accuracy and robustness over DBoW, making it a practical drop-in alternative for LCD in SLAM.
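The abstract does not spell out how the Fine-Grained Top-K curve is computed, so the sketch below is only one plausible reading, and the function name and every convention in it are assumptions rather than the paper's definition. Each of a query's top-K candidates whose score clears the threshold counts as a separate prediction; a prediction is a true positive if it is in that query's ground-truth set, so a query with several valid matches can contribute several true positives, and a query with no valid match can only contribute false positives. Recall divides by the number of ground-truth matches capped at K per query, since at most K matches can be retrieved; precision is reported as 1.0 when nothing is accepted.

```python
def topk_pr_curve(scores, candidates, ground_truth, k, thresholds):
    """Pair-level precision/recall over each query's top-k candidates (hypothetical).

    scores[q], candidates[q]: similarity scores and frame ids retrieved for
    query q, sorted by descending score. ground_truth[q]: set of valid matching
    frame ids for q (may be empty). Returns (precision, recall) lists with one
    entry per threshold.
    """
    # At most k matches can be retrieved per query, so cap each query's
    # contribution to the recall denominator at k.
    total_pos = sum(min(k, len(gt)) for gt in ground_truth)
    precision, recall = [], []
    for t in thresholds:
        tp = fp = 0
        for s, c, gt in zip(scores, candidates, ground_truth):
            for score, cand in zip(s[:k], c[:k]):
                if score >= t:  # candidate accepted as a loop closure
                    if cand in gt:
                        tp += 1
                    else:
                        fp += 1
        # Convention (assumption): precision is 1.0 when nothing is accepted.
        precision.append(tp / (tp + fp) if tp + fp else 1.0)
        recall.append(tp / total_pos if total_pos else 0.0)
    return precision, recall


# Example: three queries with one, two, and zero valid matches (k = 2).
scores = [[0.9, 0.4], [0.8, 0.7], [0.6, 0.2]]
candidates = [[3, 7], [1, 2], [5, 6]]
ground_truth = [{3}, {1, 2}, set()]
precision, recall = topk_pr_curve(
    scores, candidates, ground_truth, k=2, thresholds=[0.95, 0.85, 0.5, 0.1]
)
# precision -> [1.0, 1.0, 0.75, 0.5]; recall -> [0.0, 0.333..., 1.0, 1.0]
```

Sweeping the threshold for several values of K yields one curve per K; the third query shows how a place that was never revisited lowers precision without affecting recall.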