🤖 AI Summary
To address the insufficient robustness of visual SLAM (vSLAM) in dynamic and complex environments, this paper presents the first systematic survey framework for semantic vSLAM, structured around three core themes: semantic feature extraction and association, semantics-driven pose estimation and mapping, and enhanced robustness in dynamic scenes. We unify and comparatively evaluate over 50 representative works and 12 mainstream semantic SLAM datasets—including ScanNet and TUM RGB-D—by integrating advances from computer vision, deep learning (e.g., Mask R-CNN, YOLO), geometric SLAM (e.g., ORB-SLAM variants), and multimodal sensing. Results demonstrate that semantic augmentation consistently improves localization accuracy, resilience to dynamic objects, and high-level scene understanding. We propose a principled taxonomy of semantic vSLAM methodologies and outline a forward-looking research roadmap, thereby filling a critical gap in the literature for comprehensive, up-to-date surveys on semantic vSLAM.
📝 Abstract
Visual Simultaneous Localization and Mapping (vSLAM) has made great progress in the computer vision and robotics communities and has been successfully applied in many fields, such as autonomous robot navigation and AR/VR. However, vSLAM cannot achieve good localization in dynamic and complex environments. In recent years, numerous publications have reported that combining semantic information with vSLAM gives semantic vSLAM systems the capability to solve these problems. Nevertheless, there is no comprehensive survey of semantic vSLAM. To fill this gap, this paper first reviews the development of semantic vSLAM, explicitly focusing on its strengths and differences. Second, we explore three main issues of semantic vSLAM: the extraction and association of semantic information, the application of semantic information, and the advantages of semantic vSLAM. Third, we collect and analyze the current state-of-the-art SLAM datasets that are widely used in semantic vSLAM systems. Finally, we discuss future directions that provide a blueprint for the development of semantic vSLAM.