Semantic Visual Simultaneous Localization and Mapping: A Survey

📅 2022-09-14
🏛️ arXiv.org
📈 Citations: 11
Influential: 0
🤖 AI Summary
To address the insufficient robustness of visual SLAM (vSLAM) in dynamic and complex environments, this paper presents the first systematic survey framework for semantic vSLAM, structured around three core themes: semantic feature extraction and association, semantics-driven pose estimation and mapping, and enhanced robustness in dynamic scenes. We unify and comparatively evaluate over 50 representative works and 12 mainstream semantic SLAM datasets (including ScanNet and TUM RGB-D) by integrating advances from computer vision, deep learning (e.g., Mask R-CNN, YOLO), geometric SLAM (e.g., ORB-SLAM variants), and multimodal sensing. Results demonstrate that semantic augmentation consistently improves localization accuracy, resilience to dynamic objects, and high-level scene understanding. We propose a principled taxonomy of semantic vSLAM methodologies and outline a forward-looking research roadmap, filling a gap in the literature for comprehensive, up-to-date surveys on semantic vSLAM.
📝 Abstract
Visual Simultaneous Localization and Mapping (vSLAM) has made great progress in the computer vision and robotics communities and has been used successfully in many fields, such as autonomous robot navigation and AR/VR. However, vSLAM cannot achieve good localization in dynamic and complex environments. Numerous recent publications have reported that, by combining semantic information with vSLAM, semantic vSLAM systems are capable of solving these problems. Nevertheless, there is no comprehensive survey of semantic vSLAM. To fill the gap, this paper first reviews the development of semantic vSLAM, explicitly focusing on its strengths and differences. Second, we explore three main issues of semantic vSLAM: the extraction and association of semantic information, the application of semantic information, and the advantages of semantic vSLAM. Third, we collect and analyze the current state-of-the-art SLAM datasets that have been widely used in semantic vSLAM systems. Finally, we discuss future directions that will provide a blueprint for the future development of semantic vSLAM.
Problem

Research questions and friction points this paper is trying to address.

Addresses limitations of vSLAM in dynamic environments.
Surveys semantic vSLAM advancements and applications.
Identifies future directions for semantic vSLAM development.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines semantic information with vSLAM.
Explores semantic extraction and application.
Analyzes state-of-the-art SLAM datasets.
Kaiqi Chen
Institute of Computer Vision, College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China; Human-Robot Interfaces and Interaction Lab, Istituto Italiano di Tecnologia, Genoa, Italy
Jianhua Zhang
Beijing University of Posts and Telecommunications, CHINA
Signal Processing, Wireless Communication, Radio Channel Measurement and Modelling, Channel Simulation, Terminal Testing
Jialing Liu
Institute of Computer Vision, College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China; Human-Robot Interfaces and Interaction Lab, Istituto Italiano di Tecnologia, Genoa, Italy
Qiyi Tong
Human-Robot Interfaces and Interaction Lab, Istituto Italiano di Tecnologia, Genoa, Italy; Università di Genova, Genoa, Italy
Ruyu Liu
Marie Skłodowska-Curie Fellow in DTU
Urban spatial perception and BIPV optimization, endoscopic 3D perception
Shengyong Chen
Institute of Computer Vision, School of Computer Science and Engineering, Tianjin University of Technology, Tianjin 300384, China