Semantic Visual Simultaneous Localization and Mapping: A Survey on State of the Art, Challenges, and Future Directions

📅 2025-10-01
🤖 AI Summary
Semantic visual SLAM lacks a systematic survey, particularly one covering the integration of deep learning and large language models (LLMs). To address this gap, the paper proposes a unified problem formulation that decomposes semantic SLAM into five core modules: visual localization, semantic feature extraction, map construction, data association, and loop closure optimization. Its modular analytical framework connects classical geometric approaches with modern semantic understanding techniques (semantic segmentation, object detection, scene understanding, and LLM-based reasoning) and is accompanied by a review of contemporary SLAM datasets. The survey organizes the technical evolution of the field into a taxonomy, critically analyzes the limitations of existing methods, and identifies key bottlenecks: semantic consistency, cross-modal alignment, and real-time performance. It closes with a discussion of future research directions for semantic SLAM.

📝 Abstract
Semantic Simultaneous Localization and Mapping (SLAM) is a critical area of research within robotics and computer vision. It focuses on localizing a robotic system while associating semantic information with the map, in order to construct an accurate and complete model of the surrounding environment. Since the first foundational work in Semantic SLAM appeared more than two decades ago, the field has received increasing attention across various scientific communities. Despite its significance, the field lacks comprehensive surveys encompassing recent advances and persistent challenges. In response, this study provides a thorough examination of state-of-the-art Semantic SLAM techniques, with the aim of illuminating current trends and key obstacles. Beginning with an in-depth exploration of the evolution of visual SLAM, this study outlines its strengths and unique characteristics, while also critically assessing previous survey literature. Subsequently, a unified problem formulation and a modular solution framework are proposed and evaluated; the framework divides the problem into discrete stages, including visual localization, semantic feature extraction, mapping, data association, and loop closure optimization. Moreover, this study investigates alternative methodologies such as deep learning and the use of large language models, alongside a review of relevant research on contemporary SLAM datasets. Concluding with a discussion of potential future research directions, this study serves as a comprehensive resource for researchers seeking to navigate the complex landscape of Semantic SLAM.
Problem

Research questions and friction points this paper is trying to address.

Surveying state-of-the-art Semantic SLAM techniques and challenges
Providing a unified problem formulation and modular solution framework
Identifying future research directions for Semantic SLAM development
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified modular framework divides SLAM into stages
Reviews deep learning approaches within the Semantic SLAM pipeline
Investigates the use of large language models in Semantic SLAM
Thanh Nguyen Canh
School of Information Science, Japan Advanced Institute of Science and Technology, Nomi 923-1292, Japan
Haolan Zhang
School of Information Science, Japan Advanced Institute of Science and Technology, Nomi 923-1292, Japan
Xiem HoangVan
Vietnam National University, University of Engineering and Technology, Hanoi 10000, Vietnam
Nak Young Chong
Professor of Information Science, JAIST
Robotics