🤖 AI Summary
Traditional semantic SLAM systems are constrained by predefined object categories, limiting their applicability to open-vocabulary, free-viewpoint downstream tasks and forcing a trade-off between real-time performance and map fidelity. This work proposes the first real-time SLAM system that integrates 3D Gaussian Splatting (3DGS) with vision foundation models, embedding open-set semantics into dynamic mapping and tracking through dense feature rasterization. The method achieves, for the first time in real-time SLAM, open-vocabulary, free-viewpoint semantic representation, thereby overcoming the limitations of fixed-category assumptions. Experimental results demonstrate that the system reduces pose error by 9% and improves mapping accuracy by 8% on standard benchmarks, while maintaining real-time performance and delivering semantic and language mask quality comparable to offline models.
📝 Abstract
We present a real-time SLAM system that unifies efficient camera tracking with photorealistic, feature-enriched mapping using 3D Gaussian Splatting (3DGS). Our main contribution is integrating dense feature rasterization into novel-view synthesis, aligned with a vision foundation model. This yields strong semantics beyond the raw RGB-D input, aiding both tracking and mapping accuracy. Unlike previous semantic SLAM approaches, which embed pre-defined class labels, FeatureSLAM enables entirely new downstream tasks via free-viewpoint, open-set segmentation. Across standard benchmarks, our method achieves real-time tracking on par with state-of-the-art systems while improving tracking stability and map fidelity without prohibitive compute. Quantitatively, we obtain 9% lower pose error and 8% higher mapping accuracy compared to recent fixed-set SLAM baselines. Our results confirm that real-time feature-embedded SLAM is not only valuable for enabling new downstream applications but also improves the underlying tracking and mapping subsystems, providing semantic and language-masking results on par with offline 3DGS models, alongside state-of-the-art tracking, depth, and RGB rendering.
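The "dense feature rasterization" the abstract describes can be pictured as alpha compositing per-Gaussian feature vectors along each ray, in the same front-to-back order 3DGS uses to composite RGB colors. The sketch below is purely illustrative: the function name, the per-ray formulation, and the NumPy loop are assumptions for exposition, not the paper's CUDA implementation.

```python
import numpy as np

def rasterize_features(alphas, features):
    """Illustrative sketch (not the paper's implementation): alpha-composite
    per-Gaussian feature vectors along one ray, front to back, the same way
    3DGS composites colors.

    alphas   -- (N,) opacity of each depth-sorted Gaussian hit by the ray
    features -- (N, D) feature vector carried by each Gaussian
    returns  -- (D,) rendered feature for this pixel
    """
    rendered = np.zeros(features.shape[1])
    transmittance = 1.0  # accumulated (1 - alpha) of everything in front
    for a, f in zip(alphas, features):
        rendered += transmittance * a * f  # this Gaussian's weighted feature
        transmittance *= (1.0 - a)         # attenuate what lies behind it
    return rendered
```

Because this weighting is identical to the color rasterizer's, the rendered feature map stays pixel-aligned with the rendered RGB image, which is what lets open-set queries against a vision foundation model be answered from any novel viewpoint.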