🤖 AI Summary
Conventional SLAM methods struggle to maintain a consistent scene representation in dynamic environments. To address this, the paper proposes a hierarchical 3D scene graph SLAM framework that, for the first time, explicitly models dynamic entities (e.g., moving objects and agents) within SLAM and introduces a layered constraint mechanism. Key components include fiducial marker–based dynamic entity detection, dual geometric constraints (entity-to-keyframe and intra-entity), and joint semantic-geometric graph optimization. The framework simultaneously estimates dynamic object poses and the robot trajectory while supporting high-level reasoning about scene dynamics. Experimental results demonstrate a 27.57% reduction in pose estimation error, significantly improving robustness in dynamic mapping and accuracy in dynamic object state estimation.
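To make the layered-constraint idea concrete, below is a minimal factor-graph sketch in Python using GTSAM, not the authors' implementation. The key conventions (`X` for keyframes, `E` for time-indexed entity poses), the noise sigmas, and the measurement values are all assumptions, and "intra-entity" is read here as a constraint linking successive poses of the same entity; the paper may define it differently.

```python
# Minimal factor-graph sketch of the layered constraints (illustrative
# only; not the paper's code). Robot keyframes X(0), X(1) and two
# time-indexed poses E(0), E(1) of one dynamic entity are jointly optimized.
import numpy as np
import gtsam
from gtsam.symbol_shorthand import X, E

graph = gtsam.NonlinearFactorGraph()

# Noise models (placeholder sigmas; the paper's values are not given).
prior_noise = gtsam.noiseModel.Diagonal.Sigmas(np.full(6, 0.01))
odom_noise = gtsam.noiseModel.Diagonal.Sigmas(np.full(6, 0.05))
obs_noise = gtsam.noiseModel.Diagonal.Sigmas(np.full(6, 0.10))
motion_noise = gtsam.noiseModel.Diagonal.Sigmas(np.full(6, 0.20))

# Anchor the first keyframe and add odometry between keyframes.
graph.add(gtsam.PriorFactorPose3(X(0), gtsam.Pose3(), prior_noise))
odom = gtsam.Pose3(gtsam.Rot3(), np.array([1.0, 0.0, 0.0]))
graph.add(gtsam.BetweenFactorPose3(X(0), X(1), odom, odom_noise))

# Entity-keyframe constraints: marker-derived relative pose of the
# entity as observed from each keyframe (measurement values are made up).
z0 = gtsam.Pose3(gtsam.Rot3(), np.array([2.0, 1.0, 0.0]))
z1 = gtsam.Pose3(gtsam.Rot3(), np.array([1.5, 1.0, 0.0]))
graph.add(gtsam.BetweenFactorPose3(X(0), E(0), z0, obs_noise))
graph.add(gtsam.BetweenFactorPose3(X(1), E(1), z1, obs_noise))

# Intra-entity constraint: here interpreted as a smooth-motion prior
# between successive poses of the same entity (one possible reading).
motion = gtsam.Pose3(gtsam.Rot3(), np.array([0.5, 0.0, 0.0]))
graph.add(gtsam.BetweenFactorPose3(E(0), E(1), motion, motion_noise))

# Initial estimates, then joint optimization of robot and entity poses.
initial = gtsam.Values()
initial.insert(X(0), gtsam.Pose3())
initial.insert(X(1), odom)
initial.insert(E(0), z0)
initial.insert(E(1), gtsam.Pose3(gtsam.Rot3(), np.array([2.5, 1.0, 0.0])))
result = gtsam.LevenbergMarquardtOptimizer(graph, initial).optimize()
print(result.atPose3(E(1)))  # jointly estimated entity pose at t = 1
```

Because the entity is given one variable per timestep rather than a single static landmark, its motion is estimated alongside the robot trajectory instead of being rejected as outlier data.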
📝 Abstract
Autonomous robots depend crucially on their ability to perceive and process information from dynamic environments. Traditional simultaneous localization and mapping (SLAM) approaches struggle to maintain consistent scene representations in the presence of numerous moving objects, often treating dynamic elements as outliers rather than modeling them explicitly in the scene representation. In this paper, we present a novel hierarchical 3D scene graph-based SLAM framework that addresses the challenge of modeling and estimating the poses of dynamic objects and agents. We use fiducial markers to detect dynamic entities and extract their attributes, while also improving keyframe selection and adding new capabilities for dynamic entity mapping. We maintain a hierarchical representation in which dynamic objects are registered in the SLAM graph and constrained to robot keyframes and to the building's floor level through our novel entity-keyframe and intra-entity constraints. By combining semantic and geometric constraints between dynamic entities and the environment, our system jointly optimizes the SLAM graph to estimate the poses of the robot and of the various dynamic agents and objects while maintaining an accurate map. Experimental evaluation demonstrates that our approach achieves a 27.57% reduction in pose estimation error compared to traditional methods and enables higher-level reasoning about scene dynamics.
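As an illustration of the marker-based detection step, the sketch below uses OpenCV's ArUco module (the ≥ 4.7 `ArucoDetector` API). The abstract does not specify the marker family or calibration, so the dictionary, camera intrinsics, distortion, and marker size here are placeholder assumptions, and `detect_entities` is a hypothetical helper rather than part of the authors' system.

```python
# Sketch of fiducial-marker-based dynamic entity detection (assumed to use
# OpenCV ArUco; the paper's actual marker family is not stated).
import cv2
import numpy as np

K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])   # assumed pinhole intrinsics
dist = np.zeros(5)                # assumed zero lens distortion
MARKER_SIDE = 0.10                # assumed marker edge length in meters

# 3D corners of a marker in its own frame (ArUco corner ordering:
# top-left, top-right, bottom-right, bottom-left).
half = MARKER_SIDE / 2.0
obj_pts = np.array([[-half,  half, 0.0],
                    [ half,  half, 0.0],
                    [ half, -half, 0.0],
                    [-half, -half, 0.0]])

dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())

def detect_entities(gray_image):
    """Return {marker_id: (rvec, tvec)}: camera-frame poses of detected
    markers. Each id would map to a dynamic entity and its attributes,
    yielding the relative-pose measurement for an entity-keyframe factor."""
    corners, ids, _ = detector.detectMarkers(gray_image)
    poses = {}
    if ids is not None:
        for marker_corners, marker_id in zip(corners, ids.flatten()):
            ok, rvec, tvec = cv2.solvePnP(obj_pts,
                                          marker_corners.reshape(4, 2),
                                          K, dist)
            if ok:
                poses[int(marker_id)] = (rvec, tvec)
    return poses
```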