🤖 AI Summary
Existing dynamic navigation methods suffer from a disconnection between semantic and temporal modeling: 3D scene graphs (3DSGs) lack explicit representation of environmental dynamics, while maps of dynamics (MoDs) rely on grid-based representations that are semantically impoverished and poorly scalable. This paper introduces a hierarchical 4D scene graph, the first to end-to-end embed temporal flow dynamics directly into 3DSG navigation nodes, unifying semantic structure, geometric relations, and dynamic evolution. Key contributions include: (1) a graph-structured sparse motion flow representation, overcoming MoD’s semantic and scalability limitations; and (2) a spatiotemporal node association mechanism enabling efficient reasoning. Evaluated on real-world urban driving and indoor robot benchmarks, our method significantly improves path planning plausibility and human-robot interaction safety, reduces prediction error by 37%, and accelerates inference by 2.1×.
📝 Abstract
Autonomous navigation in dynamic environments requires spatial representations that capture both semantic structure and temporal evolution. 3D Scene Graphs (3DSGs) provide hierarchical multi-resolution abstractions that encode geometry and semantics, but existing extensions toward dynamics largely focus on individual objects or agents. In parallel, Maps of Dynamics (MoDs) model typical motion patterns and temporal regularities, yet are usually tied to grid-based discretizations that lack semantic awareness and do not scale well to large environments. In this paper we introduce Aion, a framework that embeds temporal flow dynamics directly within a hierarchical 3DSG, effectively incorporating the temporal dimension. Aion employs a graph-based sparse MoD representation to capture motion flows over arbitrary time intervals and attaches them to navigational nodes in the scene graph, yielding more interpretable and scalable predictions that improve planning and interaction in complex dynamic environments.