🤖 AI Summary
Existing 3D scene graphs employ overly simplified geometric representations of objects, which hinders high-fidelity reconstruction, robust relocalization, and safe navigation. To address this limitation, this work proposes a four-level hierarchical object representation built from RGB-D inputs. It progressively constructs point clouds and dense meshes and, critically, introduces analytical superquadrics into the scene graph for the first time. This formulation yields sparse, geometry-preserving, and differentiable object models that enable efficient collision detection and map alignment within an end-to-end perception pipeline. Experiments on multiple indoor and outdoor datasets, including HOPE and ReplicaCAD, demonstrate that superquadric-based map alignment consistently outperforms the current state-of-the-art method, ROMAN.
📝 Abstract
Hierarchical 3D Scene Graphs (3DSGs) have emerged as an actionable and scalable representation for long-term autonomy, incorporating metric, semantic, and topological information about the scene. However, the geometric representation of objects in 3DSGs has been overlooked, as most methods use simplified geometric models such as partial point clouds or 3D bounding boxes. In this work, we introduce a hierarchical object representation that can be leveraged for high-fidelity object-level reconstruction, robust object-based re-localization or map alignment, and efficient analytical collision checking for safe robot navigation planning in dense and cluttered environments. The representation is organized into four distinct layers, progressively abstracting the scene from raw sensor data to dense 3D meshes to analytical primitives such as superquadrics, which provide a sparse and analytical representation of object geometry. We develop a pipeline that builds the hierarchical object representation from an RGB-D image stream captured by a robot, and demonstrate it on real-world open-set object scenes in both indoor and outdoor environments. Extensive experiments across diverse datasets, including HOPE, ReplicaCAD, Kimera-Multi, and the NUS Campus Dataset collected using a Unitree B2 robot, validate our pipeline. We show that our superquadric-based map alignment method outperforms ROMAN, the current state-of-the-art object-based map alignment method. Our code can be found at https://github.com/perceptica-robotics/Hickory.
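To illustrate why superquadrics allow analytical collision checking, the following minimal sketch evaluates the standard superquadric inside-outside function for a point expressed in the object's local frame. It is not taken from the authors' code: the class name `Superquadric`, its field names, and the `contains` helper are hypothetical, and the sketch assumes the common textbook parameterization (three semi-axis lengths and two shape exponents) without any pose transform.

```python
import math
from dataclasses import dataclass


@dataclass
class Superquadric:
    # Semi-axis lengths along the local x, y, z axes (hypothetical field names).
    ax: float
    ay: float
    az: float
    # Shape exponents: eps1 shapes the profile along z, eps2 the xy cross-section.
    # Values near 1 give ellipsoids; values near 0 give box-like shapes.
    eps1: float
    eps2: float

    def inside_outside(self, x: float, y: float, z: float) -> float:
        """Return F(x, y, z): below 1 inside, exactly 1 on the surface, above 1 outside."""
        xy_term = (
            abs(x / self.ax) ** (2.0 / self.eps2)
            + abs(y / self.ay) ** (2.0 / self.eps2)
        ) ** (self.eps2 / self.eps1)
        z_term = abs(z / self.az) ** (2.0 / self.eps1)
        return xy_term + z_term

    def contains(self, x: float, y: float, z: float) -> bool:
        """Closed-form point-in-object test; no mesh or point cloud lookup is needed."""
        return self.inside_outside(x, y, z) <= 1.0


if __name__ == "__main__":
    sphere = Superquadric(1.0, 1.0, 1.0, eps1=1.0, eps2=1.0)
    boxy = Superquadric(1.0, 1.0, 1.0, eps1=0.1, eps2=0.1)
    corner = (0.9, 0.9, 0.9)
    # The same point lies outside the unit sphere but inside the box-like shape.
    print(sphere.contains(*corner), boxy.contains(*corner))
```

Because the test is a single closed-form expression in five shape parameters, a planner could in principle check many query points against an object far more cheaply than against a dense mesh. A full implementation would first transform each query point into the object frame using the object's estimated pose.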