🤖 AI Summary
To address multi-robot collaborative perception under communication constraints, this paper proposes COGraph, a lightweight, graph-structured 3D semantic map that supports open-vocabulary queries and online fusion across robots. The method has three components: a data-driven encoder-decoder that compresses the semantic features of COGraph nodes before transmission and recovers them on receipt; feature-based place recognition and translation estimation, which align local maps without requiring geometric registration during semantic map fusion; and distributed fusion of the aligned local graphs into a global 3D scene graph. Compared with transmitting raw semantic point clouds or COGraphs with uncompressed 512-dimensional features, the approach reduces communication volume by two orders of magnitude while maintaining open-vocabulary query accuracy and mapping completeness, which makes multi-robot collaboration practical in bandwidth-limited environments.
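The encoder-decoder compression described above can be illustrated with a minimal sketch. The summary does not specify the network, so, as a plainly swapped-in technique, this uses PCA (the optimal *linear* encoder-decoder fitted from data), implemented in pure Python with power iteration. All names (`LinearFeatureCodec`, `make_toy_features`, `top_components`) are hypothetical, and 16-dimensional toy features stand in for the 512-dimensional embeddings so the example runs quickly.

```python
import math
import random


def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def covariance(rows, mu):
    """Population covariance matrix of the feature vectors."""
    dim, n = len(mu), len(rows)
    cov = [[0.0] * dim for _ in range(dim)]
    for row in rows:
        c = [a - m for a, m in zip(row, mu)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += c[i] * c[j] / n
    return cov


def top_components(cov, k, iters=300, seed=0):
    """Top-k eigenvectors of a symmetric PSD matrix (power iteration + deflation)."""
    dim = len(cov)
    mat = [row[:] for row in cov]
    rng = random.Random(seed)
    comps = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for _ in range(iters):
            w = [sum(mat[i][j] * v[j] for j in range(dim)) for i in range(dim)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0  # guard against a zero matrix
            v = [x / norm for x in w]
        # Rayleigh quotient = eigenvalue estimate, used to deflate the matrix
        lam = sum(v[i] * sum(mat[i][j] * v[j] for j in range(dim)) for i in range(dim))
        for i in range(dim):
            for j in range(dim):
                mat[i][j] -= lam * v[i] * v[j]
        comps.append(v)
    return comps


class LinearFeatureCodec:
    """Hypothetical stand-in for the learned encoder/decoder: a PCA projection."""

    def __init__(self, features, code_dim):
        self.mu = mean_vector(features)  # fitted from data, shared by sender and receiver
        self.basis = top_components(covariance(features, self.mu), code_dim)

    def encode(self, x):
        """Sender side: D-dim feature -> code_dim coefficients (what gets transmitted)."""
        c = [a - m for a, m in zip(x, self.mu)]
        return [sum(b_i * c_i for b_i, c_i in zip(b, c)) for b in self.basis]

    def decode(self, z):
        """Receiver side: code_dim coefficients -> approximate D-dim feature."""
        out = self.mu[:]
        for coeff, b in zip(z, self.basis):
            for i in range(len(out)):
                out[i] += coeff * b[i]
        return out


def make_toy_features(n=60, dim=16, seed=1):
    """Toy node features lying on a 2-D affine subspace of R^dim (illustration only)."""
    rng = random.Random(seed)
    u = [1.0 if i < 4 else 0.0 for i in range(dim)]       # first latent direction
    v = [1.0 if 4 <= i < 8 else 0.0 for i in range(dim)]  # second, orthogonal to u
    offset = [0.1 * i for i in range(dim)]
    feats = []
    for _ in range(n):
        a, b = rng.gauss(0.0, 3.0), rng.gauss(0.0, 1.0)
        feats.append([o + a * ui + b * vi for o, ui, vi in zip(offset, u, v)])
    return feats
```

Here each node would transmit 2 numbers instead of 16. Sharing the fitted mean and basis once in advance is our assumption; a learned nonlinear encoder, as the paper describes, can capture structure in real embeddings that a linear projection cannot.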
📝 Abstract
Collaborative perception in unknown environments is crucial for multi-robot systems. With the emergence of foundation models, robots can now not only perceive geometric information but also achieve open-vocabulary scene understanding. However, existing map representations that support open-vocabulary queries often involve large data volumes, which become a bottleneck for multi-robot transmission in communication-limited environments. To address this challenge, we develop a method to construct a graph-structured 3D representation called COGraph, in which nodes represent objects with semantic features and edges capture their spatial relationships. Before transmission, a data-driven feature encoder reduces the dimensionality of the node features in the COGraph. Upon receiving COGraphs from other robots, a decoder recovers the semantic features of each node. We also propose a feature-based approach to place recognition and translation estimation, which enables local COGraphs to be merged into a unified global map. We validate our framework in simulation environments built on Isaac Sim and on real-world datasets. The results demonstrate that, compared with transmitting semantic point clouds or COGraphs with 512-dimensional features, our framework reduces the data volume by two orders of magnitude without compromising mapping and query performance. For more details, please visit our project page at https://github.com/efc-robot/MR-COGraphs.
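The place-recognition, translation-estimation, and merging steps can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' algorithm: nodes are matched by mutual nearest neighbours on the cosine similarity of their (decoded) features; the two local frames are assumed to differ only by a translation, which is consistent with the abstract speaking of translation rather than full pose estimation; and the offset is taken as a per-axis median. `Node`, `SceneGraph`, `merge`, the threshold `min_sim=0.9`, `min_matches`, and the edge `radius` are all hypothetical.

```python
import math
from dataclasses import dataclass, field
from statistics import median


@dataclass
class Node:
    position: tuple  # object centroid (x, y, z) in the robot's local frame
    feature: list    # semantic feature vector (e.g. recovered by the decoder)


@dataclass
class SceneGraph:
    nodes: list = field(default_factory=list)
    edges: set = field(default_factory=set)  # (i, j) pairs of spatially close objects

    def rebuild_edges(self, radius=2.0):
        """Connect every pair of objects whose centroids are within `radius`."""
        self.edges = {(i, j)
                      for i in range(len(self.nodes))
                      for j in range(i + 1, len(self.nodes))
                      if math.dist(self.nodes[i].position, self.nodes[j].position) <= radius}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mutual_matches(g_a, g_b, min_sim=0.9):
    """Place recognition: mutual nearest neighbours in feature space above min_sim."""
    if not g_a.nodes or not g_b.nodes:
        return []
    sims = [[cosine(a.feature, b.feature) for b in g_b.nodes] for a in g_a.nodes]
    best_b = [max(range(len(g_b.nodes)), key=lambda j: row[j]) for row in sims]
    best_a = [max(range(len(g_a.nodes)), key=lambda i: sims[i][j])
              for j in range(len(g_b.nodes))]
    return [(i, j) for i, j in enumerate(best_b) if best_a[j] == i and sims[i][j] >= min_sim]


def estimate_translation(g_a, g_b, matches):
    """Per-axis median of centroid offsets (tolerant of a few wrong matches).
    Assumes the two local frames differ only by a translation."""
    offsets = [[pa - pb for pa, pb in zip(g_a.nodes[i].position, g_b.nodes[j].position)]
               for i, j in matches]
    return tuple(median(axis) for axis in zip(*offsets))


def merge(g_a, g_b, min_sim=0.9, min_matches=3, radius=2.0):
    """Express g_b in g_a's frame and fuse them; returns (None, None) if no overlap is found."""
    matches = mutual_matches(g_a, g_b, min_sim)
    if len(matches) < min_matches:
        return None, None  # not enough evidence that both robots observed the same place
    t = estimate_translation(g_a, g_b, matches)
    merged = SceneGraph([Node(n.position, n.feature[:]) for n in g_a.nodes])
    b_to_a = {j: i for i, j in matches}
    for j, n in enumerate(g_b.nodes):
        pos = tuple(p + dt for p, dt in zip(n.position, t))
        if j in b_to_a:
            # Same object seen by both robots: average position and feature
            a = merged.nodes[b_to_a[j]]
            a.position = tuple((p + q) / 2 for p, q in zip(a.position, pos))
            a.feature = [(x + y) / 2 for x, y in zip(a.feature, n.feature)]
        else:
            merged.nodes.append(Node(pos, n.feature[:]))
    merged.rebuild_edges(radius)
    return merged, t
```

The median keeps the estimated offset stable when a minority of matches are wrong, and requiring mutual nearest neighbours discards objects that only one robot has observed.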