CoMo3R-SLAM: Collaborative Monocular Dense SLAM with Learned 3D Reconstruction Priors for Outdoor Multi-Agent Systems

📅 2026-05-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses three challenges of collaborative dense monocular SLAM in outdoor multi-agent settings: scale ambiguity, unreliable cross-agent data association, and interference from repetitive structures. It proposes CoMo3R-SLAM, described as the first collaborative monocular dense RGB SLAM system to use learned feed-forward 3D reconstruction priors for outdoor multi-agent mapping. Each agent uses the prior in its front-end for real-time tracking and local dense fusion, while a central coordinator builds a globally consistent metric map through dense pointmap matching for cross-agent verification, closed-form Sim(3) gauge synchronization, and GPU-accelerated global bundle adjustment with segment-level depth optimization. The method requires neither depth sensors nor parametric camera intrinsics. It attains the lowest ATE on three of four Tanks and Temples scenes and competitive accuracy on Waymo, matching or exceeding state-of-the-art RGB-D methods while running online at 8 FPS.

📝 Abstract

Collaborative dense SLAM is essential for multi-robot teams to achieve scalable and consistent 3D perception across large-scale outdoor environments. Existing systems typically depend on depth sensors, incurring significant payload, power, and calibration costs. Monocular RGB cameras are a lightweight alternative, but collaborative monocular dense SLAM remains difficult because of scale ambiguity and unreliable inter-agent data association, especially in outdoor scenes where low overlap and repetitive structures make traditional feature matching unreliable. This motivates the use of robust geometric information. We propose CoMo3R-SLAM, the first collaborative monocular dense RGB SLAM system that leverages robust learned feed-forward 3D reconstruction priors for outdoor multi-agent mapping. Each agent runs a prior-guided front-end for real-time tracking and local dense fusion, while a coordinator performs dense pointmap matching for cross-agent verification, closed-form Sim(3) gauge synchronization, and GPU-accelerated global bundle adjustment with segment-level depth optimization. Requiring neither depth sensors nor parametric intrinsics, our system produces robust cross-agent constraints and globally consistent metric maps from monocular RGB alone. On Tanks and Temples and Waymo sequences, CoMo3R-SLAM achieves the best ATE on three of four Tanks and Temples scenes and competitive Waymo accuracy, matching or exceeding state-of-the-art RGB-D methods while running online at 8 FPS.
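
The abstract names a closed-form Sim(3) synchronization step and reports accuracy as ATE, but it does not give the formulation of either. As an illustration only, the sketch below shows one standard closed-form similarity alignment between matched 3D points (Umeyama's method) and an ATE RMSE computed after that alignment, which is the common convention for monocular trajectories. This is an assumption about what such a step could look like, not the paper's implementation; the function names `umeyama_sim3` and `ate_rmse` are hypothetical.

```python
import numpy as np


def umeyama_sim3(src, dst):
    """Closed-form least-squares similarity so that dst ~= s * R @ src + t.

    src, dst: (N, 3) arrays of corresponding 3D points.
    Returns scale s (float), rotation R (3x3), translation t (3,).
    Hypothetical helper for illustration; not taken from the paper.
    """
    n = src.shape[0]
    mu_src = src.mean(axis=0)
    mu_dst = dst.mean(axis=0)
    src_c = src - mu_src
    dst_c = dst - mu_dst

    # Cross-covariance between the centred point sets.
    cov = dst_c.T @ src_c / n
    U, D, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so R is a proper rotation (det = +1).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / n
    s = float(np.trace(np.diag(D) @ S) / var_src)
    t = mu_dst - s * R @ mu_src
    return s, R, t


def ate_rmse(est_positions, gt_positions):
    """ATE RMSE after Sim(3) alignment of estimated to ground-truth positions."""
    s, R, t = umeyama_sim3(est_positions, gt_positions)
    aligned = s * (R @ est_positions.T).T + t
    sq_err = ((aligned - gt_positions) ** 2).sum(axis=1)
    return float(np.sqrt(sq_err.mean()))


if __name__ == "__main__":
    # Synthetic check: recover a known similarity transform.
    rng = np.random.default_rng(0)
    pts = rng.normal(size=(100, 3))
    a = 0.4
    R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                       [np.sin(a),  np.cos(a), 0.0],
                       [0.0,        0.0,       1.0]])
    moved = 3.0 * (R_true @ pts.T).T + np.array([5.0, -1.0, 2.0])
    s, R, t = umeyama_sim3(pts, moved)
    print("recovered scale:", s)
    print("ATE after alignment:", ate_rmse(pts, moved))
```

Estimating the scale together with the rotation and translation is what lets a scale-ambiguous monocular map be brought into another agent's frame. A full multi-agent system would additionally need outlier rejection on the matches before this step, which the sketch omits.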

Problem

Research questions and friction points this paper is trying to address.

collaborative SLAM

monocular dense SLAM

scale ambiguity

inter-agent data association

outdoor multi-agent systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

collaborative SLAM

monocular dense reconstruction

learned 3D priors