Anchor3R: Streaming 3D Reconstruction with Transient Anchors for Long-Horizon Visual Mapping

📅 2026-06-03

🤖 AI Summary

This work addresses three challenges in long-horizon online visual mapping: train-test mismatch caused by a fixed global coordinate frame, attention bias toward early anchor frames, and accumulated trajectory drift. It proposes a current-frame-centric, streaming feed-forward reconstruction framework. Instead of regressing poses in a fixed global frame, the method introduces a transient anchor mechanism that reformulates 3D reconstruction as relative pose estimation and local pointmap prediction within a temporal window, expressed in the current frame's coordinate system. Global consistency is recovered through loop-closure detection and motion averaging. On indoor, outdoor, driving, and RGB-D benchmarks, the approach improves long-horizon pose accuracy and dense reconstruction quality over existing streaming baselines while supporting bounded-memory online inference.
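Motion averaging, in its simplest form, solves a least-squares problem over relative measurements, so that a loop-closure edge spreads accumulated drift across the whole trajectory instead of leaving it at the end. The sketch below is a deliberately reduced stand-in, not the paper's method: positions are 1D scalars, rotations are ignored, all edges have equal weight, and `average_translations` is a hypothetical helper solved with Gauss-Seidel iterations while node 0 fixes the gauge.

```python
from typing import Dict, List, Tuple

# Hypothetical relative measurement (i, j, t): x_j - x_i is approximately t
Edge = Tuple[int, int, float]


def average_translations(num_nodes: int, edges: List[Edge], iters: int = 2000) -> List[float]:
    """Equal-weight least-squares 1D translation averaging via Gauss-Seidel.

    Node 0 is held at 0.0 to fix the gauge; every other node needs at least one edge.
    """
    # For each node, collect (neighbour, offset) pairs so that x_k ~ x_neighbour + offset
    incident: Dict[int, List[Tuple[int, float]]] = {k: [] for k in range(num_nodes)}
    for i, j, t in edges:
        incident[j].append((i, t))
        incident[i].append((j, -t))

    x = [0.0] * num_nodes
    for _ in range(iters):
        for k in range(1, num_nodes):
            # Setting the gradient of sum((x_j - x_i - t)^2) to zero for x_k
            # gives the mean of the neighbours' predictions for x_k.
            preds = [x[n] + off for n, off in incident[k]]
            x[k] = sum(preds) / len(preds)
    return x


if __name__ == "__main__":
    odometry = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
    loop_closure = [(0, 3, 2.7)]  # hypothetical loop-closure measurement between frames 0 and 3
    print(average_translations(4, odometry))                 # x[3] ~ 3.0 (drift kept at the end)
    print(average_translations(4, odometry + loop_closure))  # x[3] ~ 2.775
```

With odometry edges alone, the last position stays at 3.0; adding the loop-closure edge spreads the 0.3 discrepancy evenly over the four edges of the cycle, so each edge absorbs 0.075.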

📝 Abstract

Long-horizon online visual mapping is a core capability for robot perception, requiring continuous camera-motion and scene-geometry estimation from visual streams under bounded memory and computation. Recent feed-forward 3D reconstruction models provide strong geometric priors, but their streaming variants often predict poses in a fixed coordinate system tied to the first frame or a persistent scene memory. This fixed-gauge design leads to train-test mismatch, attention bias toward early anchors, and accumulated drift on sequences much longer than those seen during training. We propose Anchor3R, a streaming 3D reconstruction framework that treats feed-forward reconstruction as current-centric local measurement prediction rather than persistent global-gauge regression. At each time step, Anchor3R predicts window-relative poses and a local pointmap in the current-frame coordinate system, turning streaming reconstruction into relative-pose measurement generation. These measurements support online pose updates, while loop-closure reinsertion and motion averaging align the trajectory and transform local pointmaps into a coherent global reconstruction. Experiments on indoor, outdoor, driving, and RGB-D benchmarks show that Anchor3R improves long-horizon pose accuracy and dense reconstruction quality over existing streaming baselines, while supporting bounded-memory online inference.
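The online pose update and the lifting of local pointmaps into a global reconstruction both come down to composing rigid transforms. This is a minimal sketch under a convention the source does not state: the relative pose of the previous frame is a 4x4 matrix `t_cur_prev` mapping previous-frame coordinates into current-frame coordinates, and frame 0 defines the world frame. The helpers `update_world_pose`, `lift_pointmap`, and `yaw_translation` are hypothetical, and the update is plain dead-reckoning, not Anchor3R's actual update rule.

```python
import math
from typing import List, Tuple

Mat4 = List[List[float]]
Vec3 = Tuple[float, float, float]


def matmul(a: Mat4, b: Mat4) -> Mat4:
    """4x4 matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def se3_inverse(t: Mat4) -> Mat4:
    """Inverse of a rigid transform [R | p] is [R^T | -R^T p]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [t[i][3] for i in range(3)]
    p_inv = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [p_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def yaw_translation(yaw: float, tx: float, ty: float, tz: float) -> Mat4:
    """Rigid transform: rotation about z, then translation (illustrative helper)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, tx], [s, c, 0.0, ty], [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]


def update_world_pose(t_world_prev: Mat4, t_cur_prev: Mat4) -> Mat4:
    """Chain one relative measurement: T_world_cur = T_world_prev @ inv(T_cur_prev).

    Assumed convention: t_cur_prev maps previous-frame coordinates into current-frame coordinates.
    """
    return matmul(t_world_prev, se3_inverse(t_cur_prev))


def lift_pointmap(t_world_cur: Mat4, local_points: List[Vec3]) -> List[Vec3]:
    """Map points predicted in current-frame coordinates into the world frame."""
    lifted = []
    for x, y, z in local_points:
        lifted.append(tuple(
            t_world_cur[i][0] * x + t_world_cur[i][1] * y + t_world_cur[i][2] * z + t_world_cur[i][3]
            for i in range(3)
        ))
    return lifted


if __name__ == "__main__":
    t_world = yaw_translation(0.0, 0.0, 0.0, 0.0)  # frame 0 defines the world frame
    step = yaw_translation(0.0, -1.0, 0.0, 0.0)    # camera moves +1 along x, so the previous frame sits at x = -1
    for _ in range(2):
        t_world = update_world_pose(t_world, step)
    print(lift_pointmap(t_world, [(0.0, 0.0, 5.0)]))  # approximately [(2.0, 0.0, 5.0)]
```

Pure chaining like this accumulates drift on long sequences; in the abstract's pipeline, loop-closure reinsertion and motion averaging correct the chained trajectory before pointmaps are fused.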

Problem

Research questions and friction points this paper is trying to address.

long-horizon visual mapping

streaming 3D reconstruction

pose drift

fixed coordinate system

online inference

Innovation

Methods, ideas, or system contributions that make the work stand out.

streaming 3D reconstruction

transient anchors

relative pose estimation
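The transient-anchor idea can be illustrated by how window-relative pose targets might be built from known poses: keep only the last W frames and re-express each one in the newest frame's coordinates, so the current frame is always the identity and memory stays bounded. This is an assumption-laden planar sketch: poses are `(x, y, theta)` tuples, `TransientWindow`, `compose`, and `inverse` are hypothetical names, and Anchor3R itself predicts such relative poses from images rather than computing them from world poses.

```python
import math
from collections import deque
from typing import Deque, List, Tuple

# Planar pose (x, y, theta): maps frame coordinates into world coordinates
Pose2 = Tuple[float, float, float]


def wrap(angle: float) -> float:
    """Wrap an angle to [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def compose(a: Pose2, b: Pose2) -> Pose2:
    """Return a ∘ b: apply b first, then a."""
    ax, ay, at = a
    bx, by, bt = b
    c, s = math.cos(at), math.sin(at)
    return (ax + c * bx - s * by, ay + s * bx + c * by, wrap(at + bt))


def inverse(a: Pose2) -> Pose2:
    """Inverse transform: rotation -theta, translation -R^T p."""
    ax, ay, at = a
    c, s = math.cos(at), math.sin(at)
    return (-(c * ax + s * ay), s * ax - c * ay, wrap(-at))


class TransientWindow:
    """Hypothetical bounded buffer that re-anchors its poses on the newest frame."""

    def __init__(self, size: int) -> None:
        # deque(maxlen=...) silently drops the oldest pose, so memory stays O(size)
        self.poses: Deque[Pose2] = deque(maxlen=size)

    def push(self, world_pose: Pose2) -> List[Pose2]:
        """Add the newest frame and return every buffered pose in its coordinates."""
        self.poses.append(world_pose)
        cur_from_world = inverse(world_pose)
        # The newest frame maps to (approximately) the identity (0, 0, 0)
        return [compose(cur_from_world, p) for p in self.poses]


if __name__ == "__main__":
    window = TransientWindow(size=3)
    for k in range(5):
        relative = window.push((float(k), 0.0, 0.0))  # camera moves +1 along x per frame
    print(relative)  # frames 2, 3, 4 at x = -2, -1, 0 in frame 4's coordinates
```

Because no pose is ever stored in a fixed global frame, the targets look the same at frame 10 and at frame 10,000, which is the property a current-frame-centric design relies on to avoid the train-test mismatch described in the abstract.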