SLAM-Former: Putting SLAM into One Transformer

📅 2025-09-21
🤖 AI Summary
This work addresses the challenge of unifying full SLAM functionality (front-end tracking, incremental mapping, and back-end global optimization) within a single end-to-end learnable architecture. The authors propose what they describe as the first neural SLAM framework to integrate all of these components into one Transformer model. By serializing a monocular video stream as a spatiotemporal sequence, the model jointly and iteratively optimizes camera poses and dense depth maps, which keeps the reconstruction geometrically consistent and tightly coupled. Key contributions include: (i) the first holistic integration of the complete SLAM pipeline within a single Transformer, replacing the conventional modular design; and (ii) novel mechanisms for incremental feature updating and cross-frame joint pose-depth optimization. Evaluated on multiple standard benchmarks, the method matches or surpasses state-of-the-art dense SLAM approaches in accuracy and robustness, with particularly notable improvements in dynamic scenes and long-duration sequences.

📝 Abstract
We present SLAM-Former, a novel neural approach that integrates full SLAM capabilities into a single transformer. Similar to traditional SLAM systems, SLAM-Former comprises both a frontend and a backend that operate in tandem. The frontend processes sequential monocular images in real time for incremental mapping and tracking, while the backend performs global refinement to ensure a geometrically consistent result. Executing the two in alternation allows the frontend and backend to mutually promote one another, enhancing overall system performance. Comprehensive experimental results demonstrate that SLAM-Former achieves superior or highly competitive performance compared to state-of-the-art dense SLAM methods.
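
The alternation between frontend and backend described in the abstract can be pictured with a small scheduling sketch. This is not the authors' implementation: the class `AlternatingSLAM`, the `backend_every` schedule, and the toy functions `integrate` and `anchor_to_origin` are all hypothetical stand-ins, and scalar "poses" replace the transformer's pose and depth outputs. It only illustrates the control flow, in which the frontend updates state incrementally per frame and the backend periodically revisits everything accumulated so far.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AlternatingSLAM:
    """Toy scheduler (hypothetical): frontend on every frame, backend every N frames."""
    frontend: Callable[[List[float], float], float]  # (estimates so far, new frame) -> new estimate
    backend: Callable[[List[float]], List[float]]    # all estimates -> refined estimates
    backend_every: int = 4                           # assumed schedule, not from the paper
    estimates: List[float] = field(default_factory=list)
    backend_runs: int = 0

    def push(self, frame: float) -> None:
        # Frontend: incremental step that sees only the current state and the new frame.
        self.estimates.append(self.frontend(self.estimates, frame))
        # Backend: periodic global pass; the frontend then continues from the refined state.
        if len(self.estimates) % self.backend_every == 0:
            self.estimates = self.backend(self.estimates)
            self.backend_runs += 1


def integrate(estimates: List[float], frame: float) -> float:
    """Stand-in frontend: treat each frame as a relative motion and accumulate it."""
    previous = estimates[-1] if estimates else 0.0
    return previous + frame


def anchor_to_origin(estimates: List[float]) -> List[float]:
    """Stand-in backend: shift the whole trajectory so the first estimate is zero."""
    first = estimates[0]
    return [e - first for e in estimates]


slam = AlternatingSLAM(frontend=integrate, backend=anchor_to_origin)
for _ in range(10):
    slam.push(1.0)
print(slam.backend_runs, slam.estimates)
```

After the first backend pass the frontend resumes from the shifted trajectory, which is the sense in which one stage feeds the other in this toy version.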

Problem

Research questions and friction points this paper is trying to address.

Integrating full SLAM capabilities into a single transformer model
Enabling real-time incremental mapping and tracking from images
Performing global refinement for geometrically consistent results

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates full SLAM capabilities into one transformer
Frontend processes images for real-time tracking and mapping
Backend performs global refinement for geometric consistency

Authors

Yijun Yuan
Tsinghua University, China
Robotic Mapping, SLAM, Rescue Robotics

Zhuoguang Chen
IIIS, Tsinghua University

Kenan Li
Assistant Professor, Saint Louis University
public health, GIS, spatial statistics, system dynamics, geo-AI

Weibang Wang
IIIS, Tsinghua University

Hang Zhao
IIIS, Tsinghua University