4DTAM: Non-Rigid Tracking and Mapping via Dynamic Surface Gaussians

📅 2025-05-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses the ill-posed problem of jointly optimizing geometry, appearance, dynamics, and camera pose from RGB-D video of non-rigid scenes. It proposes the first real-time 4D-SLAM method for non-rigid scenarios, built on dynamic Gaussian surface primitives and differentiable rendering. A spatiotemporal deformation field, parameterized by an MLP, models non-rigid motion, while a depth-guided online optimization jointly estimates the scene and the camera trajectory. The paper further introduces a camera pose estimation strategy and spatiotemporal surface regularization terms to keep the reconstruction geometrically consistent. Its contributions are threefold: (1) the first Gaussian-surface-based SLAM framework; (2) the first open-source synthetic 4D benchmark dataset covering diverse non-rigid motion patterns; and (3) a standardized evaluation protocol for 4D-SLAM. The reported experiments show the method outperforming state-of-the-art 3D and 4D approaches in both accuracy and real-time performance.

📝 Abstract

We propose the first 4D tracking and mapping method that jointly performs camera localization and non-rigid surface reconstruction via differentiable rendering. Our approach captures 4D scenes from an online stream of color images with depth measurements or predictions by jointly optimizing scene geometry, appearance, dynamics, and camera ego-motion. Although natural environments exhibit complex non-rigid motions, 4D-SLAM remains relatively underexplored due to its inherent challenges; even with 2.5D signals, the problem is ill-posed because of the high dimensionality of the optimization space. To overcome these challenges, we first introduce a SLAM method based on Gaussian surface primitives that leverages depth signals more effectively than 3D Gaussians, thereby achieving accurate surface reconstruction. To further model non-rigid deformations, we employ a warp-field represented by a multi-layer perceptron (MLP) and introduce a novel camera pose estimation technique along with surface regularization terms that facilitate spatio-temporal reconstruction. In addition to these algorithmic challenges, a significant hurdle in 4D SLAM research is the lack of reliable ground truth and evaluation protocols, primarily due to the difficulty of 4D capture using commodity sensors. To address this, we present a novel open synthetic dataset of everyday objects with diverse motions, leveraging large-scale object models and animation modeling. In summary, we open up the modern 4D-SLAM research by introducing a novel method and evaluation protocols grounded in modern vision and rendering techniques.

Problem

Research questions and friction points this paper is trying to address.

Joint camera localization and non-rigid surface reconstruction

High-dimensional optimization for 4D scene dynamics

Lack of reliable ground truth in 4D SLAM

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Gaussian surface primitives for SLAM

Employs MLP warp-field for non-rigid deformations

Introduces synthetic dataset for 4D evaluation
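
To make the warp-field idea concrete, the sketch below shows one way an MLP can map a canonical point and a timestamp to a deformed position. It is an illustration only, not the paper's architecture: the function names `init_warp_mlp` and `warp`, the single hidden layer, the layer width, the raw (x, y, z, t) input without positional encoding, and the zero-initialized output layer are all assumptions made for this example.

```python
import numpy as np

def init_warp_mlp(rng, in_dim=4, hidden=64, out_dim=3):
    """Create weights for a small MLP (hypothetical sizes, not from the paper)."""
    return {
        "W1": rng.normal(0.0, 0.1, (in_dim, hidden)),
        "b1": np.zeros(hidden),
        # Assumption: a zero-initialized output layer, so the untrained
        # field predicts zero offset and leaves every point in place.
        "W2": np.zeros((hidden, out_dim)),
        "b2": np.zeros(out_dim),
    }

def warp(params, points, t):
    """Map canonical points of shape (N, 3) to their positions at time t."""
    time_col = np.full((points.shape[0], 1), t)
    x = np.concatenate([points, time_col], axis=1)        # (N, 4) input
    h = np.maximum(x @ params["W1"] + params["b1"], 0.0)  # ReLU hidden layer
    offset = h @ params["W2"] + params["b2"]               # (N, 3) displacement
    return points + offset

rng = np.random.default_rng(0)
params = init_warp_mlp(rng)
centers = rng.uniform(-1.0, 1.0, (5, 3))  # stand-in for Gaussian centers
warped = warp(params, centers, t=0.5)
```

In a full system the MLP weights would be optimized through the differentiable renderer together with the camera pose; predicting an offset rather than an absolute position is a common choice because the field then starts out as the identity.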
