🤖 AI Summary
Existing NeRF-based SLAM methods either assume a static scene or require ground-truth camera poses, which limits their use in real-world dynamic environments.
Method: This paper proposes a time-varying neural radiance field SLAM framework. (1) It runs a tracking process and a mapping process in parallel: tracking samples pixels uniformly and is trained self-supervised, so no ground-truth camera poses are needed, while mapping uses motion masks to separate dynamic objects from the static background and samples more pixels from dynamic regions; (2) it optimizes both processes in two time-aware stages, first associating time with 3D positions to map the deformation field to a canonical field, then associating time with canonical-field embeddings to predict color and a signed distance function (SDF); (3) it selects keyframes with an overlap-rate-based strategy.
Contribution/Results: The framework tracks and reconstructs dynamic scenes without assuming a static scene or requiring ground-truth camera poses. On two synthetic dynamic datasets and one real-world dynamic dataset, it achieves tracking and mapping results competitive with existing state-of-the-art NeRF-based dynamic SLAM systems.
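The abstract describes two sampling regimes: uniform sampling in the tracking process and motion-mask-guided sampling in the mapping process, where dynamic regions receive more pixels. The sketch below illustrates that contrast under stated assumptions: the mask is a 2-D list of booleans, a fixed share of the ray budget (the hypothetical `dynamic_ratio`) is reserved for dynamic pixels, and the function names are illustrative rather than taken from the paper.

```python
import random
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int]  # (row, col)


def sample_uniform(height: int, width: int, n: int, rng: random.Random) -> List[Pixel]:
    """Tracking-style sampling: every pixel is equally likely (drawn with replacement)."""
    return [(rng.randrange(height), rng.randrange(width)) for _ in range(n)]


def sample_with_motion_mask(
    mask: Sequence[Sequence[bool]], n: int, dynamic_ratio: float, rng: random.Random
) -> List[Pixel]:
    """Mapping-style sampling: reserve a fixed share of the ray budget for dynamic pixels.

    mask[r][c] is True where the motion mask marks a moving object.
    dynamic_ratio is a hypothetical hyperparameter, not a value from the paper.
    """
    dynamic = [(r, c) for r, row in enumerate(mask) for c, m in enumerate(row) if m]
    static = [(r, c) for r, row in enumerate(mask) for c, m in enumerate(row) if not m]
    if not dynamic and not static:
        raise ValueError("mask is empty")
    # Number of rays drawn from dynamic pixels, clamped to [0, n].
    n_dyn = min(n, max(0, round(n * dynamic_ratio))) if dynamic else 0
    if not static:  # the whole frame is dynamic
        n_dyn = n
    picks = [rng.choice(dynamic) for _ in range(n_dyn)]
    picks += [rng.choice(static) for _ in range(n - n_dyn)]
    return picks


rng = random.Random(0)
# 4x4 frame whose top-left 2x2 block (25% of the pixels) is moving.
mask = [[r < 2 and c < 2 for c in range(4)] for r in range(4)]
picked = sample_with_motion_mask(mask, n=100, dynamic_ratio=0.5, rng=rng)
share = sum(mask[r][c] for r, c in picked) / len(picked)
print(share)  # 0.5: dynamic pixels get half the budget despite covering a quarter of the frame
```

A fixed split keeps the number of dynamic rays predictable per iteration; a per-pixel weighting scheme would be an equally plausible reading of "sample more pixels from dynamic areas".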
📝 Abstract
Previous attempts to integrate Neural Radiance Fields (NeRF) into the Simultaneous Localization and Mapping (SLAM) framework either rely on the assumption of static scenes or require ground-truth camera poses, which impedes their application in real-world scenarios. This paper proposes a time-varying representation to track and reconstruct dynamic scenes. First, our framework maintains two processes simultaneously: a tracking process and a mapping process. In the tracking process, all input images are uniformly sampled and the model is then trained progressively in a self-supervised manner. In the mapping process, we leverage motion masks to distinguish dynamic objects from the static background, and sample more pixels from dynamic areas. Second, parameter optimization in both processes consists of two stages: the first stage associates time with 3D positions to convert the deformation field to the canonical field, and the second stage associates time with the embeddings of the canonical field to obtain colors and a Signed Distance Function (SDF). Finally, we propose a novel keyframe selection strategy based on the overlap rate. Our approach is evaluated on two synthetic datasets and one real-world dataset, and the experiments show that our method achieves competitive tracking and mapping results compared with existing state-of-the-art NeRF-based dynamic SLAM systems.
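The two-stage optimization can be pictured as a two-stage query. Stage 1 feeds a point and its timestamp to a deformation network that warps the point into the canonical field; stage 2 combines the canonical embedding with time to predict SDF and color. The minimal sketch below assumes each network is a small bias-free ReLU MLP and that time enters by plain concatenation; `TinyMLP`, `TimeVaryingField`, and all sizes are hypothetical, the random weights stand in for trained ones, and the paper's actual encodings and architectures are not described here.

```python
import math
import random
from typing import List, Sequence, Tuple

Vec = List[float]


class TinyMLP:
    """Two-layer ReLU perceptron with random weights (stand-in for a trained network)."""

    def __init__(self, n_in: int, n_hidden: int, n_out: int, rng: random.Random):
        self.w1 = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.w2 = [[rng.gauss(0.0, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]

    def __call__(self, x: Sequence[float]) -> Vec:
        hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in self.w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]


class TimeVaryingField:
    """Hypothetical two-stage query: (x, t) -> canonical point -> embedding -> (sdf, rgb)."""

    def __init__(self, embed_dim: int = 8, seed: int = 0):
        rng = random.Random(seed)
        self.deform = TinyMLP(4, 16, 3, rng)            # stage 1: (x, y, z, t) -> offset
        self.embed = TinyMLP(3, 16, embed_dim, rng)     # canonical-field embedding
        self.head = TinyMLP(embed_dim + 1, 16, 4, rng)  # stage 2: (embedding, t) -> sdf, rgb

    def query(self, x: Sequence[float], t: float) -> Tuple[float, Vec]:
        # Stage 1: associate time with the 3D position to warp x into the canonical field.
        offset = self.deform([*x, t])
        x_canonical = [xi + di for xi, di in zip(x, offset)]
        # Stage 2: associate time with the canonical embedding to predict SDF and color.
        out = self.head([*self.embed(x_canonical), t])
        sdf = out[0]
        rgb = [1.0 / (1.0 + math.exp(-c)) for c in out[1:]]  # sigmoid keeps colors in (0, 1)
        return sdf, rgb


field = TimeVaryingField(seed=0)
for t in (0.0, 0.5, 1.0):
    sdf, rgb = field.query([0.1, 0.2, 0.3], t)
    print(f"t={t}: sdf={sdf:+.3f}, rgb={[round(c, 3) for c in rgb]}")
```

Keeping geometry and appearance in one canonical field lets every timestamp share the same embedding network, while time-dependent variation enters only through the deformation and the conditioned head.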
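The abstract names the overlap rate as the keyframe criterion without defining it. One common way to measure overlap, assumed in this sketch, is to back-project the current frame's depth pixels with its pose, reproject them into the last keyframe using shared pinhole intrinsics, and count the fraction that land inside the keyframe image; a new keyframe starts when that fraction drops below a threshold. The function names, the (R, t) camera-to-world pose convention, and the 0.8 threshold are hypothetical.

```python
from typing import List, Sequence, Tuple

Mat3 = Sequence[Sequence[float]]
Pose = Tuple[Mat3, Sequence[float]]  # (R, t), camera-to-world


def matvec(m: Mat3, v: Sequence[float]) -> List[float]:
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def transpose(m: Mat3) -> List[List[float]]:
    return [[m[j][i] for j in range(3)] for i in range(3)]


def overlap_rate(depth: Sequence[Sequence[float]], pose_cur: Pose, pose_kf: Pose,
                 fx: float, fy: float, cx: float, cy: float) -> float:
    """Fraction of the current frame's valid depth pixels that project into the keyframe."""
    h, w = len(depth), len(depth[0])
    r_cur, t_cur = pose_cur
    r_kf, t_kf = pose_kf
    r_kf_inv = transpose(r_kf)  # rotation inverse = transpose (world-to-keyframe)
    valid = inside = 0
    for v in range(h):
        for u in range(w):
            d = depth[v][u]
            if d <= 0:  # missing depth
                continue
            valid += 1
            # Back-project pixel (u, v) into the current camera, then into the world.
            p_cam = [(u - cx) / fx * d, (v - cy) / fy * d, d]
            p_world = [a + b for a, b in zip(matvec(r_cur, p_cam), t_cur)]
            # Move the point into the keyframe camera and project it (occlusion ignored).
            x, y, z = matvec(r_kf_inv, [a - b for a, b in zip(p_world, t_kf)])
            if z <= 0:  # behind the keyframe camera
                continue
            u_kf, v_kf = fx * x / z + cx, fy * y / z + cy
            if 0 <= u_kf <= w - 1 and 0 <= v_kf <= h - 1:
                inside += 1
    return inside / valid if valid else 0.0


def is_new_keyframe(rate: float, threshold: float = 0.8) -> bool:
    """Hypothetical rule: start a new keyframe once overlap falls below the threshold."""
    return rate < threshold


I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
depth = [[2.0] * 8 for _ in range(6)]  # 8x6 frame, every pixel 2 m away
origin = (I3, [0.0, 0.0, 0.0])
shifted = (I3, [1.0, 0.0, 0.0])  # keyframe camera 1 m to the right
print(overlap_rate(depth, origin, origin, 4.0, 4.0, 3.5, 2.5))  # 1.0
rate = overlap_rate(depth, origin, shifted, 4.0, 4.0, 3.5, 2.5)
print(rate, is_new_keyframe(rate))  # 0.75 True
```

Visiting every k-th pixel instead of every pixel would make the check cheaper at the cost of a noisier estimate.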