🤖 AI Summary
This work addresses the challenging problem of reconstructing static scene backgrounds from monocular video under dynamic object interference, without prior knowledge of camera poses. To this end, we propose a motion-mask-guided, dynamics-aware Gaussian optimization framework. Our method integrates a lightweight motion segmentation network with dynamics-aware Gaussian splatting, employing a motion-mask-weighted optimization strategy to explicitly decouple foreground motion from static background geometry, eliminating reliance on camera pose estimates, SLAM point clouds, or geometric priors. Evaluated on the DAVIS and Sintel benchmarks, our approach achieves over 2 dB PSNR improvement over recent distractor-free state-of-the-art methods, significantly enhancing reconstruction robustness and accuracy in highly dynamic scenes. To the best of our knowledge, this is the first method to achieve high-fidelity Gaussian reconstruction of static scenes without any 3D or pose supervision.
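The core of the motion-mask-weighted optimization can be illustrated with a minimal sketch: per-pixel photometric residuals are down-weighted by the predicted motion probability, so the static Gaussian background model is only supervised on pixels the segmentation network deems static. This is a simplified, hypothetical NumPy illustration (the function name, L1 choice, and normalization are assumptions, not the paper's exact implementation):

```python
import numpy as np

def mask_weighted_loss(rendered, target, motion_mask):
    """Photometric L1 loss down-weighted on moving pixels.

    motion_mask holds per-pixel motion probabilities in [0, 1]
    (1 = dynamic). Static pixels receive full weight, so dynamic
    foreground regions do not corrupt the background optimization.
    This is an illustrative sketch, not the authors' exact loss.
    """
    weight = 1.0 - motion_mask                 # suppress dynamic regions
    per_pixel = np.abs(rendered - target)      # L1 photometric residual
    return (weight * per_pixel).sum() / (weight.sum() + 1e-8)

# Toy example: 2x2 image where the bottom-left pixel is fully dynamic.
rendered = np.array([[0.5, 0.0], [1.0, 0.2]])
target   = np.array([[0.4, 0.0], [0.0, 0.2]])
mask     = np.array([[0.0, 0.0], [1.0, 0.0]])
loss = mask_weighted_loss(rendered, target, mask)
```

Because the dynamic pixel's large residual (1.0) is multiplied by zero weight, only the small static-region residual contributes to the loss.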
📝 Abstract
We propose a novel framework for scene decomposition and static background reconstruction from everyday videos. By integrating motion masks from a trained segmentation network and modeling the static scene as Gaussian splats with dynamics-aware optimization, our method achieves more accurate background reconstruction than previous works. We term our method DAS3R, for Dynamics-Aware Gaussian Splatting for Static Scene Reconstruction. Compared to existing methods, DAS3R is more robust in complex motion scenarios, can handle videos where dynamic objects occupy a significant portion of the scene, and requires neither camera pose inputs nor point cloud data from SLAM-based methods. We compared DAS3R against recent distractor-free approaches on the DAVIS and Sintel datasets; DAS3R demonstrates enhanced performance and robustness with a margin of more than 2 dB in PSNR. The project webpage is available at https://kai422.github.io/DAS3R/.