SplatFlow: Self-Supervised Dynamic Gaussian Splatting in Neural Motion Flow Field for Autonomous Driving

πŸ“… 2024-11-23
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Existing Gaussian splatting methods for dynamic urban scenes suffer from reliance on costly manual 3D bounding box annotations and poor scalability. To address this, we propose the first self-supervised dynamic Gaussian splatting framework that enables 4D spatiotemporal scene reconstruction and novel-view synthesis of RGB, depth, and optical flowβ€”without object tracking or manual 3D bounding box supervision. Our key contributions are: (1) a Neural Motion Flow Field (NMFF) that implicitly models temporal correspondences among Gaussians, explicitly decoupling static and dynamic components; (2) the first distillation of 2D foundation model features into 4D Gaussian representations to enhance dynamic object recognition; and (3) joint LiDAR-visual representation learning with self-supervised spatiotemporal consistency constraints. Evaluated on Waymo and KITTI, our method achieves state-of-the-art performance in novel-view synthesis and significantly improves dynamic modeling accuracy and cross-view consistency.

πŸ“ Abstract
Most existing Dynamic Gaussian Splatting methods for complex dynamic urban scenarios rely on accurate object-level supervision from expensive manual labeling, limiting their scalability in real-world applications. In this paper, we introduce SplatFlow, a Self-Supervised Dynamic Gaussian Splatting within Neural Motion Flow Fields (NMFF) to learn 4D space-time representations without requiring tracked 3D bounding boxes, enabling accurate dynamic scene reconstruction and novel view RGB/depth/flow synthesis. SplatFlow designs a unified framework to seamlessly integrate time-dependent 4D Gaussian representation within NMFF, where NMFF is a set of implicit functions to model temporal motions of both LiDAR points and Gaussians as continuous motion flow fields. Leveraging NMFF, SplatFlow effectively decomposes static background and dynamic objects, representing them with 3D and 4D Gaussian primitives, respectively. NMFF also models the correspondences of each 4D Gaussian across time, which aggregates temporal features to enhance cross-view consistency of dynamic components. SplatFlow further improves dynamic object identification by distilling features from 2D foundation models into 4D space-time representation. Comprehensive evaluations conducted on the Waymo and KITTI Datasets validate SplatFlow's state-of-the-art (SOTA) performance for both image reconstruction and novel view synthesis in dynamic urban scenarios.
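The abstract describes NMFF as a set of implicit functions that model the temporal motion of LiDAR points and Gaussians as continuous flow fields. A minimal sketch of that idea, with an illustrative toy architecture (the layer sizes, weights, and the `nmff` function are assumptions, not the paper's implementation): an implicit function maps a 3D position and a timestamp to a motion offset, which can advect dynamic Gaussian centers across time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy implicit motion field: a tiny 2-layer MLP f(x, y, z, t) -> 3D offset.
# Weights are random here; in the paper they would be optimized with
# self-supervised spatiotemporal consistency losses (sketch only).
W1 = rng.normal(scale=0.1, size=(4, 32))   # input: (x, y, z, t)
b1 = np.zeros(32)
W2 = rng.normal(scale=0.1, size=(32, 3))   # output: motion offset
b2 = np.zeros(3)

def nmff(xyz: np.ndarray, t: float) -> np.ndarray:
    """Predict per-point motion offsets at time t (hypothetical NMFF)."""
    inp = np.concatenate([xyz, np.full((len(xyz), 1), t)], axis=1)
    h = np.tanh(inp @ W1 + b1)
    return h @ W2 + b2

# Advect dynamic Gaussian centers from one timestamp to another;
# static Gaussians would simply bypass the flow field.
centers_t0 = rng.normal(size=(5, 3))       # 5 dynamic Gaussian centers
centers_t1 = centers_t0 + nmff(centers_t0, t=0.5)
print(centers_t1.shape)  # (5, 3)
```

Because the field is continuous in t, the same function also yields the cross-time correspondences the paper uses to aggregate temporal features for dynamic Gaussians.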
Problem

Research questions and friction points this paper is trying to address.

Self-supervised 4D scene representation without manual labels
Dynamic object separation via neural motion flow fields
Enhancing cross-view consistency in dynamic urban scenes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised Dynamic Gaussian Splatting in NMFF
Unified 4D Gaussian representation in motion fields
Feature distillation from 2D models to 4D space
πŸ”Ž Similar Papers
No similar papers found.