AI Summary
In dynamic driving scenes, self-supervised 3D reconstruction and novel-view synthesis suffer from inaccurate motion estimation, temporal inconsistency, and incomplete reconstruction of dynamic objects. To address these challenges, this paper proposes the first 4D radar-enhanced self-supervised framework. Our key contributions are: (1) a Gaussian point cloud initialization method leveraging 4D radar's velocity and spatial measurements to enable precise dynamic object segmentation and metric depth recovery; and (2) a velocity-guided fine-grained trajectory tracking model that jointly optimizes differentiable rendering and point cloud motion under scene flow supervision, ensuring strong temporal consistency. Evaluated on the OmniHD-Scenes dataset, our method significantly improves dynamic object completeness and inter-frame consistency, achieving state-of-the-art performance for dynamic driving scene reconstruction.
Abstract
3D reconstruction and novel view synthesis are critical for validating autonomous driving systems and training advanced perception models. Recent self-supervised methods have gained significant attention due to their cost-effectiveness and enhanced generalization in scenarios where annotated bounding boxes are unavailable. However, existing approaches, which often rely on frequency-domain decoupling or optical flow, struggle to accurately reconstruct dynamic objects due to imprecise motion estimation and weak temporal consistency, resulting in incomplete or distorted representations of dynamic scene elements. To address these challenges, we propose 4DRadar-GS, a 4D Radar-augmented self-supervised 3D reconstruction framework tailored for dynamic driving scenes. Specifically, we first present a 4D Radar-assisted Gaussian initialization scheme that leverages 4D Radar's velocity and spatial information to segment dynamic objects and recover monocular depth scale, generating accurate Gaussian point representations. In addition, we propose a Velocity-guided PointTrack (VGPT) model, which is jointly trained with the reconstruction pipeline under scene flow supervision, to track fine-grained dynamic trajectories and construct temporally consistent representations. Evaluated on the OmniHD-Scenes dataset, 4DRadar-GS achieves state-of-the-art performance in dynamic driving scene 3D reconstruction.
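The abstract does not spell out how radar measurements are used to recover depth scale or to separate dynamic objects, so the following is only a minimal sketch of two generic techniques that fit the description, not the authors' implementation. It assumes a median-ratio alignment between sparse radar depths and a relative monocular depth map, and a threshold on the Doppler residual after ego-motion compensation. The function names, the 0.5 m/s threshold, and the sign convention (positive Doppler means moving away from the sensor) are all hypothetical.

```python
import numpy as np


def align_depth_scale(mono_depth, radar_uv, radar_depth):
    """Estimate one scale factor mapping relative monocular depth to metres.

    mono_depth : (H, W) relative depth map (assumed to come from a monocular network).
    radar_uv   : (N, 2) integer pixel coordinates (u = column, v = row) of projected radar points.
    radar_depth: (N,) metric depth of each radar point in the camera frame.
    """
    u, v = radar_uv[:, 0], radar_uv[:, 1]
    pred = mono_depth[v, u]  # predicted relative depth at each radar pixel
    valid = (pred > 1e-6) & (radar_depth > 0)
    if not np.any(valid):
        raise ValueError("no valid radar/depth correspondences")
    # The median of per-point ratios tolerates sparse radar outliers.
    return float(np.median(radar_depth[valid] / pred[valid]))


def flag_dynamic_points(points, doppler, ego_velocity, threshold=0.5):
    """Flag radar points whose Doppler velocity is not explained by ego motion.

    points      : (N, 3) point positions in the sensor frame (must not be at the origin).
    doppler     : (N,) measured radial velocity in m/s, positive = moving away (assumed convention).
    ego_velocity: (3,) sensor velocity expressed in the sensor frame.
    threshold   : hypothetical residual threshold in m/s.
    """
    directions = points / np.linalg.norm(points, axis=1, keepdims=True)
    # A static point appears to move at -ego_velocity relative to the sensor,
    # so its expected radial velocity is the projection of that onto the line of sight.
    expected_static = -(directions @ ego_velocity)
    residual = doppler - expected_static
    return np.abs(residual) > threshold
```

In a pipeline of this kind, the estimated scale would multiply the monocular depth map before back-projecting pixels into 3D to seed Gaussian positions, and the dynamic mask would decide which points receive a motion model. Both steps here are deliberately simplified: a single global scale ignores per-region depth bias, and a fixed threshold ignores radar noise characteristics.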