🤖 AI Summary
To address the limited robustness of scene flow estimation under adverse weather, this paper proposes the first framework to jointly model 4D millimeter-wave radar and LiDAR point clouds, tackling key challenges including high radar noise, low resolution, extreme sparsity, and the absence of a dedicated scene flow dataset. Methodologically, the authors design a Dynamic-aware Bidirectional Cross-modal Fusion (DBCF) module built on local cross-attention, and complement it with a radar denoising preprocessing step and a multi-objective loss that enforces consistency constraints over dynamic regions. The contributions are: (1) the first real-world radar–LiDAR scene flow dataset, accompanied by a reliable radar flow annotation strategy; and (2) RaLiFlow, the first end-to-end joint radar–LiDAR scene flow learning framework. Experiments demonstrate that the method significantly outperforms unimodal baselines on the proposed dataset, particularly in flow accuracy and instance-level consistency in dynamic foreground regions.
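The paper's implementation is not shown here, so the following is a minimal, hypothetical PyTorch sketch of how a dynamic-aware local cross-attention step could inject a radar dynamic cue (e.g., Doppler velocity magnitude) into LiDAR feature updates. The class name, tensor shapes, and the additive velocity bias are illustrative assumptions, not the authors' DBCF implementation.

```python
# Hypothetical sketch of a dynamic-aware local cross-attention step,
# loosely following the DBCF description (all names/shapes assumed).
import torch
import torch.nn as nn

class DynamicAwareLocalCrossAttention(nn.Module):
    """Fuse each LiDAR point's feature with features of its k nearest
    radar points, biasing attention toward dynamic radar points."""

    def __init__(self, dim: int, k: int = 8):
        super().__init__()
        self.k = k
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        # Maps radar Doppler magnitude to an additive attention bias.
        self.dyn_bias = nn.Sequential(
            nn.Linear(1, dim), nn.ReLU(), nn.Linear(dim, 1)
        )

    def forward(self, lidar_xyz, lidar_feat, radar_xyz, radar_feat, radar_vel):
        # lidar_xyz: (B, N, 3), lidar_feat: (B, N, C)
        # radar_xyz: (B, M, 3), radar_feat: (B, M, C), radar_vel: (B, M, 1)
        dist = torch.cdist(lidar_xyz, radar_xyz)                    # (B, N, M)
        knn_idx = dist.topk(self.k, dim=-1, largest=False).indices  # (B, N, k)

        B = knn_idx.shape[0]
        batch = torch.arange(B, device=knn_idx.device).view(B, 1, 1)
        nbr_feat = radar_feat[batch, knn_idx]                       # (B, N, k, C)
        nbr_vel = radar_vel[batch, knn_idx]                         # (B, N, k, 1)

        q = self.q_proj(lidar_feat).unsqueeze(2)                    # (B, N, 1, C)
        kf = self.k_proj(nbr_feat)                                  # (B, N, k, C)
        v = self.v_proj(nbr_feat)                                   # (B, N, k, C)

        attn = (q * kf).sum(-1) / kf.shape[-1] ** 0.5               # (B, N, k)
        attn = attn + self.dyn_bias(nbr_vel.abs()).squeeze(-1)      # dynamic cue
        attn = attn.softmax(dim=-1)
        fused = (attn.unsqueeze(-1) * v).sum(dim=2)                 # (B, N, C)
        return lidar_feat + fused                                   # residual update
```

As the "Bidirectional" in DBCF suggests, a full module would presumably apply a symmetric block with the roles of the two modalities swapped, so that radar features are also refined by LiDAR context.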
📝 Abstract
Recent multimodal fusion methods that integrate images with LiDAR point clouds have shown promise in scene flow estimation. However, the fusion of 4D millimeter-wave radar and LiDAR remains unexplored. Unlike LiDAR, radar is cheaper, more robust across weather conditions, and can measure point-wise velocity, making it a valuable complement to LiDAR; at the same time, radar inputs pose challenges due to noise, low resolution, and sparsity. Moreover, no existing dataset combines LiDAR and radar data specifically for scene flow estimation. To address this gap, we construct a Radar-LiDAR scene flow dataset based on a public real-world automotive dataset. We propose an effective preprocessing strategy for radar denoising and scene flow label generation, deriving more reliable flow ground truth for radar points that fall outside object boundaries. Additionally, we introduce RaLiFlow, the first joint scene flow learning framework for 4D radar and LiDAR, which achieves effective radar-LiDAR fusion through a novel Dynamic-aware Bidirectional Cross-modal Fusion (DBCF) module and a carefully designed set of loss functions. The DBCF module injects dynamic cues from radar into a local cross-attention mechanism, enabling the propagation of contextual information across modalities. Meanwhile, the proposed loss functions mitigate the adverse effects of unreliable radar data during training and enhance the instance-level consistency of scene flow predictions from both modalities, particularly in dynamic foreground areas. Extensive experiments on the repurposed scene flow dataset demonstrate that our method outperforms existing LiDAR-based and radar-based single-modal methods by a significant margin.
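The abstract does not spell out the loss terms. As a rough illustration only, the sketch below combines a reliability-masked endpoint error (downweighting radar points flagged as unreliable) with an instance-level consistency term over dynamic foreground points. All function names, shapes, masks, and weights are assumptions, not the paper's actual formulation.

```python
# Hypothetical sketch of a reliability-masked endpoint loss plus an
# instance-level consistency term, assumed from the abstract's description.
import torch

def masked_epe_loss(pred_flow, gt_flow, reliable_mask):
    """Endpoint error, ignoring radar points flagged as unreliable.

    pred_flow, gt_flow: (N, 3); reliable_mask: (N,) bool.
    """
    err = (pred_flow - gt_flow).norm(dim=-1)
    # Fall back to a zero loss (with a valid graph) if nothing is reliable.
    return err[reliable_mask].mean() if reliable_mask.any() else err.sum() * 0.0

def instance_consistency_loss(pred_flow, instance_ids, dynamic_mask):
    """Encourage points on the same dynamic instance to share one flow.

    pred_flow: (N, 3); instance_ids: (N,) long; dynamic_mask: (N,) bool.
    """
    loss = pred_flow.new_zeros(())
    count = 0
    for inst in instance_ids[dynamic_mask].unique():
        sel = (instance_ids == inst) & dynamic_mask
        if sel.sum() < 2:
            continue
        flows = pred_flow[sel]
        # Penalize per-point deviation from the instance-mean flow.
        loss = loss + (flows - flows.mean(dim=0, keepdim=True)).norm(dim=-1).mean()
        count += 1
    return loss / max(count, 1)
```

A multi-objective total could then weight the terms, e.g. `loss = masked_epe_loss(pred, gt, mask) + 0.1 * instance_consistency_loss(pred, ids, dyn)`, with the weight (here an arbitrary 0.1) tuned on validation data and the terms applied to predictions from both modalities.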