SF3D-RGB: Scene Flow Estimation from Monocular Camera and Sparse LiDAR

📅 2026-02-25

📈 Citations: 0

✨ Influential: 0

career value

190K/year

🤖 AI Summary

This work proposes a lightweight, end-to-end multimodal deep learning architecture to address the limited robustness of monomodal approaches in scene flow estimation. For the first time, it effectively integrates monocular RGB images with sparse LiDAR point clouds through a unified framework. The method employs multimodal feature encoding and fusion, followed by graph matching to generate an initial scene flow estimate, and further refines accuracy via a residual optimization module. Extensive experiments on real-world datasets demonstrate that the proposed approach significantly outperforms existing monomodal methods despite using fewer parameters, achieving state-of-the-art performance in both accuracy and robustness.

Technology Category

Application Category

📝 Abstract

Scene flow estimation is an extremely important task in computer vision to support the perception of dynamic changes in the scene. For robust scene flow, learning-based approaches have recently achieved impressive results using either image-based or LiDAR-based modalities. However, these methods have tended to focus on the use of a single modality. To tackle these problems, we present a deep learning architecture, SF3D-RGB, that enables sparse scene flow estimation using 2D monocular images and 3D point clouds (e.g., acquired by LiDAR) as inputs. Our architecture is an end-to-end model that first encodes information from each modality into features and fuses them together. Then, the fused features enhance a graph matching module for better and more robust mapping matrix computation to generate an initial scene flow. Finally, a residual scene flow module further refines the initial scene flow. Our model is designed to strike a balance between accuracy and efficiency. Furthermore, experiments show that our proposed method outperforms single-modality methods and achieves better scene flow accuracy on real-world datasets while using fewer parameters compared to other state-of-the-art methods with fusion.

Problem

Research questions and friction points this paper is trying to address.

scene flow estimation

monocular camera

sparse LiDAR

multi-modal fusion

dynamic scene perception

Innovation

Methods, ideas, or system contributions that make the work stand out.

scene flow estimation

multimodal fusion

monocular RGB