MS-RAFT-3D: A Multi-Scale Architecture for Recurrent Image-Based Scene Flow

📅 2025-06-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of multi-scale modeling paradigms in image-based scene flow estimation. It is the first to adapt the multi-scale recurrent architectures that have proven successful in optical flow to scene flow estimation. The end-to-end coarse-to-fine hierarchical framework comprises: (1) a multi-scale feature and context encoder tailored to 3D motion modeling; (2) an optical-flow-guided iterative refinement mechanism operating across hierarchy levels; and (3) a hierarchical loss function jointly enforcing geometric and photometric consistency. Built upon the RAFT architecture, the method integrates a multi-scale feature pyramid with cross-scale feature interaction. On the KITTI and Spring benchmarks, it achieves new state-of-the-art performance, reducing the error of the previous best method by 8.7% and 65.8%, respectively, which indicates gains in both accuracy and generalization. The source code is publicly available.
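The summary mentions a hierarchical loss over the predictions of all levels. The sketch below shows one common way such a loss can be built. It assumes a supervised, RAFT-style exponentially weighted L1 loss applied to every recurrent iteration of every scale. The function name, the weighting factor gamma = 0.8, and the nearest-neighbour upsampling are hypothetical choices, and the geometric and photometric consistency terms named above are not modelled. It is not the paper's actual loss.

```python
import numpy as np


def multiscale_sequence_loss(predictions_per_scale, gt, gamma=0.8):
    """Hypothetical multi-scale sequence loss (not the paper's exact loss).

    predictions_per_scale: list ordered coarse to fine; each entry is a list
        of (H_s, W_s, 3) scene-flow estimates, one per recurrent iteration.
    gt: (H, W, 3) ground-truth scene flow at full resolution, with H and W
        integer multiples of every H_s and W_s.
    """
    # Concatenate the iterations of all scales in processing order.
    all_preds = [p for preds in predictions_per_scale for p in preds]
    n = len(all_preds)
    total = 0.0
    for i, pred in enumerate(all_preds):
        # Nearest-neighbour upsampling to full resolution (a simplification).
        fy, fx = gt.shape[0] // pred.shape[0], gt.shape[1] // pred.shape[1]
        up = pred.repeat(fy, axis=0).repeat(fx, axis=1)
        # RAFT-style exponential weighting: the last estimate gets weight 1.
        weight = gamma ** (n - 1 - i)
        total += weight * np.abs(up - gt).mean()
    return total


# Two scales with two iterations each, every estimate off by exactly 1.
gt = np.zeros((8, 8, 3))
preds = [[np.ones((4, 4, 3))] * 2, [np.ones((8, 8, 3))] * 2]
print(multiscale_sequence_loss(preds, gt))  # approximately 2.952
```

Weighting later iterations more strongly lets early, coarse estimates contribute to training without dominating the final, fine-scale result.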

📝 Abstract
Although multi-scale concepts have recently proven useful for recurrent network architectures in the fields of optical flow and stereo, they have not been considered for image-based scene flow so far. Hence, based on a single-scale recurrent scene flow backbone, we develop a multi-scale approach that generalizes successful hierarchical ideas from optical flow to image-based scene flow. By considering suitable concepts for the feature and the context encoder, the overall coarse-to-fine framework, and the training loss, we succeed in designing a scene flow approach that outperforms the current state of the art on KITTI and Spring by 8.7% (3.89 vs. 4.26) and 65.8% (9.13 vs. 26.71), respectively. Our code is available at https://github.com/cv-stuttgart/MS-RAFT-3D.
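The quoted percentages follow from the paired numbers in the abstract if they are read as error measures (lower is better) and the improvement is taken relative to the previous best. The abstract does not name the metrics, so that reading is an assumption; the helper name below is hypothetical.

```python
def relative_improvement(ours, previous_best):
    """Relative error reduction in percent, assuming lower error is better."""
    return 100.0 * (previous_best - ours) / previous_best


print(f"KITTI:  {relative_improvement(3.89, 4.26):.1f}%")   # KITTI:  8.7%
print(f"Spring: {relative_improvement(9.13, 26.71):.1f}%")  # Spring: 65.8%
```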
Problem

Research questions and friction points this paper is trying to address.

Develops a multi-scale architecture for image-based scene flow
Generalizes hierarchical ideas from optical flow to scene flow
Outperforms the state of the art on the KITTI and Spring benchmarks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-scale recurrent architecture for scene flow
Hierarchical coarse-to-fine framework design
Improved feature and context encoder concepts
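The coarse-to-fine control flow of a multi-scale recurrent framework can be sketched as follows. This is a toy under stated assumptions, not the authors' implementation: the learned recurrent update operator is replaced by the hypothetical `toy_update`, which simply moves the estimate toward a known target. Average-pooled and nearest-neighbour resampling, the three levels, and the iteration counts are all illustrative choices.

```python
import numpy as np


def downsample2x(field):
    """2x average pooling of an (H, W, C) field; H and W must be even."""
    h, w, c = field.shape
    return field.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))


def upsample2x(field):
    """Nearest-neighbour 2x upsampling of an (H, W, C) field."""
    return field.repeat(2, axis=0).repeat(2, axis=1)


def toy_update(estimate, target, step=0.5):
    """Hypothetical stand-in for the learned recurrent update operator:
    moves the estimate a fixed fraction of the way toward the target."""
    return estimate + step * (target - estimate)


def coarse_to_fine(target, iters_per_scale=(4, 3, 2)):
    """Refine a dense 3D motion field coarse-to-fine.

    `target` stands in for what the network would infer from correlation
    features; in this toy it is simply the ground truth.
    """
    num_scales = len(iters_per_scale)
    pyramid = [target]
    for _ in range(num_scales - 1):
        pyramid.append(downsample2x(pyramid[-1]))
    pyramid.reverse()  # coarsest level first

    estimate = np.zeros_like(pyramid[0])  # zero initialisation at the coarsest level
    predictions = []  # every intermediate estimate, e.g. for a sequence loss
    for level, (level_target, iters) in enumerate(zip(pyramid, iters_per_scale)):
        if level > 0:
            # Pass the coarser estimate on as initialisation. Values are not
            # rescaled because we assume metric 3D motion; pixel-unit optical
            # flow would additionally be multiplied by 2 here.
            estimate = upsample2x(estimate)
        for _ in range(iters):
            estimate = toy_update(estimate, level_target)
            predictions.append(estimate)
    return estimate, predictions


rng = np.random.default_rng(0)
gt = rng.normal(size=(16, 16, 3))
final, preds = coarse_to_fine(gt)
print(final.shape, len(preds))  # (16, 16, 3) 9
```

The point of the hierarchy is that each finer level starts from the upsampled coarse result instead of from zero, so fewer refinement iterations are needed at full resolution.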
Jakob Schmid
Computer Vision Group, Institute for Visualization and Interactive Systems, University of Stuttgart
Azin Jahedi
PhD Student, University of Stuttgart
Computer Vision, Machine Learning
Noah Berenguel Senn
PhD Student, University of Stuttgart
Computer Vision, Optical Flow, Machine Learning
Andrés Bruhn
Computer Vision Group, Institute for Visualization and Interactive Systems, University of Stuttgart