MS-RAFT-3D: A Multi-Scale Architecture for Recurrent Image-Based Scene Flow

📅 2025-06-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the absence of multi-scale modeling in image-based scene flow estimation. Starting from a single-scale recurrent scene flow backbone, it generalizes hierarchical ideas that have proven successful for recurrent optical flow and stereo networks to scene flow. As the name suggests, the method builds on a RAFT-style recurrent architecture. The resulting coarse-to-fine framework revisits three components: (1) the feature encoder, (2) the context encoder, and (3) the training loss. On the KITTI and Spring benchmarks, the method sets a new state of the art, improving on the previous best results by 8.7% (3.89 vs. 4.26) and 65.8% (9.13 vs. 26.71), respectively. The source code is publicly available.

📝 Abstract

Although multi-scale concepts have recently proven useful for recurrent network architectures in the field of optical flow and stereo, they have not been considered for image-based scene flow so far. Hence, based on a single-scale recurrent scene flow backbone, we develop a multi-scale approach that generalizes successful hierarchical ideas from optical flow to image-based scene flow. By considering suitable concepts for the feature and the context encoder, the overall coarse-to-fine framework, and the training loss, we succeed in designing a scene flow approach that outperforms the current state of the art on KITTI and Spring by 8.7% (3.89 vs. 4.26) and 65.8% (9.13 vs. 26.71), respectively. Our code is available at https://github.com/cv-stuttgart/MS-RAFT-3D.
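The percentages quoted in the abstract can be reproduced from the paired scores. The sketch below assumes the scores are error measures where lower is better, which the direction of the comparison suggests but the abstract does not state explicitly; the function name `relative_error_reduction` is a hypothetical helper introduced only for illustration.

```python
def relative_error_reduction(ours: float, previous: float) -> float:
    """Percentage by which `ours` lowers the error relative to `previous`.

    Assumes both values are errors (lower is better).
    """
    return 100.0 * (previous - ours) / previous


# Score pairs as quoted in the abstract: (this method, previous best).
kitti = relative_error_reduction(3.89, 4.26)
spring = relative_error_reduction(9.13, 26.71)

print(f"KITTI:  {kitti:.1f}%")   # rounds to 8.7
print(f"Spring: {spring:.1f}%")  # rounds to 65.8
```

Both figures are relative reductions with respect to the previous best score, not absolute differences in the underlying metric.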

Problem

Research questions and friction points this paper is trying to address.

Develops multi-scale architecture for image-based scene flow

Generalizes hierarchical ideas from optical flow to scene flow

Outperforms state-of-the-art on KITTI and Spring benchmarks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-scale recurrent architecture for scene flow

Hierarchical coarse-to-fine framework design

Improved feature and context encoder concepts
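To make the coarse-to-fine idea concrete, here is a minimal toy sketch of a generic multi-scale recurrent refinement loop. It is not the authors' implementation: it uses a plain 2D displacement field instead of a scene flow representation, and `toy_update` is a hypothetical stand-in for a learned recurrent update operator that simply moves the estimate toward a known target. All function names and parameters are assumptions made for illustration.

```python
import numpy as np


def upsample_flow(flow: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling of an (H, W, 2) displacement field.

    Displacements are measured in pixels, so they double with resolution.
    """
    return 2.0 * flow.repeat(2, axis=0).repeat(2, axis=1)


def downsample(field: np.ndarray, factor: int) -> np.ndarray:
    """Subsample a displacement field and rescale its values accordingly."""
    return field[::factor, ::factor] / factor


def toy_update(flow: np.ndarray, target: np.ndarray, step: float = 0.5) -> np.ndarray:
    """Hypothetical stand-in for a learned update: step toward the target."""
    return flow + step * (target - flow)


def coarse_to_fine(target_full: np.ndarray, num_scales: int = 3,
                   iters_per_scale: int = 4) -> np.ndarray:
    """Refine an estimate from the coarsest scale to the finest one."""
    flow = None
    for level in reversed(range(num_scales)):  # coarsest level first
        target = downsample(target_full, 2 ** level)
        # Start from zero at the coarsest scale; otherwise reuse the
        # upsampled estimate from the previous (coarser) scale.
        flow = np.zeros_like(target) if flow is None else upsample_flow(flow)
        for _ in range(iters_per_scale):
            flow = toy_update(flow, target)
    return flow


# Constant synthetic motion: 4 px to the right, 2 px up.
target_full = np.empty((16, 16, 2))
target_full[..., 0] = 4.0
target_full[..., 1] = -2.0

result = coarse_to_fine(target_full)
print(result.shape, float(np.abs(result - target_full).max()))
```

The point of the sketch is the control flow: each finer scale starts from the upsampled estimate of the coarser one rather than from scratch, so the recurrent iterations at fine resolution only need to correct a small residual.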
