Channel-wise Motion Features for Efficient Motion Segmentation

📅 2025-07-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Real-time motion segmentation for safety-critical applications such as autonomous driving remains challenging. Most existing approaches rely on multi-branch architectures that jointly estimate depth, pose, optical flow, and scene flow, which raises computational cost and hinders real-time performance. Method: This paper proposes a lightweight, class-agnostic motion segmentation method that drops the depth, optical flow, and scene flow subnetworks and keeps only a Pose Network. From it, the method builds Channel-wise Motion Features, a cost-volume-based representation that extracts depth features for each instance in the feature map and captures the scene's 3D motion information, within an end-to-end motion segmentation architecture. Contribution/Results: On the KITTI Dataset and the Cityscapes portion of the VCAS-Motion Dataset, the method runs at about 4× the FPS of state-of-the-art models and reduces the parameter count to about 25% (a 75% reduction) while keeping equivalent accuracy.

📝 Abstract

For safety-critical robotics applications such as autonomous driving, it is important to detect all required objects accurately in real time. Motion segmentation offers a solution by identifying dynamic objects in the scene in a class-agnostic manner. Recently, various motion segmentation models have been proposed, most of which jointly use subnetworks to estimate Depth, Pose, Optical Flow, and Scene Flow. As a result, the overall computational cost of the model increases, hindering real-time performance. In this paper, we propose a novel cost-volume-based motion feature representation, Channel-wise Motion Features. It improves efficiency by extracting depth features of each instance in the feature map and capturing the scene's 3D motion information. The only subnetwork used to build Channel-wise Motion Features is the Pose Network, and no others are required. Our method not only achieves about 4 times the FPS of state-of-the-art models on the KITTI Dataset and Cityscapes of the VCAS-Motion Dataset, but also demonstrates equivalent accuracy while reducing the parameters to about 25%.
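The abstract does not spell out how the cost volume is constructed, so the following is only a generic illustration of the cost-volume idea, not the paper's Channel-wise Motion Features. It is a minimal pure-Python sketch that assumes a single 1D scanline of feature vectors and a set of integer shift hypotheses. The function name `cost_volume` and the toy features are hypothetical. For each hypothesis, it correlates the reference features with the shifted source features, so that each hypothesis yields one output channel.

```python
# Toy cost volume on a 1D scanline (hypothetical illustration,
# NOT the paper's implementation).

def cost_volume(feat_ref, feat_src, max_shift):
    """Correlate reference features with shifted source features.

    feat_ref, feat_src: lists of equal length; each entry is a feature
    vector (list of floats) for one position on the scanline.
    Returns volume[s][x] = dot(feat_ref[x], feat_src[x - s]) for
    s in 0..max_shift, with 0.0 where x - s falls outside the scanline.
    """
    width = len(feat_ref)
    volume = []
    for shift in range(max_shift + 1):  # one channel per hypothesis
        channel = []
        for x in range(width):
            src_x = x - shift
            if 0 <= src_x < width:
                score = sum(a * b for a, b in zip(feat_ref[x], feat_src[src_x]))
                channel.append(float(score))
            else:
                channel.append(0.0)  # hypothesis points outside the scanline
        volume.append(channel)
    return volume


# Source features, and a reference equal to the source moved right by one.
src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
ref = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

for shift, channel in enumerate(cost_volume(ref, src, max_shift=2)):
    print(shift, channel)
```

In this toy case the channel for shift 1 carries the strongest responses, matching the one-position displacement between `src` and `ref`. A downstream network can read such hypothesis channels as motion cues without running a separate flow estimator.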

Problem

Research questions and friction points this paper is trying to address.

Detect dynamic objects efficiently for autonomous driving

Reduce computational cost in motion segmentation models

Achieve real-time performance without sacrificing accuracy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Channel-wise Motion Features for efficiency

Uses the Pose Network as its only subnetwork

Reduces parameters to about 25% while maintaining accuracy
