MoCrop: Training-Free Motion-Guided Cropping for Efficient Video Action Recognition

📅 2025-09-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high computational cost of exploiting motion information in video action recognition, this paper proposes MoCrop, a training-free, parameter-free, motion-aware adaptive cropping method. MoCrop directly leverages motion vectors from the H.264 compressed domain and produces a single clip-level spatial crop for I-frames through motion-density modeling, denoising & merge (DM), Monte Carlo sampling (MCS), and adaptive cropping (AC). It is compatible with mainstream backbone networks and introduces no additional training or parameters. On UCF101, ResNet-50 augmented with MoCrop achieves a 3.5% Top-1 accuracy gain at comparable computational cost, or a 2.4% gain with 26.5% fewer FLOPs. CoViAR with MoCrop maintains 88.5% accuracy while reducing computation by 26.7%, markedly improving the accuracy–efficiency trade-off.

📝 Abstract
We introduce MoCrop, a motion-aware adaptive cropping module for efficient video action recognition in the compressed domain. MoCrop uses motion vectors that are available in H.264 video to locate motion-dense regions and produces a single clip-level crop that is applied to all I-frames at inference. The module is training free, adds no parameters, and can be plugged into diverse backbones. A lightweight pipeline that includes denoising & merge (DM), Monte Carlo sampling (MCS), and adaptive cropping (AC) via a motion-density submatrix search yields robust crops with negligible overhead. On UCF101, MoCrop improves accuracy or reduces compute. With ResNet-50, it delivers +3.5% Top-1 accuracy at equal FLOPs (attention setting), or +2.4% Top-1 accuracy with 26.5% fewer FLOPs (efficiency setting). Applied to CoViAR, it reaches 89.2% Top-1 accuracy at the original cost and 88.5% Top-1 accuracy while reducing compute from 11.6 to 8.5 GFLOPs. Consistent gains on MobileNet-V3, EfficientNet-B1, and Swin-B indicate strong generality and make MoCrop practical for real-time deployment in the compressed domain. Our code and models are available at https://github.com/microa/MoCrop.
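The adaptive-cropping step described in the abstract can be illustrated with a toy version of the motion-density submatrix search: given a per-block motion-vector magnitude map, find the fixed-size window containing the most motion. The sketch below is a minimal, assumed implementation using a 2D prefix sum (integral image); the function name and array layout are illustrative and are not taken from the MoCrop codebase.

```python
import numpy as np

def motion_density_crop(mv_mag, crop_h, crop_w):
    """Illustrative sketch: find the (crop_h, crop_w) window that maximizes
    total motion-vector magnitude, using a 2D prefix sum (integral image)
    so each window sum costs O(1)."""
    H, W = mv_mag.shape
    # Integral image padded with a zero row/column to simplify window sums.
    ii = np.zeros((H + 1, W + 1), dtype=np.float64)
    ii[1:, 1:] = np.cumsum(np.cumsum(mv_mag, axis=0), axis=1)

    best, best_yx = -1.0, (0, 0)
    for y in range(H - crop_h + 1):
        for x in range(W - crop_w + 1):
            # Sum of mv_mag[y:y+crop_h, x:x+crop_w] via the integral image.
            s = (ii[y + crop_h, x + crop_w] - ii[y, x + crop_w]
                 - ii[y + crop_h, x] + ii[y, x])
            if s > best:
                best, best_yx = s, (y, x)
    return best_yx  # top-left corner of the motion-densest window
```

In the actual pipeline this search would run once per clip on the (denoised, merged) motion-vector field, and the resulting crop would be applied to all I-frames.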
Problem

Research questions and friction points this paper is trying to address.

Developing training-free motion-aware cropping for efficient video recognition
Utilizing H.264 motion vectors to locate motion-dense regions automatically
Reducing computational costs while maintaining or improving action recognition accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Motion vectors locate motion-dense regions for cropping
Training-free module with denoising, sampling, and adaptive cropping
Plug-and-play design reduces computation while maintaining accuracy
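The sampling idea from the pipeline can likewise be sketched: rather than scanning every window position, evaluate a random subset of candidates and keep the densest. This is an illustrative Monte Carlo approximation under assumed inputs, not the paper's exact MCS procedure.

```python
import numpy as np

def mc_crop_search(mv_mag, crop_h, crop_w, n_samples=64, rng=None):
    """Illustrative sketch: approximate the motion-densest crop by sampling
    random window positions instead of an exhaustive scan."""
    if rng is None:
        rng = np.random.default_rng(0)
    H, W = mv_mag.shape
    # Sample candidate top-left corners uniformly over valid positions.
    ys = rng.integers(0, H - crop_h + 1, size=n_samples)
    xs = rng.integers(0, W - crop_w + 1, size=n_samples)
    best, best_yx = -1.0, (0, 0)
    for y, x in zip(ys, xs):
        s = mv_mag[y:y + crop_h, x:x + crop_w].sum()
        if s > best:
            best, best_yx = float(s), (int(y), int(x))
    return best_yx
```

Sampling trades a small chance of missing the optimum for a large constant-factor speedup, which matters when the crop must be chosen with negligible overhead at inference time.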
Binhua Huang
School of Computer Science, University College Dublin
Wendong Yao
School of Computer Science, University College Dublin
Shaowu Chen
College of Electronics and Information Engineering, Shenzhen University
Guoxin Wang
School of Electrical and Electronic Engineering, University College Dublin
Qingyuan Wang
School of Electrical and Electronic Engineering, University College Dublin
Soumyabrata Dev
University College Dublin
environmental informatics · remote sensing · renewable · machine learning