🤖 AI Summary
This work proposes a lightweight compressed-domain tracking model that addresses the high computational cost and limited real-time throughput of conventional object trackers in large-scale video surveillance, which typically require fully decoding the RGB video stream. By operating directly on the motion vectors and transform coefficients already present in the compressed bitstream, the approach propagates object bounding boxes across frames without decoding the underlying pixels. This codec-domain modeling substantially reduces computational overhead while maintaining high accuracy: only a 4% drop in mAP@0.5 relative to an RGB-based baseline on the MOTS15/17/20 benchmarks, with up to a 3.7× speedup in inference throughput.
📝 Abstract
We propose a lightweight compressed-domain tracking model that operates directly on video streams, without requiring full RGB decoding. Using motion vectors and transform coefficients from the compressed data, our deep model propagates object bounding boxes across frames, achieving a speed-up of up to 3.7× with only a 4% drop in mAP@0.5 versus an RGB baseline on the MOTS15/17/20 datasets. These results highlight the efficiency of codec-domain motion modeling for real-time analytics in large-scale monitoring systems.
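To make the core idea concrete, here is a minimal sketch of propagating a bounding box using block-level motion vectors. This is an illustration only: the function name, the `(H_blocks, W_blocks, 2)` motion-vector layout, the 16-pixel block size, and the forward-displacement sign convention are all assumptions, not the paper's actual interface (codec motion vectors typically point from the current block back to the reference frame, so a real implementation must handle the sign and prediction direction per the bitstream).

```python
import numpy as np

def propagate_box(box, motion_vectors, block_size=16):
    """Shift a bounding box into the next frame using the median motion
    vector of the codec blocks it covers (hypothetical helper, not the
    paper's model).

    box: (x1, y1, x2, y2) in pixels.
    motion_vectors: array of shape (H_blocks, W_blocks, 2) holding
        per-block (dx, dy) displacements parsed from the bitstream;
        here assumed to be forward displacements in pixels.
    """
    x1, y1, x2, y2 = box
    # Convert pixel coordinates to block-grid indices, including
    # blocks the box only partially covers.
    bx1, by1 = int(x1 // block_size), int(y1 // block_size)
    bx2 = int(np.ceil(x2 / block_size))
    by2 = int(np.ceil(y2 / block_size))
    mvs = motion_vectors[by1:by2, bx1:bx2].reshape(-1, 2)
    # Median is robust to outlier vectors from background blocks
    # that fall inside the box.
    dx, dy = np.median(mvs, axis=0)
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)
```

A learned model would refine this purely geometric shift (e.g., using transform coefficients to correct drift), but the sketch shows why no pixel decoding is needed: the displacements are read straight from the bitstream.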