🤖 AI Summary
Emerging machine vision applications require efficient transmission of neural network intermediate features—rather than pixel data—necessitating a paradigm shift from human-vision-oriented video coding.
Method: This paper presents a systematic study of VVC-based feature compression for machine perception. A tool-level analysis of how individual VVC coding tools affect compression efficiency and downstream task accuracy yields three lightweight coding profiles: Fast, Faster, and Fastest.
Contribution/Results: Fast reduces encoding time by 21.8% while achieving a 2.96% BD-Rate gain; Faster achieves a 1.85% BD-Rate gain with a 51.5% speedup; Fastest reduces encoding time by 95.6% at the cost of a 1.71% BD-Rate loss. The profiles target feature compression under the MPEG-AI Feature Coding for Machines (FCM) standard.
📝 Abstract
Modern video codecs have been extensively optimized to preserve perceptual quality, leveraging models of the human visual system. However, in split inference systems, where intermediate features from a neural network are transmitted instead of pixel data, these assumptions no longer apply. Intermediate features are abstract, sparse, and task-specific, making perceptual fidelity irrelevant. In this paper, we investigate the use of Versatile Video Coding (VVC) for compressing such features under the MPEG-AI Feature Coding for Machines (FCM) standard. We perform a tool-level analysis to understand the impact of individual coding components on compression efficiency and downstream vision task accuracy. Based on these insights, we propose three lightweight essential VVC profiles: Fast, Faster, and Fastest. The Fast profile provides a 2.96% BD-Rate gain while reducing encoding time by 21.8%. Faster achieves a 1.85% BD-Rate gain with a 51.5% speedup. Fastest reduces encoding time by 95.6% with only a 1.71% loss in BD-Rate.
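To put the reported encoding-time figures on a common scale, the short sketch below converts each percentage reduction into a speedup factor. It is illustrative arithmetic only, not code from the paper. It assumes that all three percentages (including the "51.5% speedup" quoted for Faster) are reductions in encoding time relative to the same baseline encoder. The names `PROFILES` and `speedup_factor` are hypothetical, as is the sign convention in which a negative BD-Rate means a bitrate saving.

```python
# Illustrative arithmetic only; the numbers are taken from the abstract.
# Assumption: every percentage is a reduction in encoding time relative
# to the same baseline encoder configuration.

# profile -> (encoding-time reduction in %, BD-Rate in %)
# Assumed sign convention: negative BD-Rate = gain (bitrate saving),
# positive BD-Rate = loss.
PROFILES = {
    "Fast": (21.8, -2.96),
    "Faster": (51.5, -1.85),
    "Fastest": (95.6, 1.71),
}


def speedup_factor(time_reduction_percent: float) -> float:
    """Convert an encoding-time reduction (%) into a speedup factor.

    If the new encoding time is (1 - r) of the baseline, the encoder
    runs 1 / (1 - r) times as fast.
    """
    if not 0.0 <= time_reduction_percent < 100.0:
        raise ValueError("time reduction must be in [0, 100)")
    remaining_fraction = 1.0 - time_reduction_percent / 100.0
    return 1.0 / remaining_fraction


if __name__ == "__main__":
    for name, (reduction, bd_rate) in PROFILES.items():
        factor = speedup_factor(reduction)
        print(f"{name:8s} time -{reduction:.1f}%  "
              f"speedup {factor:.2f}x  BD-Rate {bd_rate:+.2f}%")
```

Under this assumption, Fast runs about 1.28 times as fast as the baseline, Faster about 2.06 times, and Fastest about 22.7 times, which shows how much larger the last step is than the raw percentages suggest.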