🤖 AI Summary
Problem: The absence of a unified evaluation benchmark for video compression optimized for downstream vision tasks (e.g., detection, recognition) hinders standardization of Feature Coding for Machines (FCM). Method: This paper introduces CompressAI-Vision, an open-source evaluation platform that covers multiple vision tasks, neural network models, and datasets. The platform integrates standard codecs and reports bit-rate against downstream task accuracy, so that compression efficiency and task performance are evaluated together in two inference scenarios, "remote" and "split" inferencing. Contribution/Results: Use cases on several datasets illustrate the compression gain of the evaluated codecs in terms of bit-rate versus task accuracy. The platform has been adopted by MPEG for the development of the FCM standard.
📝 Abstract
With the increasing use of neural network (NN)-based computer vision applications that process image and video data as input, interest has emerged in video compression technology optimized for computer vision tasks. Given the variety of vision tasks and their associated NN models and datasets, a consolidated platform is needed as common ground for implementing and evaluating compression methods optimized for downstream vision tasks. CompressAI-Vision is introduced as a comprehensive evaluation platform where new coding tools compete to efficiently compress the input of a vision network while retaining task accuracy in two inference scenarios: "remote" and "split" inferencing. Our study showcases various use cases of the evaluation platform, integrated with standard codecs (under development), by examining the compression gain on several datasets in terms of bit-rate versus task accuracy. This evaluation platform has been developed as open-source software and has been adopted by the Moving Picture Experts Group (MPEG) for the development of the Feature Coding for Machines (FCM) standard. The software is publicly available at https://github.com/InterDigitalInc/CompressAI-Vision.
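The abstract names the two inference scenarios without defining them. The sketch below illustrates one common reading, stated here as an assumption: in "remote" inference the input itself is compressed and the whole network runs after decoding, while in "split" inference the first part of the network runs before compression and the intermediate features are coded instead. Every name in it (`toy_codec`, `front_end`, `back_end`, `rate_accuracy_curve`) is hypothetical and is not part of the CompressAI-Vision API; the "codec" is plain uniform quantization with a crude bit count, used only to show how bit-rate versus task accuracy points could be collected.

```python
from typing import Callable, List, Sequence, Tuple

Signal = List[float]


def toy_codec(signal: Signal, step: float) -> Tuple[Signal, int]:
    """Hypothetical stand-in for a codec: uniform quantization.

    Returns the reconstruction and a crude bit count
    (magnitude bits plus one sign bit per value, no entropy coding).
    """
    levels = [round(v / step) for v in signal]
    bits = sum(abs(q).bit_length() + 1 for q in levels)
    return [q * step for q in levels], bits


def front_end(x: Signal) -> Signal:
    """Toy first part of a vision network (assumed split point)."""
    return [max(0.0, 2.0 * v - 1.0) for v in x]


def back_end(features: Signal) -> bool:
    """Toy second part of the network: a binary decision."""
    return sum(features) > 1.0


def remote_inference(x: Signal, step: float) -> Tuple[bool, int]:
    # Compress the input, then run the full network on the decoded input.
    decoded, bits = toy_codec(x, step)
    return back_end(front_end(decoded)), bits


def split_inference(x: Signal, step: float) -> Tuple[bool, int]:
    # Run the first part of the network, compress the intermediate
    # features, then run the rest of the network on the decoded features.
    decoded, bits = toy_codec(front_end(x), step)
    return back_end(decoded), bits


def rate_accuracy_curve(
    pipeline: Callable[[Signal, float], Tuple[bool, int]],
    samples: Sequence[Signal],
    labels: Sequence[bool],
    steps: Sequence[float],
) -> List[Tuple[float, float]]:
    """Collect (mean bits per sample, accuracy) for each quantization step."""
    points = []
    for step in steps:
        results = [pipeline(x, step) for x in samples]
        correct = sum(pred == y for (pred, _), y in zip(results, labels))
        mean_bits = sum(bits for _, bits in results) / len(samples)
        points.append((mean_bits, correct / len(samples)))
    return points


if __name__ == "__main__":
    samples = [[0.9, 0.2, 0.8], [0.1, 0.3, 0.2]]
    labels = [back_end(front_end(x)) for x in samples]  # uncompressed predictions
    for name, pipeline in [("remote", remote_inference), ("split", split_inference)]:
        print(name, rate_accuracy_curve(pipeline, samples, labels, [0.01, 0.1, 0.5]))
```

Using the uncompressed predictions as labels makes the accuracy axis measure only the degradation caused by compression. A real evaluation would use dataset ground truth and task metrics instead.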