🤖 AI Summary
To address the misalignment between generic image compression and downstream task performance in machine vision, this paper proposes ROI-Packing: an end-to-end, task-aware, region-of-interest (ROI)-prioritized compression framework. It automatically identifies semantically salient ROIs, then applies adaptive quantization, optimized entropy coding, and lightweight boundary encoding to pack ROIs into compact bitstreams, all without modifying or fine-tuning downstream detection/segmentation models. Evaluated on five benchmark datasets, ROI-Packing achieves up to 44.10% bitrate reduction over HEVC/VVC with no loss in task accuracy, or improves detection/segmentation mAP by up to 8.88% at equivalent bitrates. Its core innovation lies in the first unified modeling of task-driven ROI selection and lossless packing, thereby overcoming the longstanding bottleneck in joint optimization of compression and vision tasks.
📝 Abstract
This paper introduces ROI-Packing, an efficient image compression method tailored specifically for machine vision. By prioritizing regions of interest (ROI) critical to end-task accuracy and packing them efficiently while discarding less relevant data, ROI-Packing achieves significant compression efficiency without requiring retraining or fine-tuning of end-task models. Comprehensive evaluations across five datasets and two popular tasks (object detection and instance segmentation) demonstrate up to a 44.10% reduction in bitrate without compromising end-task accuracy, along with an 8.88% improvement in accuracy at the same bitrate, compared to the state-of-the-art Versatile Video Coding (VVC) codec standardized by the Moving Picture Experts Group (MPEG).
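The abstract does not say how the ROIs are arranged once they are selected, so the following is only a minimal sketch of the general idea of "packing ROIs and discarding the rest", not the paper's algorithm. It assumes the ROIs are already available as axis-aligned boxes (for example from a detector), substitutes a simple shelf-packing heuristic for whatever layout the authors use, and treats an image as a plain list of pixel rows. All names here (`Box`, `Placement`, `shelf_pack`, `pack`, `unpack`) are hypothetical. In a real pipeline the packed canvas would then be passed to a codec such as VVC; that step is not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned ROI in the original image (hypothetical helper type)."""
    x: int
    y: int
    w: int
    h: int


@dataclass(frozen=True)
class Placement:
    """Where an ROI came from and where it lands in the packed canvas."""
    src: Box
    dst_x: int
    dst_y: int


def shelf_pack(rois: List[Box], canvas_w: int) -> Tuple[List[Placement], int]:
    """Place ROIs left to right on horizontal shelves.

    Returns the placements and the resulting canvas height.
    This is a stand-in heuristic, assumed for illustration only.
    """
    placements: List[Placement] = []
    x = y = shelf_h = 0
    # Tallest-first ordering keeps each shelf reasonably tight.
    for roi in sorted(rois, key=lambda b: b.h, reverse=True):
        if roi.w > canvas_w:
            raise ValueError("ROI is wider than the canvas")
        if x + roi.w > canvas_w:  # no room left: open a new shelf below
            y += shelf_h
            x = shelf_h = 0
        placements.append(Placement(roi, x, y))
        x += roi.w
        shelf_h = max(shelf_h, roi.h)
    return placements, y + shelf_h


def pack(image, rois: List[Box], canvas_w: int, fill=0):
    """Copy ROI pixels into a compact canvas; everything else is dropped."""
    placements, canvas_h = shelf_pack(rois, canvas_w)
    canvas = [[fill] * canvas_w for _ in range(canvas_h)]
    for p in placements:
        s = p.src
        for dy in range(s.h):
            canvas[p.dst_y + dy][p.dst_x:p.dst_x + s.w] = image[s.y + dy][s.x:s.x + s.w]
    return canvas, placements


def unpack(canvas, placements: List[Placement], width: int, height: int, fill=0):
    """Rebuild a full-size image: ROIs go back in place, background is `fill`."""
    image = [[fill] * width for _ in range(height)]
    for p in placements:
        s = p.src
        for dy in range(s.h):
            image[s.y + dy][s.x:s.x + s.w] = canvas[p.dst_y + dy][p.dst_x:p.dst_x + s.w]
    return image
```

The placements act as side information: a decoder needs them to restore each ROI to its original position before the unmodified end-task model runs on the reconstructed image. How such metadata is actually signalled in ROI-Packing is not stated in the abstract.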