TFIC: End-to-End Text-Focused Image Compression for Coding for Machines

📅 2025-03-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a basic limitation of conventional image compression: it is designed for human vision, which often degrades performance on downstream machine vision tasks such as OCR. We propose the first end-to-end text-aware image compression framework. OCR task objectives are integrated into the entire codec pipeline: rate-distortion optimization is performed jointly with a text-aware distortion metric, augmented by OCR feature distillation and a text-structure-aware attention mechanism. Our contributions are threefold: (1) at ultra-low bitrates (0.1 bpp), OCR accuracy on compressed images surpasses that on the original uncompressed images, a first in the literature; (2) compared to JPEG+OCR, our method improves OCR accuracy by over 22%; and (3) compression is efficient, with encoding latency only half that of the OCR module, and the codec inherently performs semantic preprocessing, making it well suited to resource-constrained devices.
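The joint rate-distortion objective mentioned above can be illustrated with a minimal sketch. The summary does not give the loss in closed form, so this assumes the common Lagrangian form L = R + λ·D, with R estimated in bits per pixel from entropy-model likelihoods and D taken as the mean squared error between OCR features of the original and reconstructed images. All function names, the choice of MSE, and the value of `lam` are hypothetical and are not taken from the paper.

```python
import math


def rate_bpp(likelihoods, num_pixels):
    """Estimate the bitrate in bits per pixel from per-symbol likelihoods.

    Each likelihood is the probability the (assumed) entropy model assigns
    to a coded symbol; its ideal code length is -log2(p) bits.
    """
    total_bits = -sum(math.log2(p) for p in likelihoods)
    return total_bits / num_pixels


def feature_distortion(feats_ref, feats_rec):
    """Mean squared error between two equal-length feature vectors.

    Stands in for a text-aware distortion: feats_ref would come from an OCR
    feature extractor applied to the original image, feats_rec from the same
    extractor applied to the reconstruction (both assumed here).
    """
    if len(feats_ref) != len(feats_rec):
        raise ValueError("feature vectors must have the same length")
    squared = ((a - b) ** 2 for a, b in zip(feats_ref, feats_rec))
    return sum(squared) / len(feats_ref)


def rd_loss(likelihoods, num_pixels, feats_ref, feats_rec, lam=1.0):
    """Lagrangian rate-distortion loss: rate + lam * distortion."""
    rate = rate_bpp(likelihoods, num_pixels)
    distortion = feature_distortion(feats_ref, feats_rec)
    return rate + lam * distortion


# Toy usage: two coded symbols (3 bits in total) for a 30-pixel image.
loss = rd_loss([0.5, 0.25], 30, [1.0, 2.0], [1.0, 4.0], lam=0.5)
```

In this form, a larger `lam` trades bitrate for fidelity of the OCR features, while a smaller `lam` favors lower bitrates.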

📝 Abstract

Traditional image compression methods aim to faithfully reconstruct images for human perception. In contrast, Coding for Machines focuses on compressing images to preserve information relevant to a specific machine task. In this paper, we present an image compression system designed to retain text-specific features for subsequent Optical Character Recognition (OCR). Our encoding process requires half the time needed by the OCR module, making it especially suitable for devices with limited computational capacity. In scenarios where on-device OCR is computationally prohibitive, images are compressed and later processed to recover the text content. Experimental results demonstrate that our method achieves significant improvements in text extraction accuracy at low bitrates, even improving over the accuracy of OCR performed on uncompressed images, thus acting as a local pre-processing step.

Problem

Research questions and friction points this paper is trying to address.

Develop text-focused image compression for OCR

Optimize compression for devices with limited computational capacity

Improve text extraction accuracy at low bitrates

Innovation

Methods, ideas, or system contributions that make the work stand out.

Text-focused image compression for OCR

Encoding takes half the time required by the OCR module

Improves text extraction at low bitrates
