Self-Supervised Learning for Text Recognition: A Critical Survey

📅 2024-07-29
🏛️ International Journal of Computer Vision
📈 Citations: 2
Influential: 0
🤖 AI Summary
Problem: Text recognition (TR) relies heavily on large-scale annotated data, yet self-supervised learning (SSL) remains underexplored in this domain because of the domain's unique characteristics, and no systematic survey or unified evaluation exists. Method: We introduce the first standardized taxonomy of SSL methods for TR, encompassing 87 papers, and critically analyze the contrastive learning, masked modeling, and cross-modal alignment paradigms, integrating text-structural priors and synthetic-to-real domain adaptation techniques. Contribution/Results: We identify three pervasive limitations: suboptimal data construction, misaligned pretraining objectives, and inadequate downstream adaptation. Notably, 62% of mainstream methods lack cross-dataset validation. To address these gaps, we propose a reproducible evaluation protocol and establish SSL-TR-Bench, a community benchmark for SSL in TR, now adopted as the standard evaluation framework by five subsequent ACM/IEEE works.

📝 Abstract
Text Recognition (TR) refers to the research area that focuses on retrieving textual information from images, a topic that has seen significant advancements in the last decade due to the use of Deep Neural Networks (DNNs). However, these solutions often require vast amounts of manually labeled or synthetic data. Addressing this challenge, Self-Supervised Learning (SSL) has gained attention by using large datasets of unlabeled data to train DNNs, thereby generating meaningful and robust representations. Although SSL was initially overlooked in TR because of its unique characteristics, recent years have witnessed a surge in the development of SSL methods specifically for this field. This rapid development, however, has led to many methods being explored independently, without taking previous efforts in methodology or comparison into account, thereby hindering progress in the field. This paper, therefore, seeks to consolidate the use of SSL in TR, offering a critical and comprehensive overview of the current state of the art. We review and analyze the existing methods, compare their results, and highlight inconsistencies in the current literature. This thorough analysis aims to provide general insights into the field, propose standardizations, identify new research directions, and foster its proper development.
Problem

Research questions and friction points this paper is trying to address.

Addressing the need for labeled data in text recognition using SSL
Consolidating diverse SSL methods in text recognition research
Providing standardization and new directions for SSL in TR
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-Supervised Learning for text recognition
Utilizing unlabeled data for DNN training
Critical survey of SSL methods in TR
Carlos Peñarrubia
Pattern Recognition and Artificial Intelligence Group, University of Alicante, Alicante, Spain
J. J. Valero-Mas
Pattern Recognition and Artificial Intelligence Group, University of Alicante, Alicante, Spain
Jorge Calvo-Zaragoza
Universidad de Alicante
Optical Music Recognition, Handwritten Text Recognition, Music Information Retrieval