🤖 AI Summary
To address the challenge of few-shot domain adaptation in handwriting recognition, this paper investigates efficient domain transfer for Connectionist Temporal Classification (CTC) models under extremely limited target-domain data, down to just 16 lines of text. We propose a lightweight adaptation recipe that relies solely on supervised fine-tuning with image- or sequence-level data augmentation, deliberately avoiding complex domain-adaptation techniques. Evaluated under both writer-dependent and writer-independent protocols on a large real-world dataset, our method substantially mitigates overfitting: it achieves relative reductions in character error rate (CER) of 25% with 16 lines and 50% with 256 lines when adapting to unseen writers. Our core contribution is demonstrating that, within the CTC framework, carefully designed fine-tuning combined with simple data augmentation suffices for strong generalization, challenging the necessity of sophisticated domain-adaptation methods in low-resource handwriting recognition.
📝 Abstract
In many machine learning tasks, a large general dataset and a small specialized dataset are available. In such situations, various domain adaptation methods can be used to adapt a general model to the target dataset. We show that for neural networks trained for handwriting recognition using CTC, simple fine-tuning with data augmentation works surprisingly well and is resistant to overfitting even for very small target-domain datasets. We evaluated the behavior of fine-tuning with respect to augmentation, training data size, and quality of the pre-trained network, in both writer-dependent and writer-independent settings. On a large real-world dataset, fine-tuning on new writers provided an average relative CER improvement of 25% for 16 text lines and 50% for 256 text lines.
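The exact augmentations used during fine-tuning are not detailed in this summary. As an illustration only, a minimal NumPy sketch of two common image-level augmentations for grayscale text-line images, random horizontal rescaling and additive Gaussian noise; the parameter ranges here are assumptions, not values from the paper:

```python
import numpy as np

def augment_line_image(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Illustrative image-level augmentation for a grayscale text-line image.

    Applies a random horizontal stretch/squeeze (nearest-neighbour resampling)
    to mimic writing-style width variation, plus mild additive Gaussian noise.
    Parameter ranges are assumptions for demonstration, not from the paper.
    """
    h, w = img.shape
    # Random horizontal scale factor in [0.9, 1.1].
    scale = rng.uniform(0.9, 1.1)
    new_w = max(1, int(round(w * scale)))
    # Map each output column back to a source column (nearest neighbour).
    cols = np.clip((np.arange(new_w) / scale).astype(int), 0, w - 1)
    out = img[:, cols].astype(np.float32)
    # Mild additive Gaussian noise, clipped back to the valid intensity range.
    out += rng.normal(0.0, 5.0, size=out.shape)
    return np.clip(out, 0.0, 255.0)

rng = np.random.default_rng(0)
line = rng.uniform(0, 255, size=(64, 512)).astype(np.float32)
aug = augment_line_image(line, rng)
```

During fine-tuning, such a transform would be applied on the fly to each of the few target-domain lines, so the network sees a different variant of every line in each epoch.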