🤖 AI Summary
To address the high annotation cost and poor model adaptability in few-shot handwritten OCR, this paper proposes a progressive autoregressive fine-tuning framework requiring only 16–256 lines of user-corrected text. Methodologically, it introduces a confidence-driven informative line selection strategy and a reliable early-stopping criterion, and clarifies the distinct roles of the encoder and decoder during few-shot fine-tuning. Built upon an end-to-end Transformer-based OCR architecture, the approach integrates progressive supervised fine-tuning with global error trend analysis. Experiments demonstrate that just 16 corrected lines yield a 10% relative reduction in character error rate (CER), rising to 40% with 256 lines; moreover, confidence-based selection of informative lines halves the annotation budget without performance degradation. The method significantly enhances recognition robustness and practical deployability in real-world low-resource scenarios.
📝 Abstract
A common use case for OCR applications involves users uploading documents and progressively correcting the automatic recognition output to obtain the final transcript. This correction phase presents an opportunity for progressive adaptation of the OCR model, making it crucial to adapt early while ensuring stability and reliability. We demonstrate that state-of-the-art transformer-based models can effectively support this adaptation, gradually reducing the annotator's workload. Our results show that fine-tuning can reliably start with just 16 lines, yielding a 10% relative reduction in character error rate (CER), and scale up to a 40% reduction with 256 lines. We further investigate the impact of model components, clarifying the roles of the encoder and decoder in the fine-tuning process. To guide adaptation, we propose reliable stopping criteria, considering both direct approaches and global trend analysis. Additionally, we show that OCR models can be leveraged to cut annotation costs by half through confidence-based selection of informative lines, achieving the same performance with fewer annotations.
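The confidence-based selection of informative lines could be sketched roughly as follows: score each recognized line by the model's confidence and send the least-confident lines to the annotator first. This is a minimal illustration under assumed inputs (per-line mean confidences), not the paper's actual implementation; the function name and scoring are hypothetical.

```python
def select_informative_lines(line_confidences, budget):
    """Return indices of the `budget` lines with the lowest model confidence.

    line_confidences: one mean recognition-confidence score per line (0..1).
    budget: number of lines the user is willing to correct.
    """
    # Rank line indices from least to most confident; the least-confident
    # lines are assumed to be the most informative for fine-tuning.
    ranked = sorted(range(len(line_confidences)),
                    key=lambda i: line_confidences[i])
    # Return the selected indices in document order for easier annotation.
    return sorted(ranked[:budget])

# Example: with a budget of 2, the two least-confident lines are chosen.
confs = [0.95, 0.62, 0.88, 0.40, 0.91]
print(select_informative_lines(confs, 2))  # → [1, 3]
```

In practice such a score might come from the decoder's token probabilities averaged over the line; the key point from the abstract is that prioritizing low-confidence lines reaches the same CER with roughly half the annotations.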