GraDeT-HTR: A Resource-Efficient Bengali Handwritten Text Recognition System utilizing Grapheme-based Tokenizer and Decoder-only Transformer

📅 2025-09-22

📈 Citations: 0

✨ Influential: 0

career value

159K/year

🤖 AI Summary

Bangla handwritten text recognition (HTR) faces dual challenges: high script complexity—including conjuncts, diacritics, and diverse handwriting variants—and severe scarcity of annotated data. To address these, we propose an efficient Bangla-specific HTR framework that replaces conventional subword tokenizers (e.g., BPE or WordPiece) with a grapheme-level tokenizer, employs a decoder-only Transformer architecture, and incorporates phoneme-aware tokenization to better capture phoneme–grapheme correspondences. The model is pretrained on large-scale synthetic data and fine-tuned on real handwritten samples. Evaluated on BanglaLekha-Isolated, BMNIST, and our newly curated BHTR benchmark, it achieves state-of-the-art performance—reducing average character error rate by 12.6% and accelerating inference by 3.2× compared to subword-based baselines. Our approach thus delivers both superior accuracy and significantly improved computational efficiency.

Technology Category

Application Category

📝 Abstract

Despite Bengali being the sixth most spoken language in the world, handwritten text recognition (HTR) systems for Bengali remain severely underdeveloped. The complexity of Bengali script--featuring conjuncts, diacritics, and highly variable handwriting styles--combined with a scarcity of annotated datasets makes this task particularly challenging. We present GraDeT-HTR, a resource-efficient Bengali handwritten text recognition system based on a Grapheme-aware Decoder-only Transformer architecture. To address the unique challenges of Bengali script, we augment the performance of a decoder-only transformer by integrating a grapheme-based tokenizer and demonstrate that it significantly improves recognition accuracy compared to conventional subword tokenizers. Our model is pretrained on large-scale synthetic data and fine-tuned on real human-annotated samples, achieving state-of-the-art performance on multiple benchmark datasets.

Problem

Research questions and friction points this paper is trying to address.

Developing resource-efficient Bengali handwritten text recognition system

Addressing Bengali script complexity with conjuncts and diacritics

Overcoming scarcity of annotated Bengali handwriting datasets

Innovation

Methods, ideas, or system contributions that make the work stand out.

Grapheme-based tokenizer for Bengali script

Decoder-only Transformer architecture for efficiency

Pretraining on synthetic data with real fine-tuning

🔎 Similar Papers

HATFormer: Historic Handwritten Arabic Text Recognition with Transformers