GraDeT-HTR: A Resource-Efficient Bengali Handwritten Text Recognition System utilizing Grapheme-based Tokenizer and Decoder-only Transformer

📅 2025-09-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Bangla handwritten text recognition (HTR) faces dual challenges: high script complexity—including conjuncts, diacritics, and diverse handwriting variants—and severe scarcity of annotated data. To address these, we propose an efficient Bangla-specific HTR framework that replaces conventional subword tokenizers (e.g., BPE or WordPiece) with a grapheme-level tokenizer, employs a decoder-only Transformer architecture, and incorporates phoneme-aware tokenization to better capture phoneme–grapheme correspondences. The model is pretrained on large-scale synthetic data and fine-tuned on real handwritten samples. Evaluated on BanglaLekha-Isolated, BMNIST, and our newly curated BHTR benchmark, it achieves state-of-the-art performance—reducing average character error rate by 12.6% and accelerating inference by 3.2× compared to subword-based baselines. Our approach thus delivers both superior accuracy and significantly improved computational efficiency.

Technology Category

Application Category

📝 Abstract
Despite Bengali being the sixth most spoken language in the world, handwritten text recognition (HTR) systems for Bengali remain severely underdeveloped. The complexity of Bengali script--featuring conjuncts, diacritics, and highly variable handwriting styles--combined with a scarcity of annotated datasets makes this task particularly challenging. We present GraDeT-HTR, a resource-efficient Bengali handwritten text recognition system based on a Grapheme-aware Decoder-only Transformer architecture. To address the unique challenges of Bengali script, we augment the performance of a decoder-only transformer by integrating a grapheme-based tokenizer and demonstrate that it significantly improves recognition accuracy compared to conventional subword tokenizers. Our model is pretrained on large-scale synthetic data and fine-tuned on real human-annotated samples, achieving state-of-the-art performance on multiple benchmark datasets.
Problem

Research questions and friction points this paper is trying to address.

Developing resource-efficient Bengali handwritten text recognition system
Addressing Bengali script complexity with conjuncts and diacritics
Overcoming scarcity of annotated Bengali handwriting datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Grapheme-based tokenizer for Bengali script
Decoder-only Transformer architecture for efficiency
Pretraining on synthetic data with real fine-tuning
🔎 Similar Papers
M
Md. Mahmudul Hasan
Computer Science and Engineering, University of Dhaka
A
Ahmed Nesar Tahsin Choudhury
Computer Science and Engineering, University of Dhaka
M
Mahmudul Hasan
Computer Science and Engineering, University of Dhaka
Md. Mosaddek Khan
Md. Mosaddek Khan
Associate Professor, Dept. of Computer Science and Engineering, University of Dhaka
Multi-Agent SystemsDeep Neural NetworksMachine Learning TheoryArtificial Intelligence