🤖 AI Summary
This study addresses the lack of support for Prakrit, a low-resource classical language, in the multilingual translation model IndicTrans2. The authors propose a language-tag mapping strategy that requires no changes to the tokenizer, vocabulary, or model architecture: Prakrit is routed through the Hindi language tag (hin_Deva) so that English-to-Prakrit translation can draw on multilingual transfer. After fine-tuning on a parallel corpus of 1,474 Maharashtri Prakrit sentence pairs, the model achieves higher corpus BLEU than the untuned baseline on a 20-sentence Ardhamāgadhī test set. The results indicate that script-compatible language-tag routing can extend a modern multilingual model to an unsupported classical language, while data scarcity and the dialect mismatch between training and test data remain limitations.
📝 Abstract
We study English-to-Prakrit machine translation in a low-resource setting where the target language is unsupported by IndicTrans2. We adapt the multilingual model by mapping Prakrit to the Hindi language tag (hin_Deva) without modifying the tokenizer, vocabulary, or architecture. Using a 1,474-pair Maharashtri Prakrit parallel corpus and evaluating on a 20-sample Ardhamagadhi test set, we report corpus BLEU improvements over an untuned baseline. The results indicate that script-compatible language routing can enable feasible transfer to unsupported classical languages, while highlighting limitations due to data scarcity and dialect mismatch. Our code and trained models are publicly available at https://github.com/D3v1s0m/indictrans2-prakrit-mt.
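The tag-routing idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the label `pra_Deva`, the routing table, and the helper names are hypothetical, and the "source tag, target tag, sentence" input layout is an assumption about how tagged inputs are formed. The only point shown is that an unsupported language label is rewritten to an existing tag (`hin_Deva`) before the text reaches the model, so the tokenizer and vocabulary stay untouched.

```python
# Hypothetical routing table: labels the model does not know are mapped to
# tags it already supports. "pra_Deva" is an illustrative label, not a real
# IndicTrans2 tag; "hin_Deva" is the Hindi tag named in the abstract.
TAG_ROUTES = {"pra_Deva": "hin_Deva"}


def route_tag(lang: str) -> str:
    """Return the tag the model should see for a given language label."""
    return TAG_ROUTES.get(lang, lang)


def tag_source(sentence: str, src_lang: str, tgt_lang: str) -> str:
    """Prefix a source sentence with routed source and target tags.

    The '<src_tag> <tgt_tag> <text>' layout is an assumption made for this
    sketch; a real pipeline would use the model's own preprocessing.
    """
    return f"{route_tag(src_lang)} {route_tag(tgt_lang)} {sentence.strip()}"


def build_finetuning_pairs(pairs, src_lang="eng_Latn", tgt_lang="pra_Deva"):
    """Turn (English, Prakrit) pairs into (tagged source, target) examples."""
    return [(tag_source(en, src_lang, tgt_lang), pk.strip()) for en, pk in pairs]


if __name__ == "__main__":
    # The target side is a placeholder string, not real Prakrit data.
    toy_pairs = [("The king spoke.", "<Prakrit sentence in Devanagari>")]
    for source, target in build_finetuning_pairs(toy_pairs):
        print(source, "->", target)
```

Because the routing happens purely at the tag level, the fine-tuned model is still invoked as if it were translating into Hindi; only the training data determines that the Devanagari output is Prakrit.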