🤖 AI Summary
This work addresses the grammatical error correction (GEC) challenge for Zarma, a low-resource West African language hampered by nonstandard orthography, scarce annotated data, and dialectal variation. We conduct the first systematic evaluation of rule-based systems, machine translation (MT) models, and small-scale multilingual LLMs for Zarma GEC. Our proposed multi-strategy framework integrates a rule engine, M2M100 (in zero-shot and fine-tuned settings), and mT5-small, trained on a novel benchmark comprising over 250,000 synthetically augmented and human-annotated samples, the first publicly available Zarma GEC dataset. Experimental results show that fine-tuned M2M100 achieves a 95.82% error detection rate and a 78.90% suggestion accuracy (human evaluation: 3.0/5.0), significantly outperforming both the rule-based and LLM baselines; its successful cross-lingual transfer to Bambara further demonstrates generalizability. Key contributions include: (1) the first end-to-end Zarma GEC system; (2) the first methodology comparison for GEC in a low-resource language setting; (3) a reproducible synthetic data construction pipeline; and (4) empirical validation of cross-lingual transfer for African language GEC.
📝 Abstract
Grammatical error correction (GEC) aims to improve the quality and readability of texts through accurate correction of linguistic mistakes. Previous work has focused on high-resource languages, while low-resource languages lack robust tools. These languages often face problems such as non-standard orthography, limited annotated corpora, and diverse dialects, which slow down the development of GEC tools. We present a study on GEC for Zarma, a language spoken by over five million people in West Africa. We compare three approaches: rule-based methods, machine translation (MT) models, and large language models (LLMs). We evaluate them using a dataset of more than 250,000 examples, including synthetic and human-annotated data. Our results show that the MT-based approach using M2M100 outperforms the others, with a detection rate of 95.82% and a suggestion accuracy of 78.90% in automatic evaluations (AE), and an average score of 3.0 out of 5.0 in manual evaluation (ME) by native speakers for grammar and logical corrections. The rule-based method was effective for spelling errors but failed on complex context-level errors. The LLM approach -- mT5-small -- showed moderate performance. Our work supports the use of MT models to enhance GEC in low-resource settings, and we validated these results on Bambara, another West African language.
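The abstract mentions a dataset built partly from synthetic data. A common way to construct such GEC training pairs is to inject artificial noise into clean monolingual text, yielding (corrupted, clean) pairs for a sequence-to-sequence model. The sketch below is a minimal, hypothetical illustration of character-level error injection; the function name `inject_errors`, the specific noise operations, and the example sentence are assumptions for illustration, not the paper's actual pipeline:

```python
import random

def inject_errors(sentence: str, p: float = 0.15, seed=None) -> str:
    """Corrupt a clean sentence with simple character-level noise
    (drop, duplicate, or swap letters) to create a synthetic GEC pair.
    Illustrative only; a real pipeline would also model word-level
    and orthographic error patterns observed in the target language."""
    rng = random.Random(seed)
    chars = list(sentence)
    out = []
    i = 0
    while i < len(chars):
        c = chars[i]
        # Only perturb letters, with probability p per character.
        if c.isalpha() and rng.random() < p:
            op = rng.choice(["drop", "dup", "swap"])
            if op == "drop":
                i += 1          # skip the character entirely
                continue
            if op == "dup":
                out.append(c + c)
            elif op == "swap" and i + 1 < len(chars):
                out.append(chars[i + 1] + c)  # transpose adjacent chars
                i += 2
                continue
            else:
                out.append(c)   # swap at end of string: no-op
        else:
            out.append(c)
        i += 1
    return "".join(out)

clean = "ay ga koy habu"        # hypothetical clean Zarma sentence
noisy = inject_errors(clean, seed=1)
pair = (noisy, clean)           # (source, target) training example
```

Fixing the random seed makes the corruption reproducible, which matters for the kind of reproducible data-construction pipeline the summary describes.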