🤖 AI Summary
This study investigates whether a single lightweight large language model (LLM) with at most 9B parameters can perform grammatical error correction (GEC) across English, German, Italian, and Swedish. We systematically evaluate 17 prominent open-source models (including Llama, Gemma, Phi, and Qwen) with zero-shot and few-shot prompting, and assess performance with three complementary evaluation methods (BLEU, ERRANT, and human evaluation), emphasizing both correction accuracy and edit minimization. To our knowledge, this is the first comparative study of multilingual GEC in which one lightweight model corrects texts in all languages. Gemma 9B consistently outperforms all other models in all four languages: it achieves on average 4.2 percentage points higher correction accuracy than the second-best model and reduces edit distance by 23%. We identify six models that meet stringent performance thresholds in every language, demonstrating that LLMs with ≤9B parameters can deliver high-quality multilingual GEC with minimal changes to the input. Gemma 9B emerges as the current state-of-the-art lightweight solution.
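The summary distinguishes zero-shot from few-shot prompting. As a rough illustration of that difference, the sketch below builds one prompt template that serves all four languages and optionally prepends demonstration pairs. The instruction wording, the `Input:`/`Output:` layout, and the name `build_gec_prompt` are hypothetical; the study's actual prompts are not reproduced here.

```python
from typing import Sequence, Tuple

# Hypothetical instruction text; not the prompt used in the study.
INSTRUCTION = (
    "Correct the grammatical errors in the following text. "
    "Keep the original wording wherever possible and change only what is necessary. "
    "Reply with the corrected text only."
)


def build_gec_prompt(text: str, examples: Sequence[Tuple[str, str]] = ()) -> str:
    """Return a zero-shot prompt (no examples) or a few-shot prompt (with examples)."""
    parts = [INSTRUCTION, ""]
    # Each demonstration pairs an erroneous sentence with its minimal correction.
    for source, target in examples:
        parts.append(f"Input: {source}")
        parts.append(f"Output: {target}")
        parts.append("")
    # The text to correct comes last; the model is expected to continue after "Output:".
    parts.append(f"Input: {text}")
    parts.append("Output:")
    return "\n".join(parts)


# Zero-shot: German input, no demonstrations.
zero_shot = build_gec_prompt("Ich habe gestern ins Kino gegangen.")
# Few-shot: Swedish input, one English demonstration; the same template serves every language.
few_shot = build_gec_prompt(
    "Jag har inte sett han idag.",
    examples=[("She go to school every day.", "She goes to school every day.")],
)
print(few_shot)
```

Keeping a single language-agnostic instruction mirrors the setting in which one model handles all languages; whether demonstrations should match the input language is a separate design choice this sketch does not settle.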
📝 Abstract
Recent language models can successfully solve a wide range of language-related tasks, and many of them understand input written in different languages. In this paper, we evaluate how well 17 popular models correct grammatical errors in English, German, Italian, and Swedish texts when a single model is used to correct texts in all four languages. We analyze the outputs these models generate, focusing on reducing the number of grammatical errors while keeping the changes small. Our findings show which problems occur across these models and which models can be recommended for multilingual grammatical error correction. We list six models that improve grammatical correctness in all four languages and show that Gemma 9B currently performs best for the languages considered.
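"Keeping the changes small" can be quantified by counting how many words a correction alters. The sketch below uses a generic word-level Levenshtein distance as one possible proxy; it is an assumption for illustration, not the measure used in the paper, and the function name `word_edit_distance` is hypothetical.

```python
def word_edit_distance(source: str, corrected: str) -> int:
    """Minimum number of word insertions, deletions, and substitutions
    needed to turn `source` into `corrected` (word-level Levenshtein)."""
    a, b = source.split(), corrected.split()
    # prev[j] is the distance between the first i-1 words of a and the first j words of b.
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete a[i-1]
                curr[j - 1] + 1,     # insert b[j-1]
                prev[j - 1] + cost,  # keep or substitute
            )
        prev = curr
    return prev[len(b)]


src = "Ich habe gestern ins Kino gegangen."
minimal = "Ich bin gestern ins Kino gegangen."   # fixes only the auxiliary verb
rewrite = "Gestern bin ich ins Kino gegangen."   # also correct, but reorders words
print(word_edit_distance(src, minimal), word_edit_distance(src, rewrite))  # 1 3
```

Both outputs are grammatical, but the minimal correction changes one word while the rewrite changes three; a metric like this rewards the former, which is the behavior the abstract prioritizes. Comparison is case-sensitive and splits on whitespace, so punctuation stays attached to words.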