Automated Scoring of Arabic Text Using Large Language Models: A Literature Review

📅 2026-04-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the lack of a systematic review of automated scoring for Arabic texts, covering both short answers and essays, a gap that has hindered research progress and practical deployment. To bridge this gap, the work proposes the first five-dimensional structured taxonomy for the field, covering application domain, feedback generation capability, large language model architecture, alignment with competency frameworks, and prompt engineering strategy. By synthesizing the literature, large language model methodologies, educational assessment metrics, and Arabic-specific datasets, the study establishes a unified analytical framework. It also identifies critical shortcomings of current approaches in methodology, data availability, and evaluation standards, and calls for more pedagogically informed research to improve the quality and practical utility of automated scoring systems for Arabic.

📝 Abstract

In modern educational systems, Automatic Text Scoring (ATS) plays a central role by enabling scalable and consistent evaluation of learner responses without human intervention. Recently, the increased accessibility of large language models (LLMs) and Arabic-specific datasets has sparked renewed interest in this area. In this work, we investigate LLM-based approaches for the automated evaluation of Arabic texts, focusing on both automatic short answer grading (ASAG) and automated essay scoring (AES). We further introduce a structured taxonomy comprising five dimensions: application domain, feedback generation capability, LLM architecture deployed, alignment with competency reference frameworks, and prompt engineering strategy. By applying this taxonomy, we conduct a comparative analysis of existing studies, examining their methodological approaches, datasets, evaluation metrics, and reported performance. The findings highlight the need for sustained and pedagogically grounded research efforts in Arabic ATS, given its significance for improving educational quality across Arabic-speaking communities.
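As a rough illustration of how the five taxonomy dimensions could be used to code studies for a comparative analysis, here is a minimal Python sketch. The class, field, and function names are assumptions made for this example, as are the category labels such as "zero-shot" and "decoder-only". The abstract names only the dimensions, not their values, and the records below are placeholders rather than real studies.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Task(Enum):
    """Application domain: the two scoring tasks named in the abstract."""
    ASAG = "short answer grading"
    AES = "essay scoring"


@dataclass(frozen=True)
class StudyRecord:
    """One reviewed study, coded along the five taxonomy dimensions."""
    title: str
    application_domain: Task
    generates_feedback: bool             # feedback generation capability
    llm_architecture: str                # hypothetical label, e.g. "decoder-only"
    competency_framework: Optional[str]  # None when no framework alignment is reported
    prompt_strategy: str                 # hypothetical label, e.g. "zero-shot"


def tally(studies, dimension):
    """Count how many studies fall under each value of one dimension."""
    return Counter(getattr(study, dimension) for study in studies)


# Placeholder records for illustration only; these are not real studies.
studies = [
    StudyRecord("placeholder A", Task.ASAG, False, "decoder-only", None, "zero-shot"),
    StudyRecord("placeholder B", Task.AES, True, "decoder-only", None, "few-shot"),
    StudyRecord("placeholder C", Task.ASAG, False, "encoder-only", None, "zero-shot"),
]

by_domain = tally(studies, "application_domain")
by_prompt = tally(studies, "prompt_strategy")
```

Coding each study as one flat record keeps the dimensions independent, so the same list can be tallied or filtered along any single dimension without restructuring the data.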

Problem

Research questions and friction points this paper is trying to address.

Automatic Text Scoring

Arabic language

Short answer grading

Essay scoring

Educational assessment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Automatic Text Scoring

Large Language Models

Arabic NLP