🤖 AI Summary
This study addresses the absence of a systematic review of automated scoring of Arabic texts, covering both short answers and essays, a gap that has hindered research progress and practical deployment. To bridge this gap, the work proposes the first five-dimensional structured taxonomy for the field, encompassing application domains, feedback generation capabilities, large language model architectures, alignment with competency frameworks, and prompt engineering strategies. By synthesizing the literature on large language model methodologies, educational assessment metrics, and Arabic-specific datasets, the study establishes a unified analytical framework. It further identifies critical shortcomings in current approaches concerning methodology, data availability, and evaluation standards, and advocates more pedagogically informed research to improve the quality and practical utility of automated scoring systems for Arabic.
📝 Abstract
In modern educational systems, Automatic Text Scoring (ATS) plays a central role by enabling scalable and consistent evaluation of learner responses without human intervention. Recently, the increased accessibility of large language models (LLMs) and Arabic-specific datasets has sparked renewed interest in this area. In this work, we investigate LLM-based approaches for the automated evaluation of Arabic texts, focusing on both automatic short answer grading (ASAG) and automated essay scoring (AES). We further introduce a structured taxonomy comprising five dimensions: application domain, feedback generation capability, LLM architecture deployed, alignment with competency referential frameworks, and prompt engineering strategy. By applying this taxonomy, we conduct a comparative analysis of existing studies, examining their methodological approaches, datasets, evaluation metrics, and reported performance. The findings highlight the need for sustained and pedagogically grounded research efforts in Arabic ATS, given its significance for improving educational quality across Arabic-speaking communities.