🤖 AI Summary
This study addresses the lack of systematic categorization in existing approaches to Arabic grammatical error explanation, which treat explanations as unstructured free text and are therefore difficult to evaluate and to apply in practice. To bridge this gap, the work proposes ArabiGEE, which the authors describe as the first comprehensive taxonomy for Arabic grammatical error explanation grounded in explicit error types. The taxonomy is hierarchical and spans four linguistic dimensions: orthographic, morphological, syntactic, and lexical. It comprises 27 error types, 140 correction types, and 324 associated explanations. The authors apply the taxonomy to manually annotate portions of existing Arabic grammatical error correction corpora and show how the resulting structured explanations can support automatic evaluation of large language models on Arabic grammatical error explanation. The associated code and data are publicly released.
📝 Abstract
We introduce ArabiGEE, the first comprehensive Arabic grammatical error explanation (GEE) taxonomy grounded in explicit error types. Unlike existing GEE approaches that treat explanation generation as free-form text, ArabiGEE organizes grammatical explanations through a hierarchical structure spanning orthographic, morphological, syntactic, and lexical dimensions. The taxonomy consists of 27 error types, 140 correction types, and 324 associated explanations. We apply ArabiGEE to manually annotate portions of existing Arabic grammatical error correction corpora and demonstrate how structured grammatical explanations can support automatic evaluation of LLMs on Arabic GEE. Our code and data are publicly available.
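As an illustration only, the three-level hierarchy described above (error types, each with correction types, each with explanations) could be represented roughly as follows. This is a minimal sketch, not the released ArabiGEE code: the class names, the `count_levels` and `label_accuracy` helpers, and every entry in `toy_taxonomy` are hypothetical placeholders, and exact label matching is only one assumed way in which structured labels might support automatic scoring.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

# The four linguistic dimensions named in the abstract.
DIMENSIONS = ("orthographic", "morphological", "syntactic", "lexical")


@dataclass
class CorrectionType:
    """Second level of the hierarchy: a correction type and its explanations."""
    name: str
    explanations: List[str] = field(default_factory=list)


@dataclass
class ErrorType:
    """Top level of the hierarchy: an error type within one dimension."""
    name: str
    dimension: str
    correction_types: List[CorrectionType] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {self.dimension}")


def count_levels(taxonomy: Sequence[ErrorType]) -> Tuple[int, int, int]:
    """Return the number of error types, correction types, and explanations."""
    n_corrections = sum(len(e.correction_types) for e in taxonomy)
    n_explanations = sum(
        len(c.explanations) for e in taxonomy for c in e.correction_types
    )
    return len(taxonomy), n_corrections, n_explanations


def label_accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of items whose predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Hypothetical placeholder entries, NOT taken from the ArabiGEE taxonomy.
toy_taxonomy = [
    ErrorType(
        "placeholder_error_A",
        "orthographic",
        [
            CorrectionType("placeholder_fix_A1", ["explanation 1", "explanation 2"]),
            CorrectionType("placeholder_fix_A2", ["explanation 3"]),
        ],
    ),
    ErrorType(
        "placeholder_error_B",
        "syntactic",
        [CorrectionType("placeholder_fix_B1", ["explanation 4"])],
    ),
]
```

Under these assumptions, `count_levels` applied to the full taxonomy would be expected to return the three totals reported above, and `label_accuracy` could compare error-type labels produced by a model against the manual annotations.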