Automated Essay Scoring and Language Certification: Assessing Generalizability, Agreement and Validity for French

📅 2026-06-01

🤖 AI Summary
This study addresses the lack of multidimensional validity evidence for automated essay scoring (AES) systems in high-stakes French-language assessment. The authors propose an enhanced, more practical version of the argument-based validation (ABV) framework that adds fairness analysis, correlations with linguistic features, prediction error evaluation, and model agreement compared with human raters. Using a corpus of 27,000 exam essays scored by two raters each and a generalization corpus of 961 essays scored by at least nine raters each, the study compares eight model architectures. The analyses show how the framework helps clarify the capabilities and pitfalls of AES models, and they advance the state of the art for French AES.
📝 Abstract
In Automated Essay Scoring (AES), benchmarking has fostered minimalist evaluation practices, in contrast with the broader recommendations of evaluation frameworks such as the argument-based validation (ABV) framework, which argues for a multidimensional assessment of systems, especially in the context of high-stakes language tests. In this paper, we introduce an enhanced and more practical version of the ABV framework, incorporating fairness analysis, correlations with linguistic features, prediction error evaluation, and model agreement compared with human raters. Applying this framework to French AES, we compare eight model architectures on a corpus of 27k exam essays (two raters each) and a generalization corpus of 961 essays (at least nine raters each). Our analyses illustrate the benefits of applying the ABV framework to better understand the capabilities and pitfalls of AES models, while also advancing the state of the art for French AES.
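The abstract does not say which statistic is used to compare model agreement with human raters. As an illustration only, the sketch below computes quadratic weighted kappa, a measure commonly reported in AES work, between one set of model scores and one set of human scores. The function name, the 1 to 6 score scale, and the sample scores are hypothetical and are not taken from the paper.

```python
from collections import Counter


def quadratic_weighted_kappa(scores_a, scores_b, min_score, max_score):
    """Cohen's kappa with quadratic weights for two lists of integer scores.

    Illustrative helper (not from the paper). Returns 1.0 for perfect
    agreement, about 0.0 for chance-level agreement, and negative values
    for worse-than-chance agreement.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")
    n_categories = max_score - min_score + 1
    if n_categories < 2:
        raise ValueError("the score scale needs at least two categories")

    n = len(scores_a)
    observed = Counter(zip(scores_a, scores_b))  # joint counts of (a, b) pairs
    hist_a = Counter(scores_a)                   # marginal counts for rater A
    hist_b = Counter(scores_b)                   # marginal counts for rater B

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            # Quadratic penalty: larger disagreements cost more.
            weight = (i - j) ** 2 / (n_categories - 1) ** 2
            weighted_observed += weight * observed[(i, j)] / n
            weighted_expected += weight * hist_a[i] * hist_b[j] / (n * n)

    if weighted_expected == 0.0:
        # Degenerate case: both raters gave the same single score throughout.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical scores on an assumed 1-6 scale.
human = [3, 4, 2, 5, 4, 3]
model = [3, 4, 3, 5, 3, 3]
print(round(quadratic_weighted_kappa(human, model, 1, 6), 3))
```

With several human raters per essay, as in the generalization corpus, the same function could be applied to each model-rater and rater-rater pair, so that model-human agreement can be read against the spread of human-human agreement.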
Problem

Research questions and friction points this paper is trying to address.

Automated Essay Scoring
Argument-Based Validation
Generalizability
Rater Agreement
Validity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated Essay Scoring
Argument-Based Validation
Fairness Analysis
Model Generalizability
Human-Model Agreement