🤖 AI Summary
This study addresses the lack of interpretability and transparency in subtrait scoring models within automated writing evaluation (AWE), where “black-box” decisions prevent educators and students from understanding fine-grained scoring rationales. To bridge this gap, we propose the first application of generative language models (GLMs) to interpretable subtrait-level modeling: GLMs generate natural-language explanations for subtrait scores, and we quantify their alignment with human judgments. Correlation analyses demonstrate moderate agreement between GLM-predicted subtrait scores and human ratings (r ≈ 0.4–0.6), while all subtraits exhibit statistically significant correlations with holistic scores (p < 0.01). Our approach enhances AWE interpretability by grounding explanations in linguistically coherent, human-aligned reasoning, and provides actionable, fine-grained diagnostic feedback to support human-AI collaborative assessment.
📝 Abstract
Subtrait (latent-trait component) assessment presents a promising path toward enhancing the transparency of automated writing scores. We prototype explainability and subtrait scoring with generative language models and show modest correlation between human subtrait and trait scores, and between automated and human subtrait scores. Our approach provides the detail needed to demystify scores for educators and students.
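The alignment analysis described above can be sketched as a per-subtrait Pearson correlation between GLM-predicted and human scores. The snippet below is a minimal illustration only: the subtrait names and score values are hypothetical stand-ins, since the paper's data and model are not reproduced here.

```python
import numpy as np

# Hypothetical human and GLM-predicted subtrait scores for six essays
# (illustrative values only; not from the study's dataset).
human_scores = {
    "organization": [3, 4, 2, 5, 3, 4],
    "word_choice":  [2, 4, 3, 5, 4, 3],
}
glm_scores = {
    "organization": [3, 5, 2, 4, 3, 5],
    "word_choice":  [2, 3, 3, 4, 5, 3],
}

def subtrait_alignment(human, predicted):
    """Pearson correlation between human and predicted scores, per subtrait."""
    alignment = {}
    for subtrait, h in human.items():
        p = predicted[subtrait]
        # np.corrcoef returns the 2x2 correlation matrix; [0, 1] is r.
        alignment[subtrait] = float(np.corrcoef(h, p)[0, 1])
    return alignment

alignment = subtrait_alignment(human_scores, glm_scores)
for subtrait, r in alignment.items():
    print(f"{subtrait}: r = {r:.2f}")
```

In a full analysis, each r would be paired with a significance test (e.g. a p-value for the correlation) to support claims like the p < 0.01 result reported for subtrait-holistic correlations.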