Teach-to-Reason with Scoring: Self-Explainable Rationale-Driven Multi-Trait Essay Scoring

📅 2025-02-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current multi-trait automated essay scoring (AES) systems suffer from poor interpretability, undermining educators' and learners' trust in scoring outcomes. To address this, the authors propose a "score-then-justify" joint generation paradigm: a lightweight student model, distilled from a large language model (LLM), sequentially generates each trait score followed by its justification, thereby internalizing scoring decisions within the reasoning process and enabling co-optimization of scores and explanations. A key observation is that while LLMs underperform at direct scoring, they generate high-quality rationales when supplied with precise numerical scores; the framework therefore pairs the reasoning capacity of LLMs with the scoring accuracy of an optimized smaller model. Evaluated on multi-trait AES benchmarks, the method achieves strong accuracy while generating score-aligned explanatory text, enhancing system transparency and pedagogical utility and bridging the gap between automated assessment and actionable, trustworthy feedback.

📝 Abstract
Multi-trait automated essay scoring (AES) systems provide a fine-grained evaluation of an essay's diverse aspects. While they excel in scoring, prior systems fail to explain why specific trait scores are assigned. This lack of transparency leaves instructors and learners unconvinced by the AES outputs, hindering their practical use. To address this, we propose a self-explainable Rationale-Driven Multi-trait automated Essay scoring (RaDME) framework. RaDME leverages the reasoning capabilities of large language models (LLMs) by distilling them into a smaller yet effective scorer. This more manageable student model is optimized to sequentially generate a trait score followed by the corresponding rationale, thereby inherently learning to select a more justifiable score by considering the subsequent rationale during training. Our findings indicate that while LLMs underperform in direct AES tasks, they excel in rationale generation when provided with precise numerical scores. Thus, RaDME integrates the superior reasoning capacities of LLMs into the robust scoring accuracy of an optimized smaller model. Extensive experiments demonstrate that RaDME achieves both accurate and adequate reasoning while supporting high-quality multi-trait scoring, significantly enhancing the transparency of AES.
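The distillation setup described in the abstract can be pictured with a minimal sketch: a teacher LLM writes a rationale conditioned on the gold score, and the student is fine-tuned on a target sequence where the score precedes the rationale. The function name, target format, and field labels below are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch of a RaDME-style distillation target (format is assumed).
# The score is emitted BEFORE the rationale so the student model learns to
# commit to a score it must subsequently justify during generation.

def build_training_target(trait: str, score: int, rationale: str) -> str:
    """Assemble one supervised fine-tuning target for a single trait.

    `rationale` would come from the teacher LLM, prompted with the essay
    and the gold score (the paper notes LLMs produce strong rationales
    when given precise numerical scores).
    """
    return f"[{trait}] Score: {score}\nRationale: {rationale}"

# Illustrative example with a hypothetical trait and score.
example = build_training_target(
    trait="Organization",
    score=4,
    rationale="Clear paragraph structure with logical transitions.",
)
print(example)
```

At inference time the student decodes the same score-then-rationale sequence, so the explanation is generated jointly with, rather than bolted onto, the scoring decision.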
Problem

Research questions and friction points this paper is trying to address.

Lack of transparency in multi-trait automated essay scoring systems.
Difficulty in justifying specific trait scores assigned by AES systems.
Need for integrating reasoning capabilities into AES for better explainability.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-explainable rationale-driven essay scoring framework
Leverages large language models for rationale generation
Optimized smaller model for accurate multi-trait scoring
Heejin Do
Postdoctoral Fellow, ETH Zurich, ETH AI Center
NLP · AI in Education · Evaluation · Human-AI Interaction · Interpretability
Sangwon Ryu
POSTECH
Natural Language Processing · Text Summarization · Reinforcement Learning · Large Language Models
Gary Geunbae Lee
Graduate School of Artificial Intelligence, POSTECH, Republic of Korea; Department of Computer Science and Engineering, POSTECH, Republic of Korea