Enhancing Essay Cohesion Assessment: A Novel Item Response Theory Approach

📅 2025-07-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the limited accuracy of automated text cohesion assessment in educational writing. We propose a novel scoring calibration method that integrates Item Response Theory (IRT) with machine learning—departing from conventional models that ignore examinee-level variability. To our knowledge, this is the first application of IRT to automated cohesion scoring in essay evaluation, enabling joint modeling of rater (i.e., model) ability, item (i.e., prompt or criterion) difficulty, and discrimination. Using a 325-dimensional linguistic feature set, we construct a regression-based ML framework on a large-scale corpus of Portuguese student essays and apply IRT calibration. Experimental results demonstrate statistically significant improvements over baseline ML and ensemble models across multiple evaluation metrics, enhancing both scoring accuracy and psychometric validity. The approach advances automated writing assessment in educational AI by offering an interpretable, examinee-sensitive paradigm grounded in measurement theory.

📝 Abstract
Essays are considered a valuable mechanism for evaluating learning outcomes in writing. Textual cohesion is an essential characteristic of a text, as it establishes meaning between its parts. Automatically scoring cohesion in essays is a challenge in educational artificial intelligence. The machine learning algorithms used to evaluate texts generally do not consider the individual characteristics of the instances that make up the analysed corpus. In this sense, item response theory can be adapted to the machine learning context, characterising the ability, difficulty and discrimination of the models used. This work proposes and analyses the performance of a cohesion score prediction approach based on item response theory that adjusts the scores generated by machine learning models. The corpus selected for the experiments consists of the extended Essay-BR, which includes 6,563 essays in the style of the National High School Exam (ENEM), and the Brazilian Portuguese Narrative Essays corpus, comprising 1,235 essays written by 5th- to 9th-grade students from public schools. We extracted 325 linguistic features and treated the problem as a machine learning regression task. The experimental results indicate that the proposed approach outperforms conventional machine learning models and ensemble methods on several evaluation metrics. This research explores a promising approach for improving the automatic evaluation of cohesion in educational essays.
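The abstract describes calibrating machine-learning scores with item response theory, where each model is characterised by an ability parameter and each item by difficulty and discrimination. A minimal sketch of that idea, assuming a standard two-parameter logistic (2PL) response function and an illustrative ability-weighted combination of model predictions (the function names and weighting scheme here are hypothetical, not the paper's exact calibration method):

```python
import math

def irt_2pl(theta, a, b):
    """2PL item response function: probability that a respondent with
    ability theta succeeds on an item with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def calibrated_score(predictions, abilities, a, b):
    """Illustrative calibration: weight each model's predicted cohesion score
    by its 2PL success probability on the item, then normalise."""
    weights = [irt_2pl(theta, a, b) for theta in abilities]
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, predictions)) / total

# A model with ability equal to the item's difficulty has a 0.5 success probability.
p = irt_2pl(0.0, a=1.0, b=0.0)  # → 0.5

# Two models with equal estimated ability contribute equally,
# so the calibrated score is their plain average.
s = calibrated_score([2.0, 4.0], abilities=[1.0, 1.0], a=1.0, b=0.0)  # → 3.0
```

In the paper's setting, the ability, difficulty, and discrimination parameters would be estimated jointly from the models' responses over the corpus rather than fixed by hand as above.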
Problem

Research questions and friction points this paper is trying to address.

Automatically scoring cohesion in essays using AI
Adapting item response theory for machine learning models
Improving cohesion assessment in educational essays
Innovation

Methods, ideas, or system contributions that make the work stand out.

Item response theory enhances machine learning models
Linguistic features extracted for cohesion assessment
Outperforms conventional models in evaluation metrics