🤖 AI Summary
Encoder-based models (e.g., BERT) face a critical limitation in automated essay scoring (AES) due to their 512-token input constraint, resulting in inadequate comprehension and inconsistent scoring for long essays. To address this, we propose a novel LLM-based scoring framework that integrates text summarization with structured prompt engineering within a two-stage “summarize-then-score” paradigm. This design enables effective processing of lengthy inputs while generating interpretable, rationale-backed scores. Evaluated on the Learning Agency Lab AES 2.0 benchmark, our approach achieves a quadratic weighted kappa (QWK) of 0.8878, an absolute improvement of 0.066 over the BERT baseline (0.822). The improvement demonstrates both scalability beyond fixed-length encoders and enhanced inter-rater consistency. Our work establishes a new, extensible pathway for high-fidelity AES of long-form compositions.
📝 Abstract
BERT and its variants have been extensively explored for automated scoring. However, the 512-token input limit of these encoder-based models makes them deficient at scoring long essays. This research therefore explores generative language models for automated scoring of long essays via summarization and prompting. The results show a substantial gain in scoring accuracy, with QWK increasing from 0.822 to 0.8878 on the Learning Agency Lab Automated Essay Scoring 2.0 dataset.
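The two-stage "summarize-then-score" paradigm described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `call_llm` is a hypothetical stand-in for any generative-model API (stubbed here with canned replies so the example runs offline), and the rubric wording, score range, and output format are assumptions for demonstration.

```python
import re

def call_llm(prompt: str) -> str:
    # Stub standing in for a real generative-model API call.
    # Canned replies let the sketch run without network access.
    if "Summarize" in prompt:
        return "The essay argues that school uniforms improve student focus."
    return "Score: 4\nRationale: Clear thesis and organization, limited evidence."

def summarize(essay: str) -> str:
    """Stage 1: compress a long essay so it fits within the scoring prompt."""
    return call_llm(f"Summarize the following essay in a few sentences:\n{essay}")

def score(summary: str) -> tuple[int, str]:
    """Stage 2: a structured prompt asks for a holistic score plus a rationale."""
    reply = call_llm(
        "You are an essay rater. Using the rubric, give a holistic score "
        "from 1 to 6 and a brief rationale.\n"
        f"Essay summary:\n{summary}\n"
        "Answer as:\nScore: <n>\nRationale: <text>"
    )
    match = re.search(r"Score:\s*(\d)", reply)
    rationale = reply.split("Rationale:", 1)[-1].strip()
    return int(match.group(1)), rationale

# A long essay (well beyond 512 tokens) would go here.
essay_summary = summarize("...")
points, why = score(essay_summary)
print(points, why)
```

The key design point is that only the summary, not the full essay, reaches the scoring prompt, which sidesteps fixed-length encoder limits while the rationale keeps each score interpretable.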