Calibrating Generative AI to Produce Realistic Essays for Data Augmentation

📅 2026-02-06
🤖 AI Summary
This study addresses a performance bottleneck in machine-learning-based automated essay scoring systems, namely limited training data, by using large language models to generate synthetic student essays for data augmentation. It empirically compares three prompting strategies: next-sentence prediction (“predict next”), sentence-level prompting (“sentence”), and 25-shot exemplars (“25 examples”). The comparison asks how well each strategy's output preserves the quality of the original essays and how human-like it reads. Next-sentence prediction yields the highest inter-rater agreement on the simulated essays' scores and, together with sentence-level prompting, best preserves the rated quality of the source essays. Texts generated with next-sentence prediction and the 25-shot approach are judged the most realistic. The work offers empirical evidence and practical strategies for synthetic data augmentation in automated essay scoring.

📝 Abstract
Data augmentation can mitigate limited training data in machine-learning automated scoring engines (ASEs) for constructed-response items. This study seeks to determine how well three approaches to prompting large language models produce essays that preserve the writing quality of the original essays and read as realistic text for augmenting ASE training datasets. We created simulated versions of student essays; human raters then scored them and rated the realism of the generated text. The results indicate that (a) the predict next prompting strategy produces the highest level of agreement between human raters on simulated essay scores, (b) the predict next and sentence strategies best preserve the rated quality of the original essays in the simulated essays, and (c) the predict next and 25 examples strategies produce the most realistic text as judged by human raters.
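The abstract reports inter-rater agreement on the simulated essays' scores and how well those scores preserve the originals' quality, but it does not name the statistics used. As a minimal sketch only, the Python below assumes quadratic weighted kappa (a common agreement statistic in automated scoring research) for rater agreement and a mean absolute score difference for quality preservation. The function names, the 1-4 score range, and the toy scores are hypothetical illustrations, not the study's method or data.

```python
from collections import Counter
from typing import Sequence

# Hypothetical helpers: the study does not specify its agreement or
# quality-preservation statistics; these are illustrative assumptions.


def quadratic_weighted_kappa(
    rater_a: Sequence[int],
    rater_b: Sequence[int],
    min_score: int,
    max_score: int,
) -> float:
    """Agreement between two raters' integer scores on an ordinal scale.

    Returns 1.0 for perfect agreement and 0.0 for chance-level agreement;
    disagreements are penalized by the squared distance between scores.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty score lists of equal length")
    if max_score <= min_score:
        raise ValueError("score scale needs at least two points")
    if any(not min_score <= s <= max_score for s in (*rater_a, *rater_b)):
        raise ValueError("score outside [min_score, max_score]")
    k = max_score - min_score + 1
    n = len(rater_a)

    # Observed joint counts: observed[i][j] = essays scored i by A and j by B.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score][b - min_score] += 1

    # Marginal score distributions, used for chance-expected counts.
    hist_a = Counter(a - min_score for a in rater_a)
    hist_b = Counter(b - min_score for b in rater_b)

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            observed_disagreement += weight * observed[i][j]
            expected_disagreement += weight * hist_a[i] * hist_b[j] / n
    if expected_disagreement == 0:
        raise ValueError("kappa undefined: both raters gave every essay one identical score")
    return 1.0 - observed_disagreement / expected_disagreement


def mean_absolute_score_shift(
    original: Sequence[float], simulated: Sequence[float]
) -> float:
    """Average |original - simulated| score; 0.0 means quality fully preserved."""
    if len(original) != len(simulated) or not original:
        raise ValueError("need two non-empty score lists of equal length")
    return sum(abs(o - s) for o, s in zip(original, simulated)) / len(original)


if __name__ == "__main__":
    # Toy scores on a hypothetical 1-4 rubric (not data from the study).
    rater_1 = [1, 2, 3, 4, 4, 2]
    rater_2 = [1, 2, 3, 3, 4, 2]
    print(round(quadratic_weighted_kappa(rater_1, rater_2, 1, 4), 3))  # → 0.923

    original_scores = [3, 2, 4, 1]
    simulated_scores = [3, 2, 3, 1]
    print(mean_absolute_score_shift(original_scores, simulated_scores))  # → 0.25
```

Quadratic weights treat a one-point disagreement on an ordinal rubric as far less serious than a three-point one, which is why this statistic is commonly used for rubric scores instead of a plain exact-agreement rate. A faithful replication would substitute whatever statistics the paper itself reports.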
Problem

Research questions and friction points this paper is trying to address.

data augmentation
generative AI
essay realism
automated scoring
writing quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

data augmentation
large language models
prompting strategies
automated scoring
realistic text generation
Edward W. Wolfe
Pearson
Psychometrics
Justin O. Barber
Pearson