Detecting AI-Generated Essays in Writing Assessment: Responsible Use and Generalizability Across LLMs

πŸ“… 2026-03-02
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This study addresses the threat that high-quality AI-generated essays pose to the authenticity of writing assessment, and the limited ability of existing detectors to generalize across large language models (LLMs). It presents the first systematic evaluation of mainstream AI text detectors on essays generated by multiple LLMs, constructing a multi-source dataset from publicly available GRE prompts to conduct an empirical analysis of cross-model generalization. The findings reveal a significant performance drop when detectors trained on one LLM's output are applied to essays from other LLMs. Building on these insights, the work proposes actionable retraining strategies and responsible deployment guidelines to improve the robustness and practical utility of detection tools in real-world educational settings.
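The cross-model setup the summary describes (fit a detector on essays from one LLM, then test it on essays from a different LLM) can be sketched in miniature. The snippet below is an illustrative toy, not the paper's method: the essay strings, the "LLM A"/"LLM B" labels, and the bag-of-words nearest-centroid detector are all hypothetical stand-ins for the study's GRE-prompt corpus and the mainstream detectors it evaluates.

```python
# Toy sketch of cross-model detector evaluation: a detector fit on
# human essays vs. essays from one LLM ("LLM A") is applied both
# in-distribution and to essays from a different LLM ("LLM B").
from collections import Counter
import math

def features(text):
    """Bag-of-words relative-frequency vector for one essay."""
    words = text.lower().split()
    return {w: c / len(words) for w, c in Counter(words).items()}

def centroid(vectors):
    """Average the sparse feature vectors of one class."""
    acc = Counter()
    for v in vectors:
        acc.update(v)          # Counter.update sums values per key
    return {w: s / len(vectors) for w, s in acc.items()}

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(essay, human_c, ai_c):
    """Label an essay by whichever class centroid is closer."""
    f = features(essay)
    return "ai" if cosine(f, ai_c) > cosine(f, human_c) else "human"

# Hypothetical training data: human essays vs. "LLM A" essays.
human_train = ["the argument overlooks several practical concerns",
               "my view rests on evidence from daily experience"]
llm_a_train = ["furthermore the multifaceted considerations demonstrate",
               "moreover the comprehensive analysis clearly demonstrates"]

human_c = centroid([features(t) for t in human_train])
ai_c = centroid([features(t) for t in llm_a_train])

# In-distribution test essay (style of "LLM A") is caught,
# but a cross-model essay with different surface habits may not be --
# the generalization gap the study measures.
print(classify("moreover the analysis clearly demonstrates", human_c, ai_c))
```

A real evaluation would replace the frequency features with the detectors under study and report the accuracy drop between same-LLM and cross-LLM test sets, which is the quantity the chapter's empirical analyses focus on.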

πŸ“ Abstract
Writing is a foundational literacy skill that underpins effective communication, fosters critical thinking, facilitates learning across disciplines, and enables individuals to organize and articulate complex ideas. Consequently, writing assessment plays a vital role in evaluating language proficiency, communicative effectiveness, and analytical reasoning. The rapid advancement of large language models (LLMs) has made it increasingly easy to generate coherent, high-quality essays, raising significant concerns about the authenticity of student-submitted work. This chapter first provides an overview of the current landscape of detectors for AI-generated and AI-assisted essays, along with guidelines for their responsible use. It then presents empirical analyses to evaluate how well detectors trained on essays from one LLM generalize to identifying essays produced by other LLMs, based on essays generated in response to public GRE writing prompts. These findings provide guidance for developing and retraining detectors for practical applications.
Problem

Research questions and friction points this paper is trying to address.

AI-generated essays
writing assessment
large language models
detector generalizability
academic integrity
Innovation

Methods, ideas, or system contributions that make the work stand out.

AI-generated text detection
large language models
generalizability
writing assessment
cross-model evaluation