Who Wrote This? The Key to Zero-Shot LLM-Generated Text Detection Is GECScore

📅 2024-05-07
🏛️ International Conference on Computational Linguistics
📈 Citations: 3
✨ Influential: 0

🤖 AI Summary
This paper addresses the challenge of detecting LLM-generated text when large-scale labeled training data is unavailable. It proposes GECScore, a lightweight, training-free, black-box zero-shot detection method that needs no access to the source model. Its core observation is that, from the perspective of LLMs, human-written text typically contains more grammatical errors than LLM-generated text. GECScore uses a pre-trained grammatical error correction (GEC) model (e.g., BART or PIE) to score the difference between a text and its corrected version, with no fine-tuning or supervision. It achieves an average AUROC of 98.62% across the XSum and Writing Prompts datasets, outperforming existing zero-shot and supervised methods, and it generalizes well in real-world settings and resists paraphrasing attacks.

📝 Abstract
The efficacy of detectors for texts generated by large language models (LLMs) substantially depends on the availability of large-scale training data. However, white-box zero-shot detectors, which require no such data, are limited by the accessibility of the source model of the LLM-generated text. In this paper, we propose a simple yet effective black-box zero-shot detection approach based on the observation that, from the perspective of LLMs, human-written texts typically contain more grammatical errors than LLM-generated texts. This approach involves calculating the Grammar Error Correction Score (GECScore) for the given text to differentiate between human-written and LLM-generated text. Experimental results show that our method outperforms current state-of-the-art (SOTA) zero-shot and supervised methods, achieving an average AUROC of 98.62% across the XSum and Writing Prompts datasets. Additionally, our approach demonstrates strong reliability in the wild, exhibiting robust generalization and resistance to paraphrasing attacks. Data and code are available at: https://github.com/NLP2CT/GECScore.
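The abstract does not spell out how the score is computed, so the following Python sketch is only an illustration under stated assumptions: the score is taken to be a similarity between a text and its grammar-corrected version, `toy_correct` is a hypothetical lookup-table stand-in for a real GEC model, and the `threshold` value is arbitrary. None of these names come from the paper or its repository.

```python
from difflib import SequenceMatcher
from typing import Callable

# Hypothetical stand-in for a real GEC model: a tiny table of word-level fixes.
TOY_FIXES = {"dont": "don't", "recieve": "receive", "alot": "a lot", "teh": "the"}


def toy_correct(text: str) -> str:
    """Illustrative corrector; a real pipeline would call a GEC model here."""
    return " ".join(TOY_FIXES.get(word, word) for word in text.split())


def gec_score(text: str, correct: Callable[[str], str] = toy_correct) -> float:
    """Token-level similarity in [0, 1] between a text and its corrected version.

    Assumption: fewer corrections (higher similarity) suggests LLM-generated text.
    """
    corrected = correct(text)
    return SequenceMatcher(None, text.split(), corrected.split()).ratio()


def is_llm_generated(text: str, threshold: float = 0.9) -> bool:
    """Flag a text as LLM-generated when correction changes it very little.

    The threshold is an arbitrary illustrative value, not one from the paper.
    """
    return gec_score(text) >= threshold


if __name__ == "__main__":
    for sample in ("The committee approved the proposal",
                   "i dont recieve alot of mail"):
        print(f"{gec_score(sample):.2f}  llm={is_llm_generated(sample)}  {sample!r}")
```

Under these assumptions, a text that the corrector barely changes scores near 1.0 and is flagged as LLM-generated, while a text that needs many corrections scores lower and is treated as human-written.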
Problem

Research questions and friction points this paper is trying to address.

Detecting LLM-generated text without training data.
Removing the dependence of white-box zero-shot detectors on access to the source model.
Improving zero-shot detection accuracy and robustness.
Innovation

Methods, ideas, or system contributions that make the work stand out.

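Black-box zero-shot detection built on GECScore, with no training data or source-model access.
Separates human-written from LLM-generated text by the observation that human-written text typically contains more grammatical errors.
Outperforms SOTA zero-shot and supervised methods, with an average AUROC of 98.62% across XSum and Writing Prompts.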
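
For readers unfamiliar with the metric behind the 98.62% figure, AUROC can be read as the probability that a randomly chosen positive example (here, an LLM-generated text) receives a higher detector score than a randomly chosen negative one (a human-written text). The short Python sketch below computes it by direct pairwise comparison; the function name `auroc` and the scores are made up for illustration and are not results from the paper.

```python
from typing import Sequence


def auroc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    if not pos_scores or not neg_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    llm_scores = [0.9, 0.8, 0.7]      # illustrative detector scores, not paper data
    human_scores = [0.6, 0.75, 0.2]   # illustrative detector scores, not paper data
    print(f"AUROC = {auroc(llm_scores, human_scores):.4f}")
```

A value of 1.0 means every LLM-generated text outscores every human-written one, and 0.5 corresponds to chance-level ranking.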