🤖 AI Summary
AI-generated and human-written texts exhibit substantial feature overlap, causing declining accuracy and poor interpretability in existing detection methods. Method: We propose a zero-shot detection framework inspired by DNA mutation repair: treating AI-generated text as a sequence containing “mutations,” we iteratively repair tokens under language model probability guidance, accumulating repair cost as an interpretable discriminative signal—requiring no fine-tuning or training data. Contribution/Results: Our approach directly quantifies generation divergence and achieves state-of-the-art performance across multiple benchmarks, with relative improvements of 5.55% in AUROC and 2.08% in F1 score. It demonstrates strong robustness against adversarial attacks, cross-model generalization gaps, and varying text lengths. The core innovation lies in modeling biological repair mechanisms as an interpretable sequence optimization process, establishing a novel paradigm for AI-text detection in high-overlap regimes.
📝 Abstract
The rapid advancement of large language models (LLMs) has blurred the line between AI-generated and human-written text. This progress brings societal risks such as misinformation, authorship ambiguity, and intellectual property concerns, highlighting the urgent need for reliable AI-generated text detection methods. However, recent advances in generative language modeling have produced significant overlap between the feature distributions of human-written and AI-generated text, eroding classification boundaries and making accurate detection increasingly challenging. To address these challenges, we propose a DNA-inspired perspective, leveraging a repair-based process to directly and interpretably capture the intrinsic differences between human-written and AI-generated text. Building on this perspective, we introduce DNA-DetectLLM, a zero-shot method for distinguishing AI-generated from human-written text. The method constructs an ideal AI-generated sequence for each input, iteratively repairs non-optimal tokens, and quantifies the cumulative repair effort as an interpretable detection signal. Empirical evaluations demonstrate that our method achieves state-of-the-art detection performance and exhibits strong robustness across various adversarial attacks and input lengths. Specifically, DNA-DetectLLM achieves relative improvements of 5.55% in AUROC and 2.08% in F1 score across multiple public benchmark datasets.
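The repair process described above can be sketched in miniature. This is not the authors' implementation: a toy deterministic bigram ranking stands in for a real language model, and the rank-distance repair cost is an illustrative choice, but it shows the core idea of repairing each non-optimal token toward the model's top choice and accumulating the cost as a detection signal (low cost suggests AI-generated, high cost suggests human-written).

```python
# Illustrative sketch of repair-based zero-shot detection (assumptions:
# toy bigram "LM" and rank-distance cost; a real system would use an
# LLM's next-token probabilities).

def repair_score(tokens, topk_fn):
    """Iteratively repair tokens toward the model-optimal sequence,
    accumulating repair cost. Higher cost = larger divergence from the
    model's preferred continuation."""
    cost = 0
    repaired = list(tokens)
    for i in range(1, len(repaired)):
        ranking = topk_fn(tuple(repaired[:i]))  # tokens by model preference
        if not ranking:
            continue  # model has no prediction for this context
        actual, optimal = repaired[i], ranking[0]
        if actual != optimal:
            # Repair cost: how far the observed token ("mutation") sits
            # from the model's top choice.
            cost += ranking.index(actual) if actual in ranking else len(ranking)
            repaired[i] = optimal  # repair the mutation
    return cost

# Toy stand-in for an LM: next-token candidates ranked by probability.
BIGRAM = {
    "the": ["cat", "dog", "moon"],
    "cat": ["sat", "ran"],
    "sat": ["down", "still"],
}

def toy_topk(context):
    return BIGRAM.get(context[-1], [])

# A model-optimal sequence needs no repairs; a divergent one does.
print(repair_score(["the", "cat", "sat", "down"], toy_topk))   # 0
print(repair_score(["the", "moon", "sat", "still"], toy_topk)) # 3
```

In the actual method, the repair signal comes from the detector LM's token probabilities rather than a fixed table, and the cumulative effort is thresholded to classify the text; the accumulated per-token costs also make the decision interpretable.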