🤖 AI Summary
Current AI text detectors have unknown robustness against semantics-preserving paraphrasing, particularly for outputs of advanced open-weight LLMs such as DeepSeek.
Method: We systematically evaluate six mainstream detectors (AI Text Classifier, Content Detector AI, Copyleaks, QuillBot, GPT-2 Detector, GPTZero) on DeepSeek-generated text and assess their resilience under standard paraphrasing and humanization-based adversarial attacks. Additionally, we propose and validate a novel in-model detection paradigm leveraging few-shot prompting and chain-of-thought reasoning within DeepSeek itself.
Contribution/Results: QuillBot and Copyleaks achieve the highest baseline accuracy but suffer substantial performance degradation under humanized rewriting. In contrast, our DeepSeek-based detector attains 96.2% recall, approaching state-of-the-art external tools and demonstrating that LLMs can serve as effective, self-contained detectors. This work reveals critical vulnerabilities of existing detectors to semantics-preserving transformations and establishes large language models' intrinsic capability for high-fidelity AI text detection.
📝 Abstract
Large language models (LLMs) have rapidly transformed the creation of written material, raising questions about writing integrity and driving the development of artificial intelligence (AI) detection technologies. Adversarial attacks, such as standard and humanized paraphrasing, inhibit detectors' ability to recognize machine-generated text. Previous studies have mainly focused on ChatGPT and other well-known LLMs and have reported varying accuracy across detectors; however, there is a clear gap in the literature regarding DeepSeek, a recently released LLM. In this work, we therefore investigate whether six publicly accessible AI detection tools -- AI Text Classifier, Content Detector AI, Copyleaks, QuillBot, GPT-2, and GPTZero -- can consistently recognize text generated by DeepSeek when exposed to the aforementioned adversarial attacks. We also evaluated DeepSeek itself as a detector, using few-shot prompting and chain-of-thought (CoT) reasoning to classify AI-generated and human-written text. We collected 49 human-authored question-answer pairs from before the LLM era and generated matching responses with DeepSeek-v3, yielding 49 AI-generated samples. We then applied adversarial techniques, namely paraphrasing and humanizing, to produce 196 additional samples used to stress-test detector robustness and measure the impact on accuracy. While QuillBot and Copyleaks showed near-perfect performance on original and paraphrased DeepSeek text, others -- particularly AI Text Classifier and GPT-2 -- were inconsistent. Humanization was the most effective attack, reducing accuracy to 71% for Copyleaks, 58% for QuillBot, and 52% for GPTZero. Few-shot and CoT prompting achieved high accuracy, with the best five-shot configuration misclassifying only one of the 49 samples (AI recall 96%, human recall 100%).
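The in-model detection setup can be sketched as few-shot prompt construction followed by a CoT instruction. The snippet below is a minimal illustration only: the instruction wording, example passages, and labels are assumptions for demonstration, not the paper's actual prompts or data.

```python
# Sketch of few-shot chain-of-thought (CoT) prompt assembly for AI-text
# detection. Example passages and labels are hypothetical placeholders.

FEW_SHOT_EXAMPLES = [
    # (passage, gold label) pairs shown to the model before the query.
    ("The mitochondria is the powerhouse of the cell, a fact every "
     "biology student memorizes early on.", "Human"),
    ("Certainly! Here is a comprehensive overview of cellular biology, "
     "covering structure, function, and key processes.", "AI"),
]

def build_fewshot_cot_prompt(text, examples=FEW_SHOT_EXAMPLES):
    """Assemble a prompt with labeled examples, then ask the model to
    reason step by step before emitting a final AI/Human label."""
    lines = [
        "You are a detector of machine-generated text.",
        "For each passage, think step by step about style and phrasing,",
        "then answer with a single label: AI or Human.",
        "",
    ]
    for sample, label in examples:
        lines.append(f"Passage: {sample}")
        lines.append(f"Label: {label}")
        lines.append("")
    lines.append(f"Passage: {text}")
    lines.append("Let's think step by step, then give the final label.")
    return "\n".join(lines)

prompt = build_fewshot_cot_prompt(
    "Water boils at 100 degrees Celsius at sea level.")
print(prompt)
```

In the study's setting, a prompt of this shape would be sent to DeepSeek and the final label parsed from its response; zero-shot CoT would simply omit the labeled examples.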