🤖 AI Summary
Existing detectors of AI-generated code generalize poorly, transfer weakly across programming languages and domains, and are vulnerable to prompt engineering and human-aligned rewriting strategies. To address these limitations, this work introduces DroidDetect, a detection framework, and DroidCollection, a large-scale open-source benchmark spanning seven programming languages, 43 LLMs, and real-world collaborative coding scenarios. DroidCollection includes human-AI co-authored code as well as adversarial examples crafted to evade detection; DroidDetect pairs an encoder-only architecture with multi-task joint training, metric learning, and uncertainty-aware resampling, and is evaluated under rigorous cross-domain protocols. Experiments show that training on even a small amount of adversarial data yields substantial robustness gains, improving F1 scores by up to 28.6% on cross-language and cross-domain benchmarks and significantly outperforming state-of-the-art methods. Together, these resources provide a generalizable and robust foundation for reliable AI-generated code detection.
📝 Abstract
In this work, we compile $\textbf{\texttt{DroidCollection}}$, the most extensive open data suite for training and evaluating machine-generated code detectors, comprising over a million code samples, seven programming languages, outputs from 43 coding models, and over three real-world coding domains. Alongside fully AI-generated samples, our collection includes human-AI co-authored code, as well as adversarial samples explicitly crafted to evade detection. Subsequently, we develop $\textbf{\texttt{DroidDetect}}$, a suite of encoder-only detectors trained using a multi-task objective over $\texttt{DroidCollection}$. Our experiments show that existing detectors' performance fails to generalise to diverse coding domains and programming languages outside of their narrow training data. Additionally, we demonstrate that while most detectors are easily compromised by humanising the output distributions using superficial prompting and alignment approaches, this problem can be easily mitigated by training on a small amount of adversarial data. Finally, we demonstrate the effectiveness of metric learning and uncertainty-based resampling as a means to enhance detector training on possibly noisy distributions.
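The multi-task training idea above — combining a classification loss with a metric-learning term, and reweighting possibly noisy samples by predictive uncertainty — can be illustrated with a minimal sketch. This is not the paper's actual implementation: the triplet-style metric loss, the entropy-based resampling weights, and all function names and hyperparameters (`margin`, `lam`, `temperature`) are illustrative assumptions.

```python
import math

def cross_entropy(probs, label):
    # Classification term: negative log-likelihood of the true class
    # (e.g., human-written vs. AI-generated vs. co-authored).
    return -math.log(probs[label])

def triplet_loss(anchor, positive, negative, margin=0.5):
    # Metric-learning term (assumed triplet-style here): pull same-class
    # embeddings together, push different-class embeddings at least
    # `margin` further apart.
    return max(0.0,
               math.dist(anchor, positive)
               - math.dist(anchor, negative)
               + margin)

def combined_loss(probs, label, anchor, positive, negative, lam=0.1):
    # Multi-task objective: classification loss plus a weighted metric term.
    # `lam` is an illustrative mixing weight, not a value from the paper.
    return cross_entropy(probs, label) + lam * triplet_loss(anchor, positive, negative)

def predictive_entropy(probs):
    # Uncertainty of a prediction; high entropy can flag noisy or
    # ambiguous training samples.
    return -sum(p * math.log(p) for p in probs if p > 0)

def resampling_weights(batch_probs, temperature=1.0):
    # One plausible uncertainty-aware resampling variant: downweight
    # high-entropy (possibly mislabeled) samples when drawing the next batch.
    raw = [math.exp(-predictive_entropy(p) / temperature) for p in batch_probs]
    total = sum(raw)
    return [w / total for w in raw]

# A confident, well-separated sample incurs a low combined loss,
# and confident samples receive larger resampling weights.
loss = combined_loss([0.9, 0.05, 0.05], 0, (0.0, 0.0), (0.1, 0.0), (2.0, 0.0))
weights = resampling_weights([[0.99, 0.01], [0.5, 0.5]])
```

In this toy version the detector's embeddings are plain coordinate tuples; in practice they would come from the encoder's final hidden states, and the two loss terms would be backpropagated jointly.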