🤖 AI Summary
This study examines the detection of triggerless data poisoning attacks against AI code-generation models (CodeBERT, CodeT5+, AST-T5), in which adversaries silently replace secure code with semantically equivalent but vulnerable implementations without any explicit trigger. The authors perform targeted poisoning on all three models and evaluate three families of existing defenses: spectral signature analysis, activation clustering, and static analysis. All three struggle under the triggerless threat model: representation-based methods fail to isolate poisoned samples, and static analysis produces both false positives and false negatives. The results expose the intrinsic stealthiness of triggerless poisoning and motivate more robust, trigger-independent defenses for AI-assisted code generation.
📝 Abstract
Deep learning (DL) models for natural-language-to-code generation have become integral to modern software development pipelines. However, their heavy reliance on large amounts of data, often collected from unsanitized online sources, exposes them to data poisoning attacks, where adversaries inject malicious samples to subtly bias model behavior. Recent targeted attacks silently replace secure code with semantically equivalent but vulnerable implementations without relying on explicit triggers, making it especially hard for detection methods to distinguish clean from poisoned samples. We present a systematic study of the effectiveness of existing poisoning detection methods under this stealthy threat model. Specifically, we perform targeted poisoning on three DL models (CodeBERT, CodeT5+, AST-T5) and evaluate spectral signature analysis, activation clustering, and static analysis as defenses. Our results show that all three methods struggle to detect triggerless poisoning: representation-based approaches fail to isolate poisoned samples, and static analysis suffers from both false positives and false negatives. These findings highlight the need for more robust, trigger-independent defenses for AI-assisted code generation.
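For readers unfamiliar with the two representation-based defenses named above, the following is a minimal NumPy sketch (not the paper's implementation) of how each one scores samples: spectral signature analysis ranks samples by their squared projection onto the top singular vector of the mean-centered representation matrix, while activation clustering runs 2-means over hidden activations and flags the smaller cluster. Function names and the synthetic data are illustrative assumptions.

```python
import numpy as np

def spectral_signature_scores(reps: np.ndarray) -> np.ndarray:
    """Spectral-signature outlier scores: squared projection of each
    mean-centered representation onto the top right-singular vector
    of the centered representation matrix."""
    centered = reps - reps.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return (centered @ vt[0]) ** 2

def activation_cluster_flags(acts: np.ndarray, iters: int = 50,
                             seed: int = 0) -> np.ndarray:
    """Activation clustering reduced to its core heuristic: 2-means
    over activations, flagging the smaller cluster as suspicious."""
    rng = np.random.default_rng(seed)
    centers = acts[rng.choice(len(acts), size=2, replace=False)].astype(float)
    for _ in range(iters):
        dists = np.linalg.norm(acts[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for k in (0, 1):
            if (labels == k).any():
                centers[k] = acts[labels == k].mean(axis=0)
    smaller = np.bincount(labels, minlength=2).argmin()
    return labels == smaller

if __name__ == "__main__":
    # Synthetic sanity check: 95 clean samples plus 5 poisoned samples whose
    # representations are shifted by a constant offset -- the footprint a
    # *triggered* attack tends to leave, and exactly what triggerless
    # poisoning avoids.
    rng = np.random.default_rng(1)
    clean = rng.normal(0.0, 1.0, size=(95, 16))
    poison = rng.normal(0.0, 1.0, size=(5, 16)) + 4.0
    reps = np.vstack([clean, poison])
    print("top-5 spectral outliers:", np.argsort(spectral_signature_scores(reps))[-5:])
    print("flagged by clustering:  ", np.flatnonzero(activation_cluster_flags(reps)))
```

Both sketches succeed on the synthetic example precisely because the poisoned points form a separable direction and cluster in representation space. Triggerless poisoning removes that footprint, which is why, as reported above, these representation-based defenses fail to isolate the poisoned samples.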