🤖 AI Summary
Existing detectors for adversarial and backdoor attacks in safety-critical scenarios rely on attack priors and generalize poorly. To address this, the paper proposes NoiSec, a unified detection framework built on the intrinsic characteristics of noise. The method treats the input reconstruction residual as the noise signal and performs attack-agnostic detection in three steps: reconstructing the input to isolate the noise, extracting features from that noise, and applying unsupervised anomaly detection to those features. It does not assume a particular attack type or model architecture, and it covers white-box adversarial, black-box adversarial, and backdoor attacks with a single detector. On CIFAR-10, it achieves AUROC scores exceeding 0.954 (white-box), 0.852 (black-box), and 0.992 (backdoor) while keeping the false positive rate within 1%, outperforming MagNet-based baselines. Its core idea is to use the noise itself as a universal detection signal, which removes attack-specific constraints and offers a defense that can extend to unknown threats.
📝 Abstract
The rapid adoption of machine learning (ML) is propelling the world into a future of intelligent automation and data-driven solutions. However, the proliferation of malicious data manipulation attacks against ML, namely adversarial and backdoor attacks, jeopardizes its reliability in safety-critical applications. Existing detection methods against such attacks rely on attack-specific assumptions, which limits their applicability across diverse practical scenarios. Motivated by the need for a more robust and unified defense mechanism, we investigate the shared traits of adversarial and backdoor attacks and propose NoiSec, which relies solely on the noise, the root cause of such attacks, to detect any malicious data alterations. NoiSec is a reconstruction-based detector that disentangles the noise from the test input, extracts the underlying features from the noise, and leverages them to recognize systematic malicious manipulation. Experimental evaluations on the CIFAR-10 dataset demonstrate the efficacy of NoiSec, which achieves AUROC scores exceeding 0.954 and 0.852 under white-box and black-box adversarial attacks, respectively, and 0.992 against backdoor attacks. Notably, NoiSec maintains this detection performance while keeping the false positive rate within 1%. Comparative analyses against MagNet-based baselines show that NoiSec performs better across various attack scenarios.
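The three stages described in the abstract (isolate the noise by reconstruction, extract features from the noise, score those features for anomalies) can be illustrated with a small sketch. This is not the authors' implementation: a PCA projection stands in for the learned reconstructor, three hand-picked residual statistics stand in for the learned noise features, a Mahalanobis distance stands in for the anomaly detector, and the data is synthetic rather than CIFAR-10. All function names and parameters below are hypothetical.

```python
import numpy as np

def fit_reconstructor(X, k):
    """Assumption: PCA basis fitted on benign data, standing in for an autoencoder."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]  # top-k principal directions (rows)

def residual_noise(X, mean, basis):
    """Noise estimate = input minus its reconstruction."""
    recon = (X - mean) @ basis.T @ basis + mean
    return X - recon

def noise_features(N):
    """Assumption: simple per-sample statistics instead of learned features."""
    return np.stack(
        [np.linalg.norm(N, axis=1), np.abs(N).max(axis=1), np.abs(N).mean(axis=1)],
        axis=1,
    )

def fit_scorer(F):
    """Fit a Gaussian to benign noise features (mean and inverse covariance)."""
    mu = F.mean(axis=0)
    cov = np.cov(F, rowvar=False) + 1e-8 * np.eye(F.shape[1])  # small ridge for stability
    return mu, np.linalg.inv(cov)

def anomaly_score(F, mu, cov_inv):
    """Squared Mahalanobis distance of each feature vector to the benign fit."""
    d = F - mu
    return np.einsum("ij,jk,ik->i", d, cov_inv, d)

# Synthetic demo: benign inputs lie near a 4-dimensional subspace of R^32.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 32))

def sample_benign(n):
    return rng.normal(size=(n, 4)) @ W + 0.01 * rng.normal(size=(n, 32))

X_train = sample_benign(500)
X_benign = sample_benign(200)
# Bounded sign perturbation as a toy stand-in for an adversarial manipulation.
X_attack = sample_benign(200) + 0.1 * rng.choice([-1.0, 1.0], size=(200, 32))

mean, basis = fit_reconstructor(X_train, k=4)
mu, cov_inv = fit_scorer(noise_features(residual_noise(X_train, mean, basis)))

def score(X):
    return anomaly_score(noise_features(residual_noise(X, mean, basis)), mu, cov_inv)

# Calibrate the threshold on benign scores so roughly 1% of benign inputs are flagged.
threshold = np.quantile(score(X_train), 0.99)
false_pos_rate = float((score(X_benign) > threshold).mean())
detect_rate = float((score(X_attack) > threshold).mean())
```

The detector is fitted and calibrated on benign data only, so nothing in the sketch depends on knowing the perturbation in advance. That mirrors the attack-agnostic framing of the abstract, though the toy perturbation here is far easier to separate than real attacks on image classifiers.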