🤖 AI Summary
This survey systematically examines architecture-level backdoors in deep learning—stealthy, persistent threats in which malicious logic is embedded directly into a model's computational graph, evading conventional defenses such as data sanitization and retraining. It first unifies diverse attack vectors, including compiler-layer tampering, AutoML pipeline contamination, and supply-chain implantation. It then reviews the emerging detection toolbox—static computational-graph analysis, dynamic fuzzing, and lightweight formal verification—and shows that these techniques remain limited against distributed and stealthy triggers and do not yet scale to end-to-end trustworthy model delivery. The survey closes by proposing research directions: a cross-stage protection framework for trusted model delivery, a cryptography-enhanced model attestation mechanism, and a new benchmark designed specifically for architecture-level backdoor evaluation.
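To make the static graph-analysis stage concrete, here is a toy sketch (not from the survey; the node format, `SUSPICIOUS_OPS` blocklist, and `scan_graph` helper are all hypothetical). Real scanners operate on serialized ONNX or TorchScript graphs; the idea is the same: flag operators or subgraph shapes—data-dependent control flow, comparison-driven gating, dangling branches—that a plain feedforward classifier would not contain.

```python
# Toy static scan of a serialized computational graph (hypothetical
# dict-based format; real tools parse ONNX/TorchScript). Architectural
# backdoors often add comparison or conditional ops that gate the
# output on an input-matching trigger.

SUSPICIOUS_OPS = {"If", "Where", "Equal", "Greater"}  # illustrative blocklist

def scan_graph(nodes):
    """Return (node_name, reason) pairs for structurally anomalous nodes."""
    findings = []
    consumed = {i for n in nodes for i in n["inputs"]}
    for n in nodes:
        if n["op"] in SUSPICIOUS_OPS:
            findings.append((n["name"], "suspicious op: " + n["op"]))
        # An output nothing consumes may be a dormant trigger path.
        if n["output"] not in consumed and not n.get("is_model_output"):
            findings.append((n["name"], "dangling output"))
    return findings

benign = [
    {"name": "conv1", "op": "Conv", "inputs": ["x"], "output": "h1"},
    {"name": "fc", "op": "Gemm", "inputs": ["h1"], "output": "y",
     "is_model_output": True},
]
tampered = benign + [
    {"name": "trig", "op": "Equal", "inputs": ["x"], "output": "mask"},
    {"name": "gate", "op": "Where", "inputs": ["mask", "h1"], "output": "y2",
     "is_model_output": True},
]

print(scan_graph(benign))    # → []
print(scan_graph(tampered))  # flags the Equal/Where trigger gate
```

Such a blocklist scan illustrates why purely static approaches struggle with *distributed* triggers: a backdoor split across many individually innocuous ops raises no per-node flag, which is what motivates complementing static analysis with dynamic fuzzing and formal checks.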
📝 Abstract
Architectural backdoors pose an under-examined but critical threat to deep neural networks, embedding malicious logic directly into a model's computational graph. Unlike traditional data poisoning or parameter manipulation, architectural backdoors evade standard mitigation techniques and persist even after clean retraining. This survey systematically consolidates research on architectural backdoors, spanning compiler-level manipulations, tainted AutoML pipelines, and supply-chain vulnerabilities. We assess emerging detection and defense strategies, including static graph inspection, dynamic fuzzing, and partial formal verification, and highlight their limitations against distributed or stealthy triggers. Despite recent progress, scalable and practical defenses remain elusive. We conclude by outlining open challenges and proposing directions for strengthening supply-chain security, cryptographic model attestation, and next-generation benchmarks. This survey aims to guide future research toward comprehensive defenses against structural backdoor threats in deep learning systems.
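The cryptographic model-attestation direction can be sketched minimally as follows (an assumed workflow, not a mechanism from the survey; HMAC over SHA-256 stands in for a real digital signature). The key point is that the attestation must bind the *architecture description* together with the weights, so that an injected trigger subgraph invalidates the tag even when the parameters are untouched.

```python
# Minimal model-attestation sketch (assumed workflow; HMAC stands in
# for a publisher's digital signature).
import hashlib
import hmac

def attest(model_bytes: bytes, arch_spec: bytes, key: bytes) -> str:
    """Publisher side: bind weights AND architecture into one tag."""
    digest = hashlib.sha256(arch_spec + model_bytes).digest()
    return hmac.new(key, digest, hashlib.sha256).hexdigest()

def verify(model_bytes: bytes, arch_spec: bytes, key: bytes, tag: str) -> bool:
    """Consumer side: recompute the tag and compare in constant time."""
    return hmac.compare_digest(attest(model_bytes, arch_spec, key), tag)

key = b"shared-secret"
weights = b"\x00\x01fake-weight-blob"
arch = b'{"layers": ["Conv", "Gemm"]}'
tag = attest(weights, arch, key)

assert verify(weights, arch, key, tag)
# An architectural edit (e.g. an injected trigger op) breaks verification
# even though the weights are byte-identical:
assert not verify(weights, b'{"layers": ["Conv", "Where", "Gemm"]}', key, tag)
```

In a deployed supply chain the symmetric key would be replaced by publisher-held signing keys and consumer-held public keys, so any party along the delivery pipeline can detect post-publication graph tampering.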