🤖 AI Summary
This work addresses the challenge of detecting and identifying physically realizable, naturally inconspicuous backdoor triggers (such as a particular pair of sunglasses or a hat) in face recognition models. We propose a detection method grounded in deep feature analysis and inverse inference of trigger patterns. Our approach combines feature-space anomaly detection with a reversible trigger-reconstruction mechanism, enabling, for the first time, end-to-end identification of natural, real-world triggers without exhaustive search. This avoids the computational cost and limited generalizability of conventional brute-force methods. On a compromised face recognition network, the method identifies the trigger with 74% top-5 accuracy, 18 percentage points higher than a naive brute-force baseline (56%). This strengthens the trustworthiness of, and the defenses available for, face recognition models deployed in high-security applications.
📝 Abstract
Backdoor attacks embed hidden functionality into deep neural networks: the network behaves anomalously when the input contains a predetermined pattern (the trigger), while otherwise behaving normally on public test data. Recent work has shown that backdoored face recognition (FR) systems can respond to natural-looking triggers, such as a particular pair of sunglasses. Such attacks pose a serious threat to the use of FR systems in high-security applications. We propose a novel technique to (1) detect whether an FR network has been compromised with a natural, physically realizable trigger, and (2) identify such triggers given a compromised network. We demonstrate the effectiveness of our methods on a compromised FR network, where we identify the trigger (e.g., green sunglasses or a red hat) with a top-5 accuracy of 74%, whereas a naive brute-force baseline achieves 56%.
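The top-5 figure counts an identification as correct when the true trigger appears among the five highest-ranked candidate triggers. A minimal sketch of that metric is shown below, assuming each method outputs a score per candidate trigger for every compromised model. The function names, candidate triggers, and rankings are hypothetical illustrations, not the paper's actual candidate set, scoring method, or data.

```python
from typing import Sequence


def rank_candidates(scores: dict[str, float]) -> list[str]:
    """Order candidate triggers from most to least suspicious (hypothetical scores)."""
    return sorted(scores, key=lambda name: scores[name], reverse=True)


def top_k_accuracy(
    rankings: Sequence[Sequence[str]],
    true_triggers: Sequence[str],
    k: int = 5,
) -> float:
    """Fraction of models whose true trigger is among the top-k ranked candidates."""
    if len(rankings) != len(true_triggers):
        raise ValueError("need exactly one ranking per compromised model")
    if not rankings:
        raise ValueError("evaluation set is empty")
    hits = sum(
        1 for ranked, truth in zip(rankings, true_triggers) if truth in ranked[:k]
    )
    return hits / len(rankings)


# Hypothetical toy evaluation: one ranked candidate list per compromised model.
rankings = [
    ["green sunglasses", "red hat", "blue scarf", "earrings", "black cap", "white mask"],
    ["red hat", "green sunglasses", "earrings", "black cap", "white mask", "blue scarf"],
    ["blue scarf", "black cap", "earrings", "white mask", "green sunglasses", "red hat"],
    ["black cap", "white mask", "earrings", "blue scarf", "green sunglasses", "red hat"],
]
true_triggers = ["green sunglasses", "red hat", "red hat", "red hat"]

print(top_k_accuracy(rankings, true_triggers, k=5))  # 0.5: true trigger in top 5 for 2 of 4 models
```

With this definition, raising `k` can only keep or increase accuracy, so top-5 numbers should be read alongside the size of the candidate pool a method ranks over.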