🤖 AI Summary
Detecting backdoor-poisoned inputs to object detection models at inference time remains challenging. Detectors output many objects per image, and triggers can cause complex effects (e.g., phantom or disappearing objects), which render conventional defenses ineffective.
Method: We propose TRACE, a black-box, model-agnostic framework for test-time backdoor detection. It builds on two empirical observations: poisoned samples yield more consistent detections than clean samples across varied backgrounds, whereas clean samples yield more consistent detections when the focal information changes. TRACE applies foreground and background transformations to each test sample and measures the variance of object confidences to assess transformation consistency.
Results: Evaluated on COCO and PASCAL VOC, TRACE achieves a 30% AUROC improvement over state-of-the-art methods and demonstrates robustness against adaptive attacks.
📝 Abstract
Object detection models are vulnerable to backdoor attacks, where attackers poison a small subset of training samples by embedding a predefined trigger to manipulate predictions. Detecting poisoned samples (i.e., those containing triggers) at test time can prevent backdoor activation. However, unlike image classification tasks, the unique characteristics of object detection -- particularly its output of numerous objects -- pose fresh challenges for backdoor detection. The complex attack effects (e.g., "ghost" object emergence or "vanishing" objects) further render current defenses fundamentally inadequate. To this end, we design TRAnsformation Consistency Evaluation (TRACE), a brand-new method for detecting poisoned samples at test time in object detection. Our journey begins with two intriguing observations: (1) poisoned samples exhibit significantly more consistent detection results than clean ones across varied backgrounds; (2) clean samples show higher detection consistency when introduced to different focal information. Based on these phenomena, TRACE applies foreground and background transformations to each test sample, then assesses transformation consistency by calculating the variance in object confidences. TRACE achieves black-box, universal backdoor detection, with extensive experiments showing a 30% improvement in AUROC over state-of-the-art defenses and resistance to adaptive attacks.
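The consistency idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Detector` interface, the assumption that objects are already matched across transformed copies, and the way the two variances are combined in `poison_score` are all hypothetical choices made for illustration. The abstract only states that consistency is measured through the variance of object confidences.

```python
from statistics import pvariance
from typing import Callable, List, Sequence

# Hypothetical interface (an assumption, not from the paper): a black-box
# detector maps an image to one confidence per tracked object, in a fixed
# order, with 0.0 for an object that is no longer detected.
Detector = Callable[[object], List[float]]
Transform = Callable[[object], object]


def confidence_variance(detect: Detector, image: object,
                        transforms: Sequence[Transform]) -> float:
    """Mean per-object variance of confidences across transformed copies."""
    runs = [detect(t(image)) for t in transforms]
    if not runs or not runs[0]:
        return 0.0
    # Regroup so that each tuple holds one object's confidences across runs.
    variances = [pvariance(confs) for confs in zip(*runs)]
    return sum(variances) / len(variances)


def poison_score(detect: Detector, image: object,
                 background_transforms: Sequence[Transform],
                 focal_transforms: Sequence[Transform]) -> float:
    """Illustrative score: higher suggests a poisoned sample.

    Assumed combination rule: poisoned samples are expected to be stable
    under background changes (low variance) and unstable under focal
    changes (high variance), so the difference is used as the score.
    """
    bg_var = confidence_variance(detect, image, background_transforms)
    focal_var = confidence_variance(detect, image, focal_transforms)
    return focal_var - bg_var
```

Only the detector's outputs are queried, which matches the black-box setting. A real pipeline would also need to match boxes across transformed views and choose a decision threshold, both of which are omitted here.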