AI Summary
This work proposes a lightweight, model-agnostic probing framework that uses low-rank adaptation (LoRA) modules as detectors for backdoored and membership samples, without relying on clean reference models, extensive retraining, or strong assumptions about the attack mechanism. By attaching task-specific LoRA adapters to a frozen backbone network, the method analyzes the optimization dynamics and representation shifts induced by suspicious inputs, distinguishing poisoned or member samples from benign ones, without access to the original training data or modifications to the deployed model. By combining ranking and energy-based statistics, the approach captures the distinctive low-rank update patterns triggered by malicious samples, enabling high-confidence backdoor detection and membership inference.
Abstract
Backdoored and privacy-leaking deep neural networks pose a serious threat to the deployment of machine learning systems in security-critical settings. Existing defenses for backdoor detection and membership inference typically require access to clean reference models, extensive retraining, or strong assumptions about the attack mechanism. In this work, we introduce a novel LoRA-based oracle framework that leverages low-rank adaptation modules as a lightweight, model-agnostic probe for both backdoor detection and membership inference. Our approach attaches task-specific LoRA adapters to a frozen backbone and analyzes their optimization dynamics and representation shifts when exposed to suspicious samples. We show that poisoned and member samples induce distinctive low-rank updates that differ significantly from those generated by clean or non-member data. These signals can be measured using simple ranking and energy-based statistics, enabling reliable inference without access to the original training data or modification of the deployed model.