🤖 AI Summary
This work addresses a critical limitation in existing backdoor attacks against federated learning, which often rely on unrealistic synthetic or out-of-distribution triggers that poorly reflect real-world threats. To bridge this gap, the authors propose SABLE, the first method to construct semantically meaningful, in-distribution, and visually natural triggers—such as “wearing sunglasses”—within federated settings. SABLE employs an aggregation-aware malicious objective that combines feature disentanglement with parameter regularization, steering malicious model updates to closely resemble benign ones in parameter space. Evaluated on CelebA and GTSRB, SABLE achieves high attack success rates with only minimal poisoning of local data while preserving benign accuracy. It remains effective across diverse aggregation rules, and its natural, interpretable triggers make the attack substantially stealthier, challenging prevailing robustness evaluations built on synthetic triggers.
📝 Abstract
Backdoor attacks on federated learning (FL) are most often evaluated with synthetic corner patches or out-of-distribution (OOD) patterns that are unlikely to arise in practice. In this paper, we revisit the backdoor threat to standard FL (a single global model) under a more realistic setting where triggers must be semantically meaningful, in-distribution, and visually plausible. We propose SABLE, a Semantics-Aware Backdoor for LEarning in federated settings, which constructs natural, content-consistent triggers (e.g., semantic attribute changes such as sunglasses) and optimizes an aggregation-aware malicious objective with feature separation and parameter regularization to keep attacker updates close to benign ones. We instantiate SABLE on CelebA hair-color classification and the German Traffic Sign Recognition Benchmark (GTSRB), poisoning only a small, interpretable subset of each malicious client's local data while otherwise following the standard FL protocol. Across heterogeneous client partitions and multiple aggregation rules (FedAvg, Trimmed Mean, MultiKrum, and FLAME), our semantics-driven triggers achieve high targeted attack success rates while preserving benign test accuracy. These results show that semantics-aligned backdoors remain a potent and practical threat in federated learning, and that robustness claims based solely on synthetic patch triggers can be overly optimistic.
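The abstract describes an aggregation-aware malicious objective combining a task loss on poisoned data, a feature-separation term, and a parameter-regularization term that keeps the attacker's update close to the global model. The paper's exact formulation is not given here, so the following is only a minimal PyTorch sketch under assumed design choices: `TinyNet`, the loss weights `lambda_feat`/`lambda_reg`, and the use of cosine similarity between mean penultimate features as the separation term are all illustrative, not SABLE's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyNet(nn.Module):
    """Toy classifier with an exposed feature extractor (illustrative only)."""

    def __init__(self, in_dim=8, num_classes=3):
        super().__init__()
        self.backbone = nn.Linear(in_dim, 16)
        self.head = nn.Linear(16, num_classes)

    def features(self, x):
        # Penultimate-layer features used by the separation term below.
        return torch.relu(self.backbone(x))

    def forward(self, x):
        return self.head(self.features(x))


def malicious_objective(model, global_params, x_clean, y_clean,
                        x_poison, y_target, lambda_feat=0.1, lambda_reg=0.01):
    """Sketch of an aggregation-aware attacker loss in the spirit described
    in the abstract; term names and weights are assumptions."""
    # Task loss: stay accurate on clean data while mapping semantically
    # triggered inputs to the attacker-chosen target label.
    task_loss = (F.cross_entropy(model(x_clean), y_clean)
                 + F.cross_entropy(model(x_poison), y_target))

    # Feature-separation term (illustrative): reduce similarity between the
    # mean features of clean and triggered inputs, so the backdoor occupies
    # a distinct region of feature space.
    feat_clean = model.features(x_clean).mean(dim=0)
    feat_poison = model.features(x_poison).mean(dim=0)
    feat_sep = F.cosine_similarity(feat_clean, feat_poison, dim=0)

    # Parameter regularization: penalize distance to the current global
    # model so the malicious update resembles benign ones under robust
    # aggregation rules (e.g., Trimmed Mean, MultiKrum).
    reg = sum(((p - g) ** 2).sum()
              for p, g in zip(model.parameters(), global_params))

    return task_loss + lambda_feat * feat_sep + lambda_reg * reg


# Usage: one malicious local step against a copy of the global parameters.
torch.manual_seed(0)
model = TinyNet()
global_params = [p.detach().clone() for p in model.parameters()]
x_clean, y_clean = torch.randn(4, 8), torch.tensor([0, 1, 2, 0])
x_poison, y_target = torch.randn(4, 8), torch.full((4,), 2)
loss = malicious_objective(model, global_params, x_clean, y_clean,
                           x_poison, y_target)
loss.backward()
```

The key design point this sketch captures is that the regularizer is computed against the *received* global parameters, which is what makes the objective aggregation-aware: it directly bounds how far the poisoned update drifts from what defenses expect of a benign client.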