AI Summary
In federated learning, post-hoc attribution of poisoning attacks remains infeasible when training-time defenses fail. To address this, we propose FLForensics, the first forensic framework for federated poisoning attribution. FLForensics identifies malicious clients by analyzing the global model's misclassification behavior on target samples, integrating gradient provenance, client-wise contribution attribution, and statistical significance testing. We provide theoretical guarantees that rigorously distinguish benign from malicious clients. Moreover, we formally model adaptive poisoning attacks for the first time and ensure robust traceability under such threats. Evaluated across five benchmark datasets, FLForensics achieves high recall (>92%) and low false positive rate (<3.5%) against both classical and adaptive poisoning attacks. Our work bridges a critical gap in post-deployment security auditing for federated learning systems.
Abstract
Poisoning attacks compromise the training phase of federated learning (FL) such that the learned global model misclassifies attacker-chosen inputs called target inputs. Existing defenses mainly focus on protecting the training phase of FL so that the learned global model is poison-free. However, these defenses often achieve limited effectiveness when the clients' local training data is highly non-iid or the number of malicious clients is large, as confirmed in our experiments. In this work, we propose FLForensics, the first poison-forensics method for FL. FLForensics complements existing training-phase defenses. In particular, when training-phase defenses fail and a poisoned global model is deployed, FLForensics aims to trace back the malicious clients that performed the poisoning attack once a misclassified target input has been identified. We theoretically show that FLForensics can accurately distinguish between benign and malicious clients under a formal definition of poisoning attacks. Moreover, we empirically show the effectiveness of FLForensics at tracing back both existing and adaptive poisoning attacks on five benchmark datasets.