🤖 AI Summary
In federated learning (FL), malicious participants can exploit model uncertainty to undermine the reliability of global predictions; however, existing attacks lack targeted manipulation of uncertainty and theoretical guarantees. This paper proposes Delphi, the first attack framework that integrates KL-divergence-based uncertainty quantification with Bayesian optimization (BO) or trust-region methods to strategically perturb parameters in the first hidden layer of local models—thereby maximizing output uncertainty of the global model. We provide rigorous theoretical analysis proving Delphi’s efficacy and uncovering a fundamental vulnerability of FL systems in parameter space to uncertainty-oriented attacks. Experiments on standard benchmarks demonstrate that Delphi-BO increases the predictive entropy of the global model by over 300%, substantially exposing the trustworthiness risks of FL under adversarial settings.
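The core objective described above, driving the model's predictive distribution towards maximum uncertainty via a KL-divergence measure, can be illustrated with a small sketch. This is not the paper's implementation; the function names (`softmax`, `kl_to_uniform`) and the choice of the uniform distribution as the maximally uncertain target are illustrative assumptions consistent with the summary:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over class logits.
    e = np.exp(z - z.max())
    return e / e.sum()

def kl_to_uniform(p):
    """KL(p || u), where u is the uniform distribution over classes.

    Minimising this quantity pushes the predictive distribution p
    towards maximum uncertainty (maximum entropy)."""
    u = np.full_like(p, 1.0 / p.size)
    return float(np.sum(p * np.log(p / u)))

# A confident prediction is far from uniform; a flat one has KL ~ 0.
p_confident = softmax(np.array([5.0, 0.1, 0.1]))
p_uncertain = softmax(np.zeros(3))
assert kl_to_uniform(p_uncertain) < kl_to_uniform(p_confident)
```

Under this formulation, an attacker who minimises `kl_to_uniform` of the global model's output is equivalently maximising its predictive entropy, which is the quantity the 300% figure refers to.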
📝 Abstract
As we transition from Narrow Artificial Intelligence towards Artificial Super Intelligence, users are increasingly concerned about their privacy and the trustworthiness of machine learning (ML) technology. A common denominator for metrics of trustworthiness is the quantification of the uncertainty inherent in DL algorithms, specifically in the model parameters, input data, and model predictions. A common approach to addressing privacy-related issues in DL is to adopt distributed learning such as federated learning (FL), where private raw data is not shared among users. Despite its privacy-preserving mechanisms, FL still faces challenges in trustworthiness. Specifically, malicious users can, during training, systematically craft malicious model parameters to compromise the model's predictive and generative capabilities, resulting in high uncertainty about its reliability. To demonstrate such malicious behaviour, we propose a novel model poisoning attack named Delphi, which aims to maximise the uncertainty of the global model output. We achieve this by exploiting the relationship between the uncertainty and the model parameters of the first hidden layer of the local model. Delphi employs two types of optimisation, Bayesian Optimisation and Least Squares Trust Region, to search for the optimal poisoned model parameters; we name the resulting variants Delphi-BO and Delphi-LSTR. We quantify the uncertainty using the KL divergence, minimising the distance between the predictive probability distribution and a maximally uncertain distribution over the model output. Furthermore, we establish a mathematical proof of the attack's effectiveness in FL. Numerical results demonstrate that Delphi-BO induces a higher amount of uncertainty than Delphi-LSTR, highlighting the vulnerability of FL systems to model poisoning attacks.
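The search procedure the abstract describes, perturbing only the first hidden layer's parameters so as to maximise the uncertainty of the model output, can be sketched as follows. This is a simplified stand-in: a random local search replaces the paper's Bayesian Optimisation and Least Squares Trust Region solvers, and the toy two-layer network, `predict`, and `mean_entropy` are illustrative assumptions, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network; W1 plays the role of the first hidden layer
# that the Delphi attack perturbs. W2 is left untouched.
W1 = rng.normal(size=(4, 8))
W2 = rng.normal(size=(8, 3))

def predict(x, W1):
    h = np.maximum(x @ W1, 0.0)                      # ReLU hidden layer
    z = h @ W2
    e = np.exp(z - z.max(axis=1, keepdims=True))     # stable softmax
    return e / e.sum(axis=1, keepdims=True)

def mean_entropy(p):
    # Average predictive entropy over a batch; the attacker's objective.
    return float(-(p * np.log(p + 1e-12)).sum(axis=1).mean())

x = rng.normal(size=(32, 4))                         # surrogate input batch
base = mean_entropy(predict(x, W1))

# Greedy random search over perturbations of W1 (stand-in for the
# BO / trust-region search): keep any candidate that increases the
# predictive entropy of the model output.
best_W1, best_ent = W1, base
for _ in range(200):
    cand = best_W1 + 0.1 * rng.normal(size=W1.shape)
    ent = mean_entropy(predict(x, cand))
    if ent > best_ent:
        best_W1, best_ent = cand, ent

assert best_ent >= base  # the poisoned layer is at least as uncertain
```

A malicious FL participant would then submit `best_W1` as its local first-layer update, so that aggregation drags the global model towards high-entropy (unreliable) predictions; the paper's BO and LSTR variants differ only in how this search over `W1` is carried out.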