🤖 AI Summary
This work identifies and addresses a novel privacy threat in federated learning: a malicious orchestrator manipulates model aggregation to induce targeted overfitting on specific clients, exposing those clients' local data to leakage. Unlike conventional defenses that focus on mitigating information leakage during training, we propose the first client-side proactive verification framework. It introduces three lightweight, interpretable detection methods that can be verified in real time (label-flipping verification, backdoor trigger injection, and model fingerprint analysis) to monitor the aggregation process. Extensive experiments across multiple datasets and attack scenarios demonstrate that all three methods reliably detect targeted overfitting with low latency, low false-positive rates, and bounded computational overhead. Our approach significantly strengthens clients' autonomy in defending against adversarial aggregation, establishing a new paradigm for verifiable federated learning.
📝 Abstract
Federated Learning (FL) enables collaborative model training across decentralised clients while keeping local data private, making it a widely adopted privacy-enhancing technology (PET). Despite these privacy benefits, FL remains vulnerable to privacy attacks, including attacks that target specific clients. In this paper, we study an underexplored threat in which a dishonest orchestrator intentionally manipulates the aggregation process to induce targeted overfitting in the local models of specific clients. Whereas prior work in this area predominantly focuses on reducing the amount of information leaked during training, we focus on enabling early client-side detection of targeted overfitting, allowing clients to disengage before significant harm occurs. To this end, we propose three detection techniques: (a) label flipping, (b) backdoor trigger injection, and (c) model fingerprinting, which enable clients to verify the integrity of the global aggregation. We evaluate our methods on multiple datasets under different attack scenarios. Our results show that all three methods reliably detect targeted overfitting induced by the orchestrator, but they differ in computational complexity, detection latency, and false-positive rates.
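To make the client-side verification idea concrete, the following is a minimal sketch of the label-flipping technique under an assumed setting (the paper's actual protocol may differ): the client plants a small "canary" set whose labels are deliberately flipped and, each round, measures how well the received global model fits those flipped labels. A benign aggregation should stay near chance on them; a rising fit suggests the orchestrator is overfitting the global model to this client's data. All class names, thresholds, and trajectories below are illustrative, not the authors' API or results.

```python
class LabelFlipDetector:
    """Illustrative client-side monitor for targeted overfitting.

    threshold: flipped-label accuracy that counts as suspicious
    patience:  consecutive suspicious rounds required before alarming,
               to keep the false-positive rate low
    """

    def __init__(self, threshold=0.8, patience=2):
        self.threshold = threshold
        self.patience = patience
        self._streak = 0

    def update(self, flipped_label_accuracy):
        """Feed this round's global-model accuracy on the flipped-label
        canaries; returns True once an alarm should be raised."""
        if flipped_label_accuracy >= self.threshold:
            self._streak += 1
        else:
            self._streak = 0
        return self._streak >= self.patience


# Hypothetical accuracy trajectories on the flipped-label canaries:
# a benign run hovers near chance, while a targeted-overfitting run
# climbs as the model memorises the client's (mislabelled) samples.
benign = [0.10, 0.12, 0.15, 0.11, 0.13]
attacked = [0.15, 0.40, 0.75, 0.85, 0.92]

det = LabelFlipDetector()
print(any(det.update(a) for a in benign))  # → False (no alarm)

det = LabelFlipDetector()
alarms = [det.update(a) for a in attacked]
print(alarms.index(True))  # → 4 (alarm after two consecutive suspicious rounds)
```

The `patience` parameter reflects the trade-off the abstract mentions between detection latency and false-positive rate: a longer streak requirement delays the alarm but filters out transient spikes.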