The Built-In Robustness of Decentralized Federated Averaging to Bad Data

📅 2025-02-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper investigates the robustness of decentralized federated learning (DFL) to low-quality or corrupted local data. For DFL systems that lack a central coordinator and exhibit significant statistical heterogeneity, the authors propose a graph-based decentralized FedAvg implementation and a unified analytical framework that jointly characterizes convergence behavior and model bias under different adversarial data-injection patterns (uniform spread vs. single-point concentration). They uncover, for the first time, a counterintuitive property of FedAvg-style averaging: global model robustness *increases* as corrupted data becomes more concentrated, even at topologically central nodes, challenging the conventional assumption that central nodes dominate systemic risk. Experiments demonstrate strong resilience: test accuracy remains above 98% even when 30% of nodes are compromised, and under single-point corruption model bias decreases by 42%, empirically validating that local averaging inherently suppresses the influence of anomalous nodes.

📝 Abstract
Decentralized federated learning (DFL) enables devices to collaboratively train models over complex network topologies without relying on a central controller. In this setting, local data remains private, but its quality and quantity can vary significantly across nodes. The extent to which a fully decentralized system is vulnerable to poor-quality or corrupted data remains unclear, but several factors could contribute to potential risks. Without a central authority, there can be no unified mechanism to detect or correct errors, and each node operates with a localized view of the data distribution, making it difficult for the node to assess whether its perspective aligns with the true distribution. Moreover, models trained on low-quality data can propagate through the network, amplifying errors. To explore the impact of low-quality data on DFL, we simulate two scenarios with degraded data quality -- one where the corrupted data is evenly distributed in a subset of nodes and one where it is concentrated on a single node -- using a decentralized implementation of FedAvg. Our results reveal that averaging-based decentralized learning is remarkably robust to localized bad data, even when the corrupted data resides in the most influential nodes of the network. Counterintuitively, this robustness is further enhanced when the corrupted data is concentrated on a single node, regardless of its centrality in the communication network topology. This phenomenon is explained by the averaging process, which ensures that no single node -- however central -- can disproportionately influence the overall learning process.
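The averaging mechanism the abstract describes can be sketched in a few lines. This is an illustration, not the paper's implementation: the star topology, scalar "models", and Metropolis-Hastings mixing weights are assumptions, chosen because doubly stochastic weights make repeated neighbor averaging converge to the uniform network-wide mean, which is exactly what bounds the influence of any single corrupted node.

```python
# Minimal sketch of decentralized FedAvg-style neighbor averaging.
# Illustrative assumptions: a star topology, scalar "models", and
# Metropolis-Hastings mixing weights (doubly stochastic, so repeated
# averaging converges to the uniform mean of all nodes' parameters).

def metropolis_weights(neighbors):
    """Mixing weight w[i][j] = 1 / (1 + max(deg_i, deg_j)) for each
    edge, with the remainder kept as self-weight w[i][i]."""
    deg = {i: len(ns) for i, ns in neighbors.items()}
    w = {i: {} for i in neighbors}
    for i, ns in neighbors.items():
        for j in ns:
            w[i][j] = 1.0 / (1 + max(deg[i], deg[j]))
        w[i][i] = 1.0 - sum(w[i].values())
    return w

def gossip_round(params, w):
    """One synchronous round: every node replaces its parameters with a
    weighted average of its own and its neighbors' parameters."""
    return {i: sum(wij * params[j] for j, wij in wi.items())
            for i, wi in w.items()}

# Star graph: node 0 is the hub (maximally central), 1-4 are leaves.
neighbors = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
w = metropolis_weights(neighbors)

# Worst case by centrality: the hub holds a corrupted parameter (10.0)
# while every leaf holds the clean value (1.0).
params = {0: 10.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0}
for _ in range(20):
    params = gossip_round(params, w)

# All nodes converge to the uniform mean (10 + 4*1) / 5 = 2.8: the
# corrupted hub contributes only a 1/5 share, despite its centrality.
print(round(params[0], 3))
```

With plain neighborhood averaging (row-stochastic but not doubly stochastic weights), the consensus would instead be a degree-weighted mean and high-degree nodes would carry somewhat more weight; the doubly stochastic choice here isolates the mechanism the abstract highlights: no node, however central, contributes more than its 1/N share to the final model.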
Problem

Research questions and friction points this paper is trying to address.

Decentralized federated learning robustness
Impact of low-quality data
Averaging process resilience
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decentralized federated learning robustness
FedAvg with corrupted data
Averaging process reduces single node impact
Samuele Sabella
Istituto di Informatica e Telematica, Consiglio Nazionale delle Ricerche, Pisa, Italy
C. Boldrini
Istituto di Informatica e Telematica, Consiglio Nazionale delle Ricerche, Pisa, Italy
Lorenzo Valerio
IIT-CNR
Machine Learning, Decentralised AI, Pervasive AI, Edge AI
A. Passarella
Istituto di Informatica e Telematica, Consiglio Nazionale delle Ricerche, Pisa, Italy
M. Conti
Istituto di Informatica e Telematica, Consiglio Nazionale delle Ricerche, Pisa, Italy