🤖 AI Summary
This work addresses the challenge of Byzantine resilience in decentralized optimization, where malicious nodes can send arbitrary messages that compromise algorithmic convergence. The authors propose GT-PD, the first method to achieve Byzantine robustness in a fully decentralized setting while preserving the doubly stochastic mixing structure. GT-PD combines self-centered projection clipping with a probabilistic edge-dropping mechanism driven by dual-metric trust scores; its variant GT-PD-L adds a leaky integrator to suppress the accumulation of tracking errors under partial isolation. Experiments on MNIST demonstrate that the proposed methods consistently outperform existing approaches under various attacks; notably, GT-PD-L improves accuracy by up to 4.3 percentage points over coordinate-wise trimmed mean, and GT-PD achieves linear convergence when Byzantine nodes are completely isolated.
📝 Abstract
We study distributed optimization over networks with Byzantine agents that may send arbitrary adversarial messages. We propose \emph{Gradient Tracking with Probabilistic Edge Dropout} (GT-PD), a stochastic gradient tracking method that preserves the convergence properties of gradient tracking under adversarial communication. GT-PD combines two complementary defense layers: a universal self-centered projection that clips each incoming message to a ball of radius $\tau$ around the receiving agent, and a fully decentralized probabilistic dropout rule driven by a dual-metric trust score in the decision and tracking channels. This design bounds adversarial perturbations while preserving the doubly stochastic mixing structure, a property often lost under robust aggregation in decentralized settings. Under complete Byzantine isolation ($p_b=0$), GT-PD converges linearly to a neighborhood determined solely by stochastic gradient variance. For partial isolation ($p_b>0$), we introduce \emph{Gradient Tracking with Probabilistic Edge Dropout and Leaky Integration} (GT-PD-L), which uses a leaky integrator to control the accumulation of tracking errors caused by persistent perturbations and achieves linear convergence to a bounded neighborhood determined by the stochastic variance and the clipping-to-leak ratio. We further show that under two-tier dropout with $p_h=1$, isolating Byzantine agents introduces no additional variance into the honest consensus dynamics. Experiments on MNIST under Sign Flip, ALIE, and Inner Product Manipulation attacks show that GT-PD-L outperforms coordinate-wise trimmed mean by up to 4.3 percentage points under stealth attacks.
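The first defense layer, self-centered projection clipping, admits a compact sketch. The abstract only specifies that each incoming message is projected onto a ball of radius $\tau$ centered at the receiving agent's own iterate; the function name and NumPy-based formulation below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def self_centered_clip(x_own: np.ndarray, x_in: np.ndarray, tau: float) -> np.ndarray:
    """Project an incoming message x_in onto the Euclidean ball of radius tau
    centered at the receiving agent's own state x_own.

    Honest messages close to the receiver pass through unchanged; an adversarial
    message can perturb the receiver by at most tau in norm.
    (Illustrative sketch; not the authors' reference implementation.)
    """
    diff = x_in - x_own
    norm = np.linalg.norm(diff)
    if norm <= tau:
        return x_in  # already inside the ball: no change
    # Radially project onto the ball's boundary at distance tau from x_own.
    return x_own + (tau / norm) * diff
```

Because the clip is applied by each receiver to raw messages before mixing, the mixing weights themselves are untouched, which is how the doubly stochastic structure is preserved.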