Nonconvex Decentralized Stochastic Bilevel Optimization under Heavy-Tailed Noises

📅 2025-09-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the practical challenges in decentralized stochastic bilevel optimization where the lower-level objective is not strongly convex and stochastic gradients exhibit heavy-tailed noise, potentially with infinite variance. The authors propose NR-DSBO, the first normalized stochastic variance-reduced algorithm for decentralized bilevel optimization that operates without gradient clipping. The method integrates normalized gradient estimation with a decentralized variance-reduction mechanism, enabling rigorous convergence guarantees for nonconvex bilevel problems without assuming strong convexity or finite gradient variance. Theoretically, NR-DSBO achieves a sublinear convergence rate under heavy-tailed noise. Empirically, it outperforms existing methods in both communication efficiency and robustness to gradient corruption.

📝 Abstract
Existing decentralized stochastic optimization methods assume that the lower-level loss function is strongly convex and that the stochastic gradient noise has finite variance. These strong assumptions are typically not satisfied in real-world machine learning models. To address these limitations, we develop a novel decentralized stochastic bilevel optimization algorithm for the nonconvex bilevel optimization problem under heavy-tailed noises. Specifically, we develop a normalized stochastic variance-reduced bilevel gradient descent algorithm that does not rely on any clipping operation. Moreover, we establish its convergence rate by innovatively bounding interdependent gradient sequences under heavy-tailed noises for nonconvex decentralized bilevel optimization problems. As far as we know, this is the first decentralized bilevel optimization algorithm with rigorous theoretical guarantees under heavy-tailed noises. Extensive experimental results confirm the effectiveness of our algorithm in handling heavy-tailed noises.
Problem

Research questions and friction points this paper is trying to address.

Addresses nonconvex decentralized bilevel optimization under heavy-tailed noises
Develops normalized variance-reduced algorithm without clipping operations
Establishes convergence guarantees for interdependent gradient sequences
Innovation

Methods, ideas, or system contributions that make the work stand out.

Normalized stochastic variance-reduced gradient descent
No clipping operation required
Bounding interdependent gradient sequences theoretically
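The core idea, a variance-reduced momentum estimate whose update direction is normalized rather than clipped, can be sketched in a STORM-style single-node form. This is an illustrative sketch, not the paper's NR-DSBO algorithm: the function name, step sizes, and momentum coefficient below are assumptions, and the decentralized consensus and bilevel hypergradient components are omitted.

```python
import numpy as np

def normalized_vr_step(x, grad, prev_grad, momentum, beta=0.9, lr=0.1):
    """One normalized variance-reduced update (hypothetical sketch).

    The momentum estimate blends the fresh stochastic gradient with the
    previous estimate, corrected by the gradient difference (STORM-style
    variance reduction). The step is taken along the *normalized*
    estimate, so its length is bounded by `lr` regardless of how large a
    heavy-tailed gradient sample is; no clipping threshold is needed.
    """
    momentum = grad + (1.0 - beta) * (momentum - prev_grad)
    direction = momentum / (np.linalg.norm(momentum) + 1e-12)
    return x - lr * direction, momentum

# Usage on a toy quadratic f(x) = ||x||^2 with deterministic gradients:
x = np.array([3.0, 4.0])
grad_fn = lambda z: 2.0 * z
momentum = grad_fn(x)
prev_grad = momentum.copy()
for _ in range(10):
    g = grad_fn(x)
    x, momentum = normalized_vr_step(x, g, prev_grad, momentum)
    prev_grad = g
```

Because the direction has unit norm, each iterate moves exactly `lr` toward the minimizer here; the same boundedness is what makes a clipping operation unnecessary even when individual gradient samples have unbounded variance.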