Bregman meets Lévy: Stochastic mirror descent with heavy-tailed noise in continuous and discrete time

📅 2026-06-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work studies the convergence and robustness of stochastic mirror descent (SMD) under heavy-tailed gradient noise with infinite variance. It models SMD in continuous time as a “Lévy mirror flow”: a stochastic differential equation driven by a centered Lévy process with finite $p$-th order moments, $1 < p \leq 2$. This brings jump-type heavy-tailed noise into the mirror descent framework and makes its effect on optimization trajectories and convergence rates explicit. Using Bregman divergences and a coupling between the continuous- and discrete-time dynamics, the study derives matching guarantees for the discrete algorithm: in the convex setting it reaches $ε$-optimality in $\mathcal{O}(ε^{-p/(p-1)})$ iterations, and under relative strong convexity the rate improves to $\tilde{\mathcal{O}}(ε^{-1/(p-1)})$. These results give a theoretical basis for robust optimization under heavy-tailed noise.
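To make the quoted exponents concrete, the short calculation below substitutes two values of $p$ into the two rates stated in the summary. The choice of $p = 2$ and $p = 3/2$ is purely illustrative and is not taken from the paper.

```latex
\begin{align*}
p = 2 &: \quad \varepsilon^{-p/(p-1)} = \varepsilon^{-2},
      \qquad \varepsilon^{-1/(p-1)} = \varepsilon^{-1}, \\
p = \tfrac{3}{2} &: \quad \varepsilon^{-p/(p-1)} = \varepsilon^{-3},
      \qquad \varepsilon^{-1/(p-1)} = \varepsilon^{-2}.
\end{align*}
```

At $p = 2$ the exponents are consistent with the standard finite-variance rates, and both exponents grow without bound as $p$ decreases toward 1, which is how the bounds express the cost of heavier tails.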

📝 Abstract

We study the robustness of stochastic mirror descent (SMD) under heavy-tailed noise, focusing on whether the method retains its convergence guarantees when run with infinite-variance stochastic gradient input. To address this question in a principled manner, we begin by introducing a continuous-time model of SMD as a stochastic differential equation (SDE) driven by a centered Lévy noise process with finite $p$-th order moments, $1 < p \leq 2$. This scheme -- which we call the Lévy mirror flow (LMF) -- arises naturally as the scaling limit of SMD in the presence of heavy-tailed noise. In particular, when $p < 2$ -- the heavy noise regime -- the trajectories of LMF generically exhibit jump discontinuities of arbitrary magnitude which, if frequent enough, lead to infinite variance. Nonetheless, despite this highly singular behavior, we show that LMF attains $ε$-optimality within $\mathcal{O}(ε^{-p/(p-1)})$ time in the convex case, and within $\tilde{\mathcal{O}}(ε^{-1/(p-1)})$ time for (relatively) strongly convex objectives. These guarantees provide a transparent characterization of the impact of frequent long jumps on the convergence of the process, and percolate to a series of matching discrete-time guarantees for several variants of SMD under heavy-tailed noise.
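For readers who want to see the setting in code, here is a minimal sketch of discrete-time SMD with infinite-variance gradient noise. Everything specific in it is an assumption made for illustration and does not come from the paper: the entropic mirror map on the probability simplex, the linear objective $f(x) = \langle c, x \rangle$, the symmetrized Pareto noise, the step size $t^{-1/p}$, and the names `smd_entropic` and `heavy_tailed_noise`.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax: the entropic mirror map onto the simplex."""
    top = max(logits)
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def heavy_tailed_noise(dim, tail_index, rng):
    """Symmetrized Pareto samples: zero mean for tail_index > 1,
    infinite variance for tail_index < 2 (illustrative noise model)."""
    return [rng.choice((-1.0, 1.0)) * rng.paretovariate(tail_index)
            for _ in range(dim)]


def smd_entropic(cost, steps=2000, p=1.4, tail_index=1.5, seed=0):
    """Entropic SMD on the simplex for f(x) = <cost, x> with noisy gradients.

    Assumes p < tail_index so the noise has a finite p-th moment.
    The step size t**(-1/p) is a hypothetical schedule, not the paper's.
    Returns the running average of the iterates.
    """
    rng = random.Random(seed)
    dim = len(cost)
    logits = [0.0] * dim          # dual variable; softmax(logits) is the iterate
    average = [0.0] * dim
    for t in range(1, steps + 1):
        x = softmax(logits)
        noise = heavy_tailed_noise(dim, tail_index, rng)
        gradient = [c + z for c, z in zip(cost, noise)]  # exact gradient + noise
        step = t ** (-1.0 / p)
        # Mirror step in the dual space; occasional huge jumps are expected.
        logits = [y - step * g for y, g in zip(logits, gradient)]
        # Running average of the primal iterates.
        average = [a + (xi - a) / t for a, xi in zip(average, x)]
    return average


if __name__ == "__main__":
    print(smd_entropic([1.0, 0.5, 0.2]))
```

Working with logits rather than multiplying weights directly keeps the update stable when a single noise sample is very large, which is exactly the situation the heavy-tailed regime produces. The averaged iterate always stays on the simplex; how close it gets to the minimizing vertex depends on the seed, the horizon, and the assumed step size.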

Problem

Research questions and friction points this paper is trying to address.

stochastic mirror descent

heavy-tailed noise

convergence guarantees

Lévy noise

infinite variance

Innovation

Methods, ideas, or system contributions that make the work stand out.

stochastic mirror descent

Lévy noise

heavy-tailed gradients