Adaptive Algorithms with Sharp Convergence Rates for Stochastic Hierarchical Optimization

📅 2025-09-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
We address stochastic nonconvex-strongly-concave minimax and nonconvex-strongly-convex bilevel optimization problems under unknown gradient noise levels. We propose the first algorithm that is both adaptive and equipped with tight convergence guarantees. Methodologically, our approach integrates momentum normalization with a novel adaptive step-size scheme, enabling automatic adaptation to varying noise intensities without requiring prior knowledge of the noise level. Theoretically, we establish the first adaptive convergence analysis framework for stochastic bilevel optimization, achieving a gradient-norm convergence rate of $\widetilde{O}(1/\sqrt{T} + \sqrt{\bar{\sigma}}/T^{1/4})$ after $T$ iterations, a rate that is tight and order-optimal across both high- and low-noise regimes. Empirically, our method consistently outperforms existing non-adaptive baselines on synthetic benchmarks and deep learning tasks, including hyperparameter optimization and generative model training.

📝 Abstract
Hierarchical optimization refers to problems with interdependent decision variables and objectives, such as minimax and bilevel formulations. While various algorithms have been proposed, existing methods and analyses lack adaptivity in stochastic optimization settings: they cannot achieve optimal convergence rates across a wide spectrum of gradient noise levels without prior knowledge of the noise magnitude. In this paper, we propose novel adaptive algorithms for two important classes of stochastic hierarchical optimization problems: nonconvex-strongly-concave minimax optimization and nonconvex-strongly-convex bilevel optimization. Our algorithms achieve sharp convergence rates of $\widetilde{O}(1/\sqrt{T} + \sqrt{\bar{\sigma}}/T^{1/4})$ in $T$ iterations for the gradient norm, where $\bar{\sigma}$ is an upper bound on the stochastic gradient noise. Notably, these rates are obtained without prior knowledge of the noise level, thereby enabling automatic adaptivity in both low- and high-noise regimes. To our knowledge, this work provides the first adaptive and sharp convergence guarantees for stochastic hierarchical optimization. Our algorithm design combines the momentum normalization technique with novel adaptive parameter choices. Extensive experiments on synthetic and deep learning tasks demonstrate the effectiveness of our proposed algorithms.
Problem

Research questions and friction points this paper is trying to address.

Adaptive algorithms for stochastic hierarchical optimization without noise knowledge
Achieving sharp convergence rates in minimax and bilevel optimization problems
Enabling automatic adaptivity across different gradient noise regimes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive algorithms for stochastic hierarchical optimization
Momentum normalization with adaptive parameter choices
Sharp convergence rates without noise level knowledge
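The two ingredients above can be illustrated with a generic sketch: a normalized-momentum update paired with an AdaGrad-style step size that shrinks automatically as accumulated gradient noise grows. This is an illustrative sketch only, not the paper's actual algorithm; the function name, the toy stochastic gradient, and all parameter values are assumptions made here for illustration.

```python
import numpy as np

def normalized_momentum_sgd(grad_fn, x0, T=500, beta=0.9, eta=2.0, seed=0):
    """Momentum-normalized update with a noise-adaptive step size.

    A generic sketch of momentum normalization + adaptive step sizes,
    NOT the paper's exact scheme. `grad_fn(x, rng)` must return a
    stochastic gradient at x.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)
    v = 0.0  # cumulative squared gradient norms; grows faster when noise is high
    for _ in range(T):
        g = grad_fn(x, rng)
        m = beta * m + (1.0 - beta) * g          # momentum averaging
        v += float(g @ g)                        # accumulate without knowing the noise level
        step = eta / (1.0 + np.sqrt(v))          # AdaGrad-style decaying step size
        x -= step * m / (np.linalg.norm(m) + 1e-12)  # move along the normalized direction
    return x
```

Because the direction is normalized, only the scalar step size must adapt to the noise, which is what lets the accumulator `v` stand in for prior knowledge of the noise magnitude.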
Xiaochuan Gong
George Mason University
Jie Hao
George Mason University
Mingrui Liu
George Mason University