An Accelerated Algorithm for Stochastic Bilevel Optimization under Unbounded Smoothness

📅 2024-09-28
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
We study stochastic bilevel optimization where the upper-level objective is nonconvex, the lower-level objective is strongly convex, and the upper-level smoothness constant grows linearly with the gradient norm, so the gradient admits no global Lipschitz constant. We propose AccBO, the first algorithm achieving an oracle complexity of Õ(1/ε³) for finding an ε-stationary point in this setting. Methodologically, AccBO employs normalized stochastic gradient descent with recursive momentum for the upper-level update, and a stochastic Nesterov-accelerated method with iterate averaging for the lower-level subproblem. Crucially, we establish a novel high-probability lemma characterizing the dynamics of stochastic Nesterov acceleration under distributional drift, substantially improving hypergradient estimation accuracy. Theoretically, AccBO reduces the oracle complexity by a factor of ε⁻¹ compared to the state of the art. Empirically, it significantly outperforms existing bilevel optimization baselines on sequence learning tasks, including RNN-based text classification.
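To make the upper-level update concrete, here is a minimal illustrative sketch (not the authors' code) of normalized stochastic gradient descent with recursive (STORM-style) momentum, as named in the summary. The stand-in objective f(x) = ½‖x‖², the step size `eta`, and the momentum parameter `beta` are hypothetical choices for demonstration only:

```python
import numpy as np

def stoch_grad(x, noise):
    # Noisy gradient of the stand-in objective f(x) = 0.5 * ||x||^2.
    return x + 0.01 * noise

def upper_level_sketch(x0, steps=300, eta=0.05, beta=0.9, seed=0):
    """Normalized SGD with recursive momentum (illustrative only)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    m = stoch_grad(x, rng.standard_normal(x.shape))  # momentum estimator m_0
    for _ in range(steps):
        x_prev = x
        # Normalized step: direction m / ||m||, fixed length eta.
        x = x - eta * m / (np.linalg.norm(m) + 1e-12)
        # Recursive momentum: evaluate both gradients on the SAME fresh sample.
        noise = rng.standard_normal(x.shape)
        m = stoch_grad(x, noise) + (1 - beta) * (m - stoch_grad(x_prev, noise))
    return x
```

The normalization makes the step length insensitive to how large the gradient (and hence the local smoothness constant) becomes, which is the intuition behind using it under unbounded smoothness; the recursive momentum term reduces the variance of the gradient estimator over time.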

📝 Abstract
This paper investigates a class of stochastic bilevel optimization problems where the upper-level function is nonconvex with potentially unbounded smoothness and the lower-level problem is strongly convex. These problems have significant applications in sequential data learning, such as text classification using recurrent neural networks. The unbounded smoothness is characterized by the smoothness constant of the upper-level function scaling linearly with the gradient norm, lacking a uniform upper bound. Existing state-of-the-art algorithms require $\widetilde{O}(1/\epsilon^4)$ oracle calls of stochastic gradient or Hessian/Jacobian-vector product to find an $\epsilon$-stationary point. However, it remains unclear whether we can further improve the convergence rate when the assumptions that hold for the function at the population level also hold for each random realization almost surely (e.g., Lipschitzness of each realization of the stochastic gradient). To address this issue, we propose a new Accelerated Bilevel Optimization algorithm named AccBO. The algorithm updates the upper-level variable by normalized stochastic gradient descent with recursive momentum and the lower-level variable by the stochastic Nesterov accelerated gradient descent algorithm with averaging. We prove that our algorithm achieves an oracle complexity of $\widetilde{O}(1/\epsilon^3)$ to find an $\epsilon$-stationary point. Our proof relies on a novel lemma characterizing the dynamics of the stochastic Nesterov accelerated gradient descent algorithm under distribution drift with high probability for the lower-level variable, which is of independent interest and also plays a crucial role in analyzing the hypergradient estimation error over time. Experimental results on various tasks confirm that our proposed algorithm achieves the predicted theoretical acceleration and significantly outperforms baselines in bilevel optimization.
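The lower-level routine the abstract names, stochastic Nesterov accelerated gradient descent with iterate averaging, can be sketched on a strongly convex quadratic. This is a hedged illustration under assumed parameters (the quadratic `A`, `b`, step size `eta`, constant momentum `mom`, and noise level are all hypothetical, not the paper's choices or schedules):

```python
import numpy as np

def lower_level_sketch(A, b, y0, steps=2000, eta=0.1, mom=0.8,
                       noise=0.01, seed=0):
    """Stochastic Nesterov acceleration with iterate averaging on
    the strongly convex quadratic g(y) = 0.5 * y^T A y - b^T y
    (illustrative only)."""
    rng = np.random.default_rng(seed)
    y = y_prev = np.asarray(y0, dtype=float)
    avg = np.zeros_like(y)
    for t in range(1, steps + 1):
        # Nesterov look-ahead (extrapolation) point.
        z = y + mom * (y - y_prev)
        # Stochastic gradient evaluated at the look-ahead point.
        grad = A @ z - b + noise * rng.standard_normal(z.shape)
        y_prev, y = y, z - eta * grad
        # Running average of the iterates (the "averaging" step).
        avg += (y - avg) / t
    return avg
```

Averaging the iterates damps the stochastic-gradient noise, which is what makes the averaged lower-level solution accurate enough for hypergradient estimation; in the paper's bilevel setting the upper-level variable also changes between lower-level steps, which is the "distribution drift" their high-probability lemma handles.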
Problem

Research questions and friction points this paper is trying to address.

Bilevel Optimization
Stochastic Problems
Computational Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

AccBO Algorithm
Bilevel Stochastic Optimization
Efficient Solution Updates
Xiaochuan Gong
George Mason University
Jie Hao
Department of Computer Science, George Mason University
Mingrui Liu
Department of Computer Science, George Mason University