Accelerating Min-Max Optimization via Power-Law Stepsizes

📅 2026-06-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the slow last-iterate convergence of the Extragradient method in unconstrained biaffine minimax optimization by introducing an acceleration mechanism based on dynamic stepsize scheduling. By formulating stepsize selection as an optimization problem, the paper establishes, for the first time, that dynamically adjusted stepsizes alone can improve the convergence rate, and derives a deterministic stepsize schedule that follows (a discretization of) a power-law distribution. The approach is further extended by allowing distinct stepsizes for the extrapolation and update steps, thereby approaching the optimal convergence rate. When both steps share the same stepsize, the method achieves a convergence rate of $O(T^{-2/3 + \varepsilon})$ for any $\varepsilon > 0$, which is shown to be tight in this setting; with different stepsizes for the two steps, the rate improves to the near-optimal $O(T^{-1 + \varepsilon})$.
📝 Abstract
We revisit the convergence guarantees of the Extragradient (EG) method for unconstrained biaffine min-max optimization. It is known that EG with a fixed stepsize achieves a $\Theta(T^{-1/2})$ last-iterate convergence rate, which is slower than the optimal $\mathcal{O}(T^{-1})$ rate attainable by incorporating additional mechanisms such as anchoring. Motivated by recent advances showing that dynamic stepsizes alone can significantly accelerate gradient descent, we ask whether dynamic stepsizes can similarly accelerate the last-iterate convergence of EG. We present the first positive result in this direction. Specifically, we provide a deterministic dynamic stepsize schedule that accelerates the convergence rate of EG to $\mathcal{O}(T^{-2/3+\varepsilon})$ for any $\varepsilon > 0$. We also show that this rate is tight when the extrapolation and update steps of EG use the same stepsize. We then show that allowing different stepsizes for the extrapolation and update steps further improves the convergence rate to the near-optimal $\mathcal{O}(T^{-1+\varepsilon})$. Our analysis reduces stepsize scheduling to an optimization problem, whose solution leads to a stepsize schedule that follows (a discretization of) a power-law distribution. Our proposed stepsize schedules and analysis extend to other methods, such as Optimistic Gradient (OG), and suggest broader applicability to general min-max optimization problems.
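To make the setting concrete, here is a minimal sketch of EG on a biaffine objective with per-iteration stepsizes for the extrapolation and update steps. The objective form $f(x, y) = x^\top A y + b^\top x + c^\top y$, the function name `extragradient`, and the parameter names `alphas` and `betas` are assumptions chosen for illustration. The paper's power-law schedule is not reproduced here, so the usage example falls back to a fixed stepsize.

```python
import numpy as np


def extragradient(A, b, c, x0, y0, alphas, betas):
    """Run EG on f(x, y) = x^T A y + b^T x + c^T y (min over x, max over y).

    alphas[t] is the extrapolation stepsize and betas[t] the update stepsize
    at iteration t; passing the same sequence for both recovers standard EG.
    Returns the last iterate and the operator norm ||F(z_t)|| at every step.
    """
    x = np.asarray(x0, dtype=float)
    y = np.asarray(y0, dtype=float)

    def operator(x, y):
        # Saddle operator F(z) = (grad_x f, -grad_y f).
        return A @ y + b, -(A.T @ x + c)

    def residual(x, y):
        gx, gy = operator(x, y)
        return float(np.sqrt(gx @ gx + gy @ gy))

    residuals = [residual(x, y)]
    for alpha, beta in zip(alphas, betas):
        # Extrapolation step: look ahead from the current point.
        gx, gy = operator(x, y)
        x_half, y_half = x - alpha * gx, y - alpha * gy
        # Update step: move the current point using the look-ahead operator.
        gx, gy = operator(x_half, y_half)
        x, y = x - beta * gx, y - beta * gy
        residuals.append(residual(x, y))
    return x, y, residuals


# Usage: a one-dimensional bilinear instance with a fixed stepsize.
T = 50
A = np.array([[1.0]])
b = np.zeros(1)
c = np.zeros(1)
x_last, y_last, residuals = extragradient(
    A, b, c, np.ones(1), np.ones(1), alphas=[0.5] * T, betas=[0.5] * T
)
```

Keeping `alphas` and `betas` as separate arguments mirrors the abstract's distinction between using one stepsize for both steps and using different stepsizes for each; any dynamic schedule can be passed in as a plain sequence.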
Problem

Research questions and friction points this paper is trying to address.

min-max optimization
Extragradient method
dynamic stepsizes
convergence rate
power-law
Innovation

Methods, ideas, or system contributions that make the work stand out.

power-law stepsizes
min-max optimization
extragradient method
last-iterate convergence
dynamic stepsize scheduling
🔎 Similar Papers
2024-08-23 · International Conference on Learning Representations · Citations: 0