Adaptive Sharpness-Aware Minimization with a Polyak-type Step size: A Theory-Grounded Scheduler

📅 2026-06-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the sensitivity of Sharpness-Aware Minimization (SAM) to the choice of learning rate, which typically requires extensive hyperparameter tuning. By integrating the stochastic Polyak step size principle into the SAM framework, the authors derive theoretically grounded adaptive step size schedulers for SAM-style updates in both deterministic and stochastic settings. For smooth objectives in the deterministic case, the method achieves linear convergence under strong convexity and an $\mathcal{O}(1/T)$ rate under convexity; in the stochastic case, analogous guarantees hold up to a neighborhood of the optimum. Empirically, the proposed schedulers match or surpass carefully tuned SAM baselines while substantially reducing the need for learning-rate tuning.
📝 Abstract
Sharpness-Aware Minimization (SAM) has established itself as a powerful and widely adopted optimizer for training machine learning models. By explicitly minimizing the sharpness of the loss landscape, SAM often improves generalization while delivering strong empirical performance. However, SAM and its variants, like most training algorithms, are sensitive to the choice of learning rate, which is typically selected through extensive hyperparameter tuning or predefined schedulers. In this work, motivated by recent advances on the effectiveness of stochastic Polyak step sizes for Stochastic Gradient Descent (SGD), we derive Polyak schedulers tailored to SAM-style updates, yielding novel adaptive algorithms in both deterministic and stochastic settings. In the smooth setting, we prove linear convergence for strongly convex objectives and an $\mathcal{O}(1/T)$ convergence rate for convex objectives in the deterministic case. In the stochastic setting, we establish analogous convergence guarantees up to a neighborhood of the optimum. Numerical experiments demonstrate that the proposed Polyak schedulers achieve performance comparable to or better than carefully tuned SAM baselines, while substantially reducing the need for learning-rate tuning.
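To make the idea of a Polyak-type step size inside a SAM-style update concrete, here is a minimal deterministic sketch in Python with NumPy. It is not the paper's scheduler, whose exact formula is not given on this page. It pairs the standard SAM perturbation with the classical Polyak rule (current suboptimality divided by a squared gradient norm). The function names, the cap `gamma_max`, the use of the SAM gradient in the denominator, and the assumption that the optimal value `f_star` is known are all choices made here for illustration.

```python
import numpy as np


def sam_polyak_step(x, loss_fn, grad_fn, rho=0.05, f_star=0.0,
                    gamma_max=1.0, eps=1e-12):
    """One SAM-style update with a Polyak-type step size (illustrative only)."""
    g = grad_fn(x)
    # SAM ascent step: perturb x by roughly rho along the normalized gradient.
    x_perturbed = x + rho * g / (np.linalg.norm(g) + eps)
    # Gradient evaluated at the perturbed point drives the descent step.
    g_sam = grad_fn(x_perturbed)
    # Classical Polyak rule: suboptimality over squared gradient norm, capped
    # at gamma_max. Using g_sam in the denominator is an assumption of this
    # sketch, not a formula taken from the paper.
    gamma = min((loss_fn(x) - f_star) / (g_sam @ g_sam + eps), gamma_max)
    return x - gamma * g_sam, gamma


def run_demo(num_steps=200):
    """Run the sketch on a small ill-conditioned quadratic with minimum value 0."""
    a = np.array([1.0, 10.0])  # diagonal curvature, condition number 10

    def loss_fn(x):
        return 0.5 * np.sum(a * x * x)

    def grad_fn(x):
        return a * x

    x = np.array([5.0, 5.0])
    losses = [loss_fn(x)]
    dists = [np.linalg.norm(x)]  # distance to the minimizer at the origin
    for _ in range(num_steps):
        x, _ = sam_polyak_step(x, loss_fn, grad_fn)
        losses.append(loss_fn(x))
        dists.append(np.linalg.norm(x))
    return np.array(losses), np.array(dists)


if __name__ == "__main__":
    losses, dists = run_demo()
    print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.6f}")
    print(f"initial distance {dists[0]:.4f}, final distance {dists[-1]:.6f}")
```

No learning rate is hand-tuned in this sketch: the step size is recomputed at every iteration from the current loss value and gradient. The price is that `f_star` must be known or estimated, which is easiest in settings where the minimum loss is zero or close to it.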
Problem

Research questions and friction points this paper is trying to address.

Sharpness-Aware Minimization
learning rate sensitivity
hyperparameter tuning
optimization
generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sharpness-Aware Minimization
Polyak step size
adaptive learning rate
convergence analysis
stochastic optimization