🤖 AI Summary
This work addresses the sensitivity of stochastic optimization methods, SGD in particular, to step-size selection in the interpolation regime typical of over-parameterized models, and the manual hyperparameter tuning this entails. The authors propose a unified stochastic trust-region framework that replaces handcrafted step-size schedules with an adaptive trust-region radius and covers both unconstrained and equality-constrained problems. Theoretically, under the strong growth condition, they establish iteration and stochastic first-order oracle complexity bounds of $O(\varepsilon^{-2} \log(1/\varepsilon))$ for the unconstrained case and $O(\varepsilon^{-4} \log(1/\varepsilon))$ for the equality-constrained case. Algorithmically, they combine a first-order stochastic trust-region method with a quadratic penalty strategy, so that an $\varepsilon$-stationary point of the penalized problem corresponds to an $O(\varepsilon)$-approximate KKT point of the original constrained problem. Empirical results on deep neural network training and orthogonally constrained subspace fitting show performance comparable to well-tuned stochastic baselines, with stable optimization behavior and effective handling of hard constraints without manual learning-rate scheduling.
📝 Abstract
Under interpolation-type assumptions such as the strong growth condition, stochastic optimization methods can attain convergence rates comparable to full-batch methods, but their performance, particularly for SGD, remains highly sensitive to step-size selection. To address this issue, we propose a unified stochastic trust-region framework that eliminates manual step-size tuning and extends naturally to equality-constrained problems. For unconstrained optimization, we develop a first-order stochastic trust-region algorithm and show that, under the strong growth condition, it achieves an iteration and stochastic first-order oracle complexity of $O(\varepsilon^{-2} \log(1/\varepsilon))$ for finding an $\varepsilon$-stationary point. For equality-constrained problems, we introduce a quadratic-penalty-based stochastic trust-region method with penalty parameter $\mu$, and establish an iteration and oracle complexity of $O(\varepsilon^{-4} \log(1/\varepsilon))$ to reach an $\varepsilon$-stationary point of the penalized problem, corresponding to an $O(\varepsilon)$-approximate KKT point of the original constrained problem. Numerical experiments on deep neural network training and orthogonally constrained subspace fitting demonstrate that the proposed methods achieve performance comparable to well-tuned stochastic baselines, while exhibiting stable optimization behavior and effectively handling hard constraints without manual learning-rate scheduling.
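To make the idea of replacing a step-size schedule with an adaptive radius more concrete, the following is a minimal Python sketch of a generic first-order stochastic trust-region loop. It is not the paper's algorithm: the function name `stochastic_tr`, the acceptance threshold `eta`, the radius factor `gamma`, the grow/shrink rule, and the interpolating least-squares test problem are all assumptions chosen for illustration. The problem is built so that one point fits every sample exactly, which is the kind of interpolation setting the abstract refers to.

```python
import numpy as np


def stochastic_tr(A, b, x0, delta0=1.0, eta=0.1, gamma=2.0,
                  batch=5, iters=300, seed=0):
    """Hypothetical first-order stochastic trust-region loop for
    f(x) = 0.5 * mean((A x - b)^2). All parameter names are illustrative."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    delta = delta0                      # trust-region radius (no learning rate)
    n = A.shape[0]
    for _ in range(iters):
        # Draw one mini-batch and reuse it for the gradient and both loss values.
        idx = rng.choice(n, size=batch, replace=False)
        Ab, bb = A[idx], b[idx]
        r = Ab @ x - bb
        f_old = 0.5 * np.mean(r ** 2)
        g = Ab.T @ r / batch            # mini-batch gradient
        gnorm = np.linalg.norm(g)
        if gnorm == 0.0:
            continue                    # this batch is already fit exactly
        # Minimizer of the linear model f_old + g^T s over the ball ||s|| <= delta.
        s = -delta * g / gnorm
        f_new = 0.5 * np.mean((Ab @ (x + s) - bb) ** 2)
        # Ratio of actual decrease to the decrease predicted by the linear model.
        rho = (f_old - f_new) / (delta * gnorm)
        if rho >= eta:
            x = x + s                   # accept the step and enlarge the radius
            delta *= gamma
        else:
            delta /= gamma              # reject the step and shrink the radius
    return x


# Over-parameterized least squares: 20 samples, 50 unknowns, b = A @ x_star,
# so x_star interpolates every sample.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
x_star = rng.standard_normal(50)
b = A @ x_star
x0 = np.zeros(50)

x_hat = stochastic_tr(A, b, x0)
dist0 = np.linalg.norm(x0 - x_star)     # distance to x_star before
dist1 = np.linalg.norm(x_hat - x_star)  # distance to x_star after
```

The only quantity the loop adapts is `delta`, which is driven by the agreement ratio `rho` rather than by a user-supplied schedule. The demo records the distance to `x_star` before and after the run so the effect of the adaptive radius can be inspected. An equality-constrained variant in the spirit of the abstract would apply the same loop to a quadratic-penalty objective, which is not shown here.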