Stochastic Trust-Region Methods for Over-parameterized Models

📅 2026-04-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the sensitivity of stochastic optimization methods, SGD in particular, to step-size selection in over-parameterized models, and the manual hyperparameter tuning this entails. The authors propose a unified stochastic trust-region framework that replaces handcrafted step-size schedules with an adaptive trust-region mechanism and covers both unconstrained and equality-constrained problems. For the unconstrained case, they develop a first-order stochastic trust-region algorithm whose iteration and stochastic first-order oracle complexity is $O(\varepsilon^{-2} \log(1/\varepsilon))$ under the strong growth condition. For the equality-constrained case, they combine the trust-region approach with a quadratic penalty and obtain a complexity of $O(\varepsilon^{-4} \log(1/\varepsilon))$ for reaching an $\varepsilon$-stationary point of the penalized problem, which corresponds to an $O(\varepsilon)$-approximate KKT point of the original problem. Experiments on deep neural network training and orthogonally constrained subspace fitting show performance comparable to well-tuned stochastic baselines, with stable optimization behavior and effective handling of hard constraints without manual learning-rate scheduling.
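For orientation, the two ingredients named in the summary can be written out in their standard textbook forms. This is a sketch under assumptions: the symbols $f_i$ (per-sample loss), $\rho$ (growth constant), $c$ (constraint map), $J_c$ (its Jacobian), and $\phi_\mu$ are introduced here for illustration, and the $\mu/2$ scaling of the penalty is one common convention that may differ from the one used in the paper.

$$
\begin{aligned}
&\text{Strong growth condition:} && \mathbb{E}_i\,\|\nabla f_i(x)\|^2 \;\le\; \rho\,\|\nabla f(x)\|^2 \quad \text{for all } x, \\
&\text{Quadratic penalty:} && \phi_\mu(x) \;=\; f(x) + \tfrac{\mu}{2}\,\|c(x)\|^2, \\
&\text{Penalty gradient:} && \nabla \phi_\mu(x) \;=\; \nabla f(x) + \mu\, J_c(x)^{\top} c(x).
\end{aligned}
$$

Under this convention, $\lambda = \mu\, c(x)$ plays the role of a Lagrange multiplier estimate, which is the usual route by which approximate stationarity of $\phi_\mu$ is related to approximate KKT conditions for the constrained problem.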

📝 Abstract

Under interpolation-type assumptions such as the strong growth condition, stochastic optimization methods can attain convergence rates comparable to full-batch methods, but their performance, particularly for SGD, remains highly sensitive to step-size selection. To address this issue, we propose a unified stochastic trust-region framework that eliminates manual step-size tuning and extends naturally to equality-constrained problems. For unconstrained optimization, we develop a first-order stochastic trust-region algorithm and show that, under the strong growth condition, it achieves an iteration and stochastic first-order oracle complexity of $O(\varepsilon^{-2} \log(1/\varepsilon))$ for finding an $\varepsilon$-stationary point. For equality-constrained problems, we introduce a quadratic-penalty-based stochastic trust-region method with penalty parameter $\mu$, and establish an iteration and oracle complexity of $O(\varepsilon^{-4} \log(1/\varepsilon))$ to reach an $\varepsilon$-stationary point of the penalized problem, corresponding to an $O(\varepsilon)$-approximate KKT point of the original constrained problem. Numerical experiments on deep neural network training and orthogonally constrained subspace fitting demonstrate that the proposed methods achieve performance comparable to well-tuned stochastic baselines, while exhibiting stable optimization behavior and effectively handling hard constraints without manual learning-rate scheduling.
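To make the idea of a first-order stochastic trust-region step concrete, here is a minimal Python sketch. It is not the authors' algorithm: it is a generic textbook-style scheme that minimizes a linear model of the minibatch loss over a ball, then accepts or rejects the step by comparing actual and predicted decrease on the same minibatch. The function names (`stochastic_tr_step`, `run_demo`), the constants (`eta`, `shrink`, `grow`, `max_radius`), and the over-parameterized least-squares test problem are all assumptions chosen for illustration.

```python
import numpy as np


def stochastic_tr_step(x, radius, batch_loss, batch_grad,
                       eta=0.25, shrink=0.5, grow=2.0, max_radius=10.0):
    """One illustrative first-order trust-region step on a single minibatch."""
    g = batch_grad(x)
    g_norm = np.linalg.norm(g)
    if g_norm == 0.0:
        return x, radius  # this minibatch is already fit exactly
    # Minimizing the linear model f(x) + g^T s over ||s|| <= radius
    # gives a step of length `radius` along the negative gradient.
    s = -(radius / g_norm) * g
    predicted = radius * g_norm                  # decrease promised by the model
    actual = batch_loss(x) - batch_loss(x + s)   # decrease on the same minibatch
    if actual >= eta * predicted:
        return x + s, min(grow * radius, max_radius)  # accept, enlarge region
    return x, shrink * radius                         # reject, shrink region


def run_demo(n=20, d=50, batch=5, iters=5000, seed=0):
    """Over-parameterized least squares (d > n), so interpolation is possible."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, d))
    b = A @ rng.standard_normal(d)  # consistent system: an exact fit exists

    def full_loss(z):
        return 0.5 * np.mean((A @ z - b) ** 2)

    x, radius = np.zeros(d), 1.0
    initial = full_loss(x)
    for _ in range(iters):
        idx = rng.choice(n, size=batch, replace=False)
        Ab, bb = A[idx], b[idx]

        def loss(z):
            return 0.5 * np.mean((Ab @ z - bb) ** 2)

        def grad(z):
            return Ab.T @ (Ab @ z - bb) / batch

        x, radius = stochastic_tr_step(x, radius, loss, grad)
    return initial, full_loss(x)


if __name__ == "__main__":
    initial, final = run_demo()
    print(f"full loss: {initial:.3e} -> {final:.3e}")
```

No learning rate appears anywhere in the sketch: the radius is the only step-control quantity, and it is adjusted automatically from the accept/reject test. For the equality-constrained setting, one plausible adaptation would be to pass the loss and gradient of a quadratic-penalty function to the same step routine, though the paper's actual method may differ in its details.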

Problem

Research questions and friction points this paper is trying to address.

stochastic optimization

step-size sensitivity

over-parameterized models

equality constraints

trust-region methods

Innovation

Methods, ideas, or system contributions that make the work stand out.

stochastic trust-region

over-parameterized models

strong growth condition