AI Summary
This work proposes SGD-ER, a novel learning rate scheduling method that addresses a key limitation of existing strategies: their reliance on fixed or periodic restarts, which cannot adapt to training dynamics. SGD-ER introduces an adaptive restart mechanism triggered when training-loss stagnation is detected; upon a convergence plateau, it linearly escalates the learning rate to promote exploration of the loss landscape, helping the optimizer escape sharp minima toward flatter regions associated with better generalization. Integrated within the standard SGD framework, the method enables training-dynamics-aware optimization. Extensive experiments on CIFAR-10, CIFAR-100, and TinyImageNet with architectures including ResNet, VGG, and DenseNet demonstrate consistent improvements, with test-accuracy gains of 0.5% to 4.5% over conventional schedulers.
Abstract
Learning rate scheduling plays a critical role in the optimization of deep neural networks, directly influencing convergence speed, stability, and generalization. While existing schedulers such as cosine annealing, cyclical learning rates, and warm restarts have shown promise, they typically rely on fixed or periodic triggers that are agnostic to training dynamics such as stagnation or convergence behavior. In this work, we propose a simple yet effective strategy, Stochastic Gradient Descent with Escalating Restarts (SGD-ER), which adaptively increases the learning rate upon convergence. Our method monitors training progress and triggers a restart when stagnation is detected, linearly escalating the learning rate to escape sharp local minima and explore flatter regions of the loss landscape. We evaluate SGD-ER on CIFAR-10, CIFAR-100, and TinyImageNet across a range of architectures including ResNet-18/34/50, VGG-16, and DenseNet-101. Compared to standard schedulers, SGD-ER improves test accuracy by 0.5-4.5%, demonstrating the benefit of convergence-aware escalating restarts in reaching better local optima.
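The escalating-restart behavior described above can be sketched as a small stand-alone scheduler. This is a minimal illustration, not the paper's implementation: the stagnation test (relative loss improvement over a patience window), the escalation step size, and the learning-rate cap are all assumed hyperparameters chosen for the example.

```python
class EscalatingRestartScheduler:
    """Sketch of an SGD-ER-style scheduler: hold the base learning rate while
    the training loss improves, and linearly escalate it once the loss
    stagnates, to help the iterate leave a sharp minimum."""

    def __init__(self, base_lr=0.1, patience=5, threshold=1e-3,
                 escalation_step=0.02, max_lr=0.5):
        self.base_lr = base_lr
        self.patience = patience                # epochs without improvement before escalating
        self.threshold = threshold              # minimum relative loss improvement to count as progress
        self.escalation_step = escalation_step  # linear LR increase per stagnant epoch (assumed)
        self.max_lr = max_lr                    # cap on the escalated LR (assumed)
        self.lr = base_lr
        self.best_loss = float("inf")
        self.stall = 0
        self.escalating = False

    def step(self, train_loss):
        """Update and return the learning rate given the latest epoch's training loss."""
        if train_loss < self.best_loss * (1 - self.threshold):
            # Meaningful improvement: record it and stop any ongoing escalation.
            self.best_loss = train_loss
            self.stall = 0
            self.escalating = False
        else:
            self.stall += 1
        if self.stall >= self.patience:
            self.escalating = True
        if self.escalating:
            # Stagnation detected: linearly escalate the learning rate.
            self.lr = min(self.lr + self.escalation_step, self.max_lr)
        return self.lr
```

In a training loop, `step` would be called once per epoch with the epoch's training loss, and the returned value written into the optimizer's parameter groups.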