🤖 AI Summary
Existing optimizers suffer from unstable neuron outputs and a trade-off in generalization: Adam converges rapidly but generalizes poorly, whereas SGD generalizes well yet converges slowly. This paper proposes ADAACT, a novel optimization algorithm that adjusts the learning rate at the neuron level based on activation variance, using a lightweight, gradient-driven, backpropagation-compatible adaptation mechanism. ADAACT improves the stability of neuron outputs and achieves strong generalization on CIFAR and ImageNet: its convergence speed approaches that of Adam, its generalization matches or exceeds that of SGD, and the additional training overhead is negligible. The core innovation is integrating activation variance into the learning-rate adaptation paradigm, thereby bridging the long-standing trade-off between convergence efficiency and generalization ability.
📝 Abstract
We introduce ADAACT, a novel optimization algorithm that adjusts learning rates according to activation variance. Our method enhances the stability of neuron outputs by incorporating neuron-wise adaptivity during training, which in turn leads to better generalization, complementing conventional activation regularization methods. Experimental results demonstrate ADAACT's competitive performance across standard image classification benchmarks. We evaluate ADAACT on CIFAR and ImageNet, comparing it with other state-of-the-art methods. Importantly, ADAACT effectively bridges the gap between the convergence speed of Adam and the strong generalization capabilities of SGD, all while maintaining competitive execution times.
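To make the core idea concrete, here is a minimal sketch of a neuron-wise update that scales the learning rate by a running estimate of each neuron's activation variance. This is an illustrative assumption, not the paper's actual update rule: the function name `adaact_step`, the exponential-moving-average variance tracking, and the inverse-standard-deviation scaling are all hypothetical choices standing in for the mechanism the abstract describes.

```python
import numpy as np

def adaact_step(w, grad, act, state, lr=1e-3, beta=0.9, eps=1e-8):
    """Hypothetical neuron-wise update inspired by the abstract.

    w:    weight matrix, shape (n_neurons, n_in)
    grad: gradient of the loss w.r.t. w, same shape as w
    act:  activations of this layer for the batch, shape (batch, n_neurons)
    state: dict holding a running variance estimate per neuron
    """
    # Exponential moving average of per-neuron activation variance
    # (one plausible way to track "activation variance" online).
    state["var"] = beta * state["var"] + (1 - beta) * act.var(axis=0)
    # Shrink the step for neurons whose activations vary a lot,
    # which is one way to stabilize neuron outputs.
    scale = 1.0 / (np.sqrt(state["var"]) + eps)
    w_new = w - lr * scale[:, None] * grad
    return w_new, state

# Usage: one step on toy data
rng = np.random.default_rng(0)
w = np.zeros((3, 4))
grad = np.ones((3, 4))
act = rng.normal(size=(8, 3))
state = {"var": np.zeros(3)}
w, state = adaact_step(w, grad, act, state)
```

Note that, unlike Adam's element-wise second-moment scaling, the adaptation here is shared across all incoming weights of a neuron, which is what "neuron-wise adaptivity" would suggest; the paper's precise formulation may differ.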