🤖 AI Summary
Existing optimizers suffer from unstable neuron outputs and a trade-off in generalization: Adam converges rapidly but generalizes poorly, whereas SGD generalizes well yet converges slowly. This paper proposes ADAACT, a novel optimization algorithm that adjusts the learning rate at the neuron level based on activation variance, using a lightweight, gradient-driven, backpropagation-compatible adaptation mechanism. ADAACT improves the stability of neuron outputs and achieves strong generalization on CIFAR and ImageNet: its convergence speed approaches that of Adam, its generalization matches or exceeds that of SGD, and the additional training overhead is negligible. The core innovation is integrating activation variance into the learning-rate adaptation paradigm, thereby bridging the long-standing trade-off between convergence efficiency and generalization ability.
📝 Abstract
We introduce ADAACT, a novel optimization algorithm that adjusts learning rates according to activation variance. Our method enhances the stability of neuron outputs by incorporating neuron-wise adaptivity during training, which in turn leads to better generalization, complementing conventional activation regularization methods. Experimental results demonstrate ADAACT's competitive performance across standard image classification benchmarks. We evaluate ADAACT on CIFAR and ImageNet, comparing it with other state-of-the-art methods. Importantly, ADAACT effectively bridges the gap between the convergence speed of Adam and the strong generalization capabilities of SGD, all while maintaining competitive execution times.
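To make the core idea concrete, here is a minimal sketch of a neuron-wise update that scales the learning rate by a running estimate of each neuron's activation variance. This is an illustrative assumption, not the paper's actual update rule: the function name `adaact_step`, the exponential-moving-average variance tracking, and the inverse-standard-deviation scaling are all hypothetical choices standing in for the mechanism the abstract describes.

```python
import numpy as np

def adaact_step(w, grad, act, state, lr=1e-3, beta=0.9, eps=1e-8):
    """Hypothetical neuron-wise update inspired by the abstract.

    w:    weight matrix, shape (n_neurons, n_in)
    grad: gradient of the loss w.r.t. w, same shape as w
    act:  activations of this layer for the batch, shape (batch, n_neurons)
    state: dict holding a running variance estimate per neuron
    """
    # Exponential moving average of per-neuron activation variance
    # (one plausible way to track "activation variance" online).
    state["var"] = beta * state["var"] + (1 - beta) * act.var(axis=0)
    # Shrink the step for neurons whose activations vary a lot,
    # which is one way to stabilize neuron outputs.
    scale = 1.0 / (np.sqrt(state["var"]) + eps)
    w_new = w - lr * scale[:, None] * grad
    return w_new, state

# Usage: one step on toy data
rng = np.random.default_rng(0)
w = np.zeros((3, 4))
grad = np.ones((3, 4))
act = rng.normal(size=(8, 3))
state = {"var": np.zeros(3)}
w, state = adaact_step(w, grad, act, state)
```

Note that, unlike Adam's element-wise second-moment scaling, the adaptation here is shared across all incoming weights of a neuron, which is what "neuron-wise adaptivity" would suggest; the paper's precise formulation may differ.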