Conjugate-Gradient-like Based Adaptive Moment Estimation Optimization Algorithm for Deep Learning

📅 2024-04-02
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address slow convergence and suboptimal optimization performance in deep neural network training, this paper proposes CG-like-Adam, a novel adaptive optimizer that integrates the direction-conjugacy principle of conjugate gradient (CG) methods into the Adam framework. It reformulates Adam's first- and second-moment estimation mechanisms to jointly improve adaptive learning-rate scheduling and search-direction quality. Theoretically, the authors establish convergence guarantees for the cases of a constant exponential decay coefficient and unbiased first-moment estimation. Empirically, CG-like-Adam converges faster and generalizes better on CIFAR-10 and CIFAR-100 than standard Adam and prominent variants (e.g., AdamW, RMSProp). Its core contribution is a principled unification of CG-style directional optimization with adaptive moment estimation, a design paradigm that bridges classical optimization theory and modern deep learning practice.

📝 Abstract
Training deep neural networks is a challenging task. In order to speed up training and enhance the performance of deep neural networks, we rectify the vanilla conjugate gradient as conjugate-gradient-like and incorporate it into the generic Adam, and thus propose a new optimization algorithm named CG-like-Adam for deep learning. Specifically, both the first-order and the second-order moment estimation of generic Adam are replaced by their conjugate-gradient-like counterparts. Convergence analysis handles the cases where the exponential moving average coefficient of the first-order moment estimation is constant and the first-order moment estimation is unbiased. Numerical experiments on the CIFAR-10/100 datasets show the superiority of the proposed algorithm.
Problem

Research questions and friction points this paper is trying to address.

Complex Neural Networks
Training Speed
Performance Challenges
Innovation

Methods, ideas, or system contributions that make the work stand out.

CG-like-Adam
Deep Learning Optimization
Convergence Analysis
Authors: Jiawu Tian, Liwei Xu, Xiaowei Zhang, Yongqi Li (School of Mathematical Sciences, University of Electronic Science and Technology of China)