🤖 AI Summary
To address slow convergence and suboptimal optimization performance in deep neural network training, this paper proposes CG-like-Adam—a novel adaptive optimizer that integrates the conjugate search directions of conjugate gradient (CG) methods into the Adam framework. It reformulates Adam's first- and second-moment estimation mechanisms to jointly improve adaptive learning rate scheduling and search direction quality. Theoretically, the authors establish convergence guarantees for the cases of a constant decay coefficient and unbiased first-moment estimation. Empirically, CG-like-Adam achieves faster convergence and higher accuracy on CIFAR-10 and CIFAR-100 than standard Adam and prominent variants (e.g., AdamW, RMSProp). Its core contribution is the principled unification of CG-style directional optimization with adaptive moment estimation, introducing a design paradigm for adaptive optimizers that bridges classical optimization theory and modern deep learning practice.
📝 Abstract
Training deep neural networks is a challenging task. In order to speed up training and enhance the performance of deep neural networks, we rectify the vanilla conjugate gradient into a conjugate-gradient-like direction and incorporate it into generic Adam, thus proposing a new optimization algorithm named CG-like-Adam for deep learning. Specifically, both the first-order and the second-order moment estimates of generic Adam are built from the conjugate-gradient-like direction instead of the raw gradient. The convergence analysis handles the cases where the exponential moving average coefficient of the first-order moment estimation is constant and the first-order moment estimation is unbiased. Numerical experiments on the CIFAR-10/100 datasets show the superiority of the proposed algorithm.
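The abstract's core idea (feeding a conjugate-gradient-like direction into Adam's moment estimates rather than the raw gradient) can be sketched as follows. This is a simplified illustration, not the paper's exact algorithm: the fixed coefficient `gamma` stands in for the paper's conjugate coefficient (whose formula is not given here), and all hyperparameter values are placeholders.

```python
import numpy as np

def cg_like_adam_step(theta, grad, state, lr=0.02, beta1=0.9, beta2=0.999,
                      gamma=0.5, eps=1e-8):
    """One illustrative CG-like-Adam update (a sketch under assumptions).

    The raw gradient is replaced by a conjugate-gradient-like direction
    d_t = g_t + gamma * d_{t-1}; both Adam moment estimates are then built
    from d_t instead of g_t, as the abstract describes.
    """
    state['t'] += 1
    t = state['t']
    # Conjugate-gradient-like search direction (hypothetical fixed gamma).
    d = grad + gamma * state['d']
    state['d'] = d
    # Adam-style biased first/second moment estimates built from d.
    state['m'] = beta1 * state['m'] + (1 - beta1) * d
    state['v'] = beta2 * state['v'] + (1 - beta2) * d**2
    # Standard bias correction, as in generic Adam.
    m_hat = state['m'] / (1 - beta1**t)
    v_hat = state['v'] / (1 - beta2**t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps)

# Toy usage: minimize f(theta) = ||theta||^2, whose gradient is 2*theta.
theta = np.array([1.0, -2.0])
state = {'t': 0,
         'd': np.zeros(2), 'm': np.zeros(2), 'v': np.zeros(2)}
for _ in range(3000):
    theta = cg_like_adam_step(theta, 2 * theta, state)
print(float(np.linalg.norm(theta)))
```

On this toy quadratic, the iterates approach the minimizer at the origin; the example only demonstrates the structure of the update, not the convergence guarantees proved in the paper.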