🤖 AI Summary
To address slow convergence and suboptimal optimization performance in deep neural network training, this paper proposes CG-like-Adam—a novel adaptive optimizer that integrates the conjugate search directions of conjugate gradient (CG) methods into the Adam framework. It reformulates Adam's first- and second-moment estimation mechanisms to jointly improve adaptive learning rate scheduling and search direction quality. Theoretically, the authors establish convergence guarantees for the cases of a constant decay coefficient and unbiased first-moment estimation. Empirically, CG-like-Adam achieves faster convergence and higher accuracy on CIFAR-10 and CIFAR-100 than standard Adam and prominent variants (e.g., AdamW, RMSProp). Its core contribution is the principled unification of CG-style directional optimization with adaptive moment estimation, introducing a design paradigm for adaptive optimizers that bridges classical optimization theory and modern deep learning practice.
📝 Abstract
Training deep neural networks is a challenging task. In order to speed up training and enhance the performance of deep neural networks, we rectify the vanilla conjugate gradient into a conjugate-gradient-like direction and incorporate it into generic Adam, thus proposing a new optimization algorithm named CG-like-Adam for deep learning. Specifically, both the first-order and the second-order moment estimates of generic Adam are built from the conjugate-gradient-like direction instead of the raw gradient. The convergence analysis handles the cases where the exponential moving average coefficient of the first-order moment estimation is constant and the first-order moment estimation is unbiased. Numerical experiments on the CIFAR-10/100 datasets show the superiority of the proposed algorithm.
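The abstract's core idea (feeding a conjugate-gradient-like direction into Adam's moment estimates rather than the raw gradient) can be sketched as follows. This is a simplified illustration, not the paper's exact algorithm: the fixed coefficient `gamma` stands in for the paper's conjugate coefficient (whose formula is not given here), and all hyperparameter values are placeholders.

```python
import numpy as np

def cg_like_adam_step(theta, grad, state, lr=0.02, beta1=0.9, beta2=0.999,
                      gamma=0.5, eps=1e-8):
    """One illustrative CG-like-Adam update (a sketch under assumptions).

    The raw gradient is replaced by a conjugate-gradient-like direction
    d_t = g_t + gamma * d_{t-1}; both Adam moment estimates are then built
    from d_t instead of g_t, as the abstract describes.
    """
    state['t'] += 1
    t = state['t']
    # Conjugate-gradient-like search direction (hypothetical fixed gamma).
    d = grad + gamma * state['d']
    state['d'] = d
    # Adam-style biased first/second moment estimates built from d.
    state['m'] = beta1 * state['m'] + (1 - beta1) * d
    state['v'] = beta2 * state['v'] + (1 - beta2) * d**2
    # Standard bias correction, as in generic Adam.
    m_hat = state['m'] / (1 - beta1**t)
    v_hat = state['v'] / (1 - beta2**t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps)

# Toy usage: minimize f(theta) = ||theta||^2, whose gradient is 2*theta.
theta = np.array([1.0, -2.0])
state = {'t': 0,
         'd': np.zeros(2), 'm': np.zeros(2), 'v': np.zeros(2)}
for _ in range(3000):
    theta = cg_like_adam_step(theta, 2 * theta, state)
print(float(np.linalg.norm(theta)))
```

On this toy quadratic, the iterates approach the minimizer at the origin; the example only demonstrates the structure of the update, not the convergence guarantees proved in the paper.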