NysAct: A Scalable Preconditioned Gradient Descent using Nyström Approximation

📅 2024-12-15
🏛️ BigData Congress [Services Society]
📈 Citations: 0
Influential: 0
🤖 AI Summary
In deep learning optimization, first-order methods (e.g., Adam) suffer from poor generalization, while second-order methods (e.g., K-FAC) achieve superior generalization at prohibitive computational and memory costs. To address this trade-off, we propose a scalable gradient preconditioning framework featuring a novel eigenvalue-shifted Nyström approximation for modeling activation covariance matrices. This approach retains the generalization benefits of second-order optimization without explicit Hessian computation or large-scale matrix inversion, achieving near-linear time and space complexity. Crucially, it avoids costly dense matrix operations while preserving curvature information essential for effective preconditioning. Empirically, our method surpasses Adam and K-FAC in test accuracy across multiple benchmark tasks—including image classification and language modeling—while reducing memory and computational overhead by an order of magnitude compared to standard second-order optimizers. The result is a principled, efficient, and scalable optimization strategy that reconciles strong generalization with practical tractability.
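The summary above centers on an eigenvalue-shifted Nyström approximation of the activation covariance matrix. As a rough illustration of the general idea (not the paper's exact algorithm — the function name, shift rule, and sketch size below are assumptions), a shifted randomized Nyström approximation of a PSD covariance matrix can be sketched as:

```python
import numpy as np

def nystrom_shifted(C, rank, shift=1e-6, seed=0):
    """Rank-`rank` Nystrom approximation of a PSD matrix C, with a
    small eigenvalue shift for numerical stability.
    Illustrative sketch only; NysAct's exact shift rule may differ."""
    n = C.shape[0]
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((n, rank))
    Omega, _ = np.linalg.qr(Omega)         # orthonormal test matrix
    Y = C @ Omega + shift * Omega          # sketch of the shifted matrix (C + shift*I)
    B = Omega.T @ Y                        # small (rank x rank) core matrix
    L = np.linalg.cholesky((B + B.T) / 2)  # factor the core (symmetrize first)
    F = np.linalg.solve(L, Y.T).T          # F @ F.T == Y @ inv(B) @ Y.T
    return F @ F.T - shift * np.eye(n)     # undo the shift
```

Because only the n×rank sketch `Y` and the rank×rank core are ever factored, the cost stays near-linear in n for fixed rank, which is the source of the complexity savings claimed above.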

📝 Abstract
Adaptive gradient methods are computationally efficient and converge quickly, but they often suffer from poor generalization. In contrast, second-order methods enhance convergence and generalization but typically incur high computational and memory costs. In this work, we introduce NysAct, a scalable first-order gradient preconditioning method that strikes a balance between state-of-the-art first-order and second-order optimization methods. NysAct leverages an eigenvalue-shifted Nyström method to approximate the activation covariance matrix, which is used as a preconditioning matrix, significantly reducing time and memory complexities with minimal impact on test accuracy. Our experiments show that NysAct not only achieves improved test accuracy compared to both first-order and second-order methods but also demands considerably fewer computational resources than existing second-order methods.
Problem

Research questions and friction points this paper is trying to address.

Adaptive first-order methods converge quickly but generalize poorly
Second-order methods generalize well but incur prohibitive computational and memory costs
Efficient preconditioning is needed that preserves test accuracy at scale
Innovation

Methods, ideas, or system contributions that make the work stand out.

Eigenvalue-shifted Nyström approximation of the activation covariance for preconditioning
Balances first- and second-order optimization
Reduces time and memory complexity to near-linear
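The bullets above say the Nyström approximation serves as a preconditioning matrix. Once a low-rank eigenfactorization of that approximation is available, the damped inverse can be applied to a gradient without ever forming the full n×n matrix. The sketch below shows this standard trick; the function name, damping parameter, and update form are illustrative assumptions, not the paper's exact update:

```python
import numpy as np

def precondition_grad(U, lam, damping, g):
    """Apply (U @ diag(lam) @ U.T + damping*I)^{-1} @ g in O(n*m) time,
    where U is (n, m) with orthonormal columns (eigenvectors of the
    low-rank Nystrom approximation) and lam is (m,) its eigenvalues.
    Illustrative sketch, not NysAct's exact preconditioned update."""
    # With orthonormal U, the inverse decomposes spectrally:
    # (1/d) * I  -  U @ diag(lam / (d * (lam + d))) @ U.T
    coeff = lam / (damping * (lam + damping))
    return g / damping - U @ (coeff * (U.T @ g))
```

This keeps the per-step cost linear in the parameter dimension for a fixed approximation rank, in contrast to the cubic cost of inverting a dense curvature matrix.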
Hyunseok Seung
Department of Statistics, University of Georgia
Jaewoo Lee
School of Computing, University of Georgia
Hyunsuk Ko
Associate Professor, School of Electrical Engineering, Hanyang University ERICA
Video Coding · Deep Learning · Computer Vision · Image Quality Assessment