🤖 AI Summary
In deep learning optimization, first-order methods (e.g., Adam) often suffer from poor generalization, while second-order methods (e.g., K-FAC) achieve superior generalization at prohibitive computational and memory cost. To address this trade-off, we propose NYSACT, a scalable gradient preconditioning framework featuring an eigenvalue-shifted Nyström approximation for modeling activation covariance matrices. This approach retains the generalization benefits of second-order optimization without explicit Hessian computation or large-scale matrix inversion, achieving near-linear time and space complexity: it avoids costly dense matrix operations while preserving the curvature information essential for effective preconditioning. Empirically, NYSACT surpasses Adam and K-FAC in test accuracy across multiple benchmark tasks, including image classification and language modeling, while reducing memory and computational overhead by an order of magnitude compared to standard second-order optimizers. The result is a principled, efficient, and scalable optimization strategy that reconciles strong generalization with practical tractability.
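The core ingredient, an eigenvalue-shifted Nyström approximation of an activation covariance matrix, can be sketched in a few lines of NumPy. This is an illustrative assumption of one common variant (shifting the sampled core block by `rho * I` before solving); the paper's exact sampling scheme and shift rule are not reproduced here:

```python
import numpy as np

def nystrom_shifted(A, idx, rho=1e-6):
    # Rank-|idx| Nystrom approximation of a PSD matrix A, with a small
    # eigenvalue shift rho on the core block so the solve stays stable.
    # Illustrative sketch only; not necessarily the paper's exact rule.
    C = A[:, idx]                               # sampled columns, shape (n, m)
    W = A[np.ix_(idx, idx)]                     # core block, shape (m, m)
    W_shifted = W + rho * np.eye(len(idx))      # eigenvalue shift
    return C @ np.linalg.solve(W_shifted, C.T)  # A_hat = C (W + rho I)^{-1} C^T

# Toy "activation covariance": approximately rank 8, so a rank-16
# Nystrom sketch should capture nearly all of it.
rng = np.random.default_rng(0)
X = rng.standard_normal((512, 8)) @ rng.standard_normal((8, 32))
A = X.T @ X / X.shape[0]                        # 32 x 32 PSD covariance
idx = rng.choice(A.shape[0], size=16, replace=False)
A_hat = nystrom_shifted(A, idx)
rel_err = np.linalg.norm(A - A_hat) / np.linalg.norm(A)
```

Because only `m` columns of the covariance are touched, building the sketch costs O(n·m) memory rather than O(n²), which is where the claimed near-linear complexity comes from.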
📝 Abstract
Adaptive gradient methods are computationally efficient and converge quickly, but they often suffer from poor generalization. In contrast, second-order methods enhance convergence and generalization but typically incur high computational and memory costs. In this work, we introduce NYSACT, a scalable first-order gradient preconditioning method that strikes a balance between state-of-the-art first-order and second-order optimization methods. NYSACT leverages an eigenvalue-shifted Nyström method to approximate the activation covariance matrix, which serves as the preconditioning matrix, significantly reducing time and memory complexity with minimal impact on test accuracy. Our experiments show that NYSACT not only achieves higher test accuracy than both first-order and second-order methods but also requires considerably fewer computational resources than existing second-order methods.
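To see why such a preconditioner is cheap to apply, note that a low-rank-plus-damping matrix can be inverted against a gradient with the Woodbury identity, so the dense n×n covariance is never formed or inverted. The sketch below is a hypothetical illustration under that assumption (the damping `lam` and shift `rho` are placeholder hyperparameters, and the function name is ours, not the paper's):

```python
import numpy as np

def precondition(C, W, g, lam=1e-3, rho=1e-6):
    # Apply (A_hat + lam I)^{-1} g, where A_hat = C (W + rho I)^{-1} C^T
    # is a shifted Nystrom approximation of the activation covariance.
    # Woodbury keeps the cost at O(n m^2); no n x n matrix is built.
    m = W.shape[0]
    L = np.linalg.cholesky(W + rho * np.eye(m))
    B = np.linalg.solve(L, C.T).T           # B @ B.T == A_hat, shape (n, m)
    U, s, _ = np.linalg.svd(B, full_matrices=False)
    e = s ** 2                              # eigenvalues of A_hat
    # (U diag(e) U^T + lam I)^{-1} = (I - U diag(e/(e+lam)) U^T) / lam
    return (g - U @ ((e / (e + lam)) * (U.T @ g))) / lam

rng = np.random.default_rng(1)
n, m = 64, 16
X = rng.standard_normal((512, n))
A = X.T @ X / X.shape[0]                    # dense covariance (reference only)
idx = rng.choice(n, size=m, replace=False)
C, W = A[:, idx], A[np.ix_(idx, idx)]
g = rng.standard_normal(n)
pg = precondition(C, W, g)                  # preconditioned gradient
```

In this sketch the per-step cost is dominated by the m×m Cholesky factorization and the thin SVD of an n×m matrix, which is the kind of saving that lets a Nyström preconditioner undercut the dense inversions required by standard second-order optimizers.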