Projected Forward Gradient-Guided Frank-Wolfe Algorithm via Variance Reduction

📅 2024-03-19
🏛️ IEEE Control Systems Letters
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high computational cost, excessive memory consumption, and unstable convergence of Frank–Wolfe (FW) algorithms in training deep neural networks—particularly under non-convex settings—this paper proposes Projected-FG, a low-computation and low-memory projected forward-gradient estimation method, and integrates it into the FW framework for the first time. We further design a historical-direction aggregation mechanism to suppress gradient variance, and provide rigorous theoretical guarantees: Projected-FG converges to the global optimum in convex settings and to a first-order stationary point in non-convex settings. Empirical evaluations demonstrate that Projected-FG reduces GPU memory usage by up to 58% while maintaining optimization accuracy and stability comparable to standard FW and SGD. This work establishes a new, provably convergent paradigm for large-scale non-convex optimization.

📝 Abstract
This paper aims to enhance the use of the Frank-Wolfe (FW) algorithm for training deep neural networks. Like any gradient-based optimization algorithm, FW suffers from high computational and memory costs when computing gradients for DNNs. This paper introduces the application of the recently proposed projected forward gradient (Projected-FG) method to the FW framework, offering reduced computational cost similar to backpropagation and low memory utilization akin to forward propagation. Our results show that a naive application of Projected-FG introduces a non-vanishing convergence error due to the stochastic noise that the Projected-FG method injects into the process. This noise results in a non-vanishing variance in the Projected-FG gradient estimate. To address this, we propose a variance reduction approach that aggregates historical Projected-FG directions. We demonstrate rigorously that this approach ensures convergence to the optimal solution for convex functions and to a stationary point for non-convex functions. These convergence properties are validated through a numerical example, showcasing the approach's effectiveness and efficiency.
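The mechanics described in the abstract can be sketched in a few lines: a forward-gradient estimate projects the true gradient onto a random direction, the historical-direction average damps the resulting variance, and a standard FW linear-minimization step produces the update. The snippet below is a minimal illustration on a toy least-squares objective over an l1-ball, not the paper's implementation; the scalar schedules for `rho` and `gamma`, the l1-ball constraint, and the closed-form gradient are all assumptions made for a self-contained example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy objective: f(x) = 0.5 * ||A x - b||^2, with closed-form gradient.
A = rng.standard_normal((20, 10))
b = rng.standard_normal(20)

def grad(x):
    return A.T @ (A @ x - b)

def forward_gradient(x):
    """Forward-gradient estimate (grad . v) v for a random Gaussian v.

    With v ~ N(0, I), E[v v^T] = I, so the estimate is unbiased but noisy;
    in a DNN the directional derivative grad . v would come from a single
    forward-mode (JVP) pass instead of the explicit gradient used here.
    """
    v = rng.standard_normal(x.size)
    return (grad(x) @ v) * v

def lmo_l1(d, radius=5.0):
    """Linear minimization oracle over the l1-ball: a signed vertex."""
    i = np.argmax(np.abs(d))
    s = np.zeros_like(d)
    s[i] = -radius * np.sign(d[i])
    return s

x = np.zeros(10)   # iterate
d = np.zeros(10)   # aggregated (variance-reduced) direction
for t in range(1, 2001):
    rho = 2.0 / (t + 1)                              # averaging weight (assumed schedule)
    d = (1 - rho) * d + rho * forward_gradient(x)    # aggregate historical directions
    s = lmo_l1(d)                                    # FW vertex from the averaged direction
    gamma = 2.0 / (t + 2)                            # standard FW step size
    x = (1 - gamma) * x + gamma * s                  # projection-free convex-combination update
```

Without the averaging (i.e. `rho = 1`), the LMO reacts to a single noisy rank-one estimate each step, which is exactly the non-vanishing-variance failure mode the paper identifies; the decaying `rho` makes the direction's variance shrink over iterations.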
Problem

Research questions and friction points this paper is trying to address.

Frank-Wolfe Algorithm
Deep Neural Networks
Computational Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Improved Frank-Wolfe Algorithm
Projected Forward Gradient
Directional Variance Reduction
M. Rostami
Department of Mechanical and Aerospace Engineering, University of California Irvine
Solmaz S. Kia
Professor, University of California Irvine
Control theory · Distributed algorithm design for cooperative networked systems · Data fusion