Projected Forward Gradient-Guided Frank-Wolfe Algorithm via Variance Reduction

📅 2024-03-19

🏛️ IEEE Control Systems Letters

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the high computational cost, excessive memory consumption, and unstable convergence of Frank–Wolfe (FW) algorithms in training deep neural networks—particularly under non-convex settings—this paper proposes Projected-FG, a low-computation and low-memory projected forward-gradient estimation method, and integrates it into the FW framework for the first time. We further design a historical-direction aggregation mechanism to suppress gradient variance, and provide rigorous theoretical guarantees: Projected-FG converges to the global optimum in convex settings and to a first-order stationary point in non-convex settings. Empirical evaluations demonstrate that Projected-FG reduces GPU memory usage by up to 58% while maintaining optimization accuracy and stability comparable to standard FW and SGD. This work establishes a new, provably convergent paradigm for large-scale non-convex optimization.

Technology Category

Application Category

📝 Abstract

This paper aims to enhance the use of the Frank-Wolfe (FW) algorithm for training deep neural networks. Similar to any gradient-based optimization algorithm, FW suffers from high computational and memory costs when computing gradients for DNNs. This paper introduces the application of the recently proposed projected forward gradient (Projected-FG) method to the FW framework, offering reduced computational cost similar to backpropagation and low memory utilization akin to forward propagation. Our results show that trivial application of the Projected-FG introduces non-vanishing convergence error due to the stochastic noise that the Projected-FG method introduces in the process. This noise results in an non-vanishing variance in the Projected-FG estimated gradient. To address this, we propose a variance reduction approach by aggregating historical Projected-FG directions. We demonstrate rigorously that this approach ensures convergence to the optimal solution for convex functions and to a stationary point for non-convex functions. These convergence properties are validated through a numerical example, showcasing the approach's effectiveness and efficiency.

Problem

Research questions and friction points this paper is trying to address.

Frank-Wolfe Algorithm

Deep Neural Networks

Computational Efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

Improved Frank-Wolfe Algorithm

Projected Forward Gradient

Directional Variance Reduction

🔎 Similar Papers

No similar papers found.

Authors to Follow