🤖 AI Summary
This work addresses the challenge of achieving high sparsity during training while preserving model accuracy and substantially reducing computational cost. The authors propose a dynamic sparse training algorithm based on multilevel mirror descent, which alternates between periods of static and dynamic sparsity pattern updates within a linearized Bregman iteration framework. An adaptive network structure freezing mechanism is introduced to efficiently explore the sparse parameter space. By integrating multilevel optimization with mirror descent, the method retains convergence guarantees while significantly lowering the number of floating-point operations. Experimental results on standard benchmarks show that the approach reduces the theoretical FLOP count from 38% of dense SGD training (for standard Bregman iterations) to just 6% (for the proposed method) while maintaining comparable test accuracy, substantially outperforming existing sparse training methods.
📝 Abstract
We introduce a dynamic sparse training algorithm based on linearized Bregman iterations / mirror descent that exploits the sparsity these iterations naturally induce by alternating between periods of static and dynamic sparsity pattern updates. The key idea is to combine sparsity-inducing Bregman iterations with adaptive freezing of the network structure, enabling efficient exploration of the sparse parameter space while maintaining sparsity. We provide convergence guarantees by embedding our method in a multilevel optimization framework. Furthermore, we empirically show that our algorithm can produce highly sparse and accurate models on standard benchmarks. We also show that the theoretical number of FLOPs relative to dense SGD training can be reduced from 38% for standard Bregman iterations to 6% for our method while maintaining test accuracy.
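To make the mechanism concrete: linearized Bregman iterations maintain a dense dual (subgradient) variable that accumulates gradient steps, while the primal weights are obtained through a shrinkage (soft-thresholding) map and therefore stay exactly sparse throughout training. The sketch below illustrates this generic iteration on a toy sparse least-squares problem; it is not the paper's full dynamic sparse training algorithm (no multilevel scheme or adaptive freezing), and the step size `tau`, shrinkage parameter `lam`, and problem sizes are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, lam):
    # Shrink operator: proximal map of lam * ||.||_1; zeroes out
    # entries with magnitude below lam, inducing exact sparsity.
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def linearized_bregman(A, b, lam=1.0, tau=None, iters=2000):
    # Generic linearized Bregman iteration for min-norm sparse
    # solutions of A w = b: gradient steps on a dense dual variable v,
    # primal w recovered via shrinkage, so w is sparse at every step.
    if tau is None:
        tau = 1.0 / np.linalg.norm(A, 2) ** 2  # conservative step size
    v = np.zeros(A.shape[1])  # dense dual / subgradient variable
    w = np.zeros(A.shape[1])  # sparse primal iterate
    for _ in range(iters):
        grad = A.T @ (A @ w - b)       # least-squares gradient
        v -= tau * grad                # dual gradient step
        w = soft_threshold(v, lam)     # mirror map back to sparse primal
    return w

# Toy problem: underdetermined consistent system with a sparse solution.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
w_true = np.zeros(100)
w_true[:5] = [3.0, -2.0, 1.5, 4.0, -1.0]
b = A @ w_true
w = linearized_bregman(A, b)
```

Because `w` only changes support when a dual coordinate crosses the threshold `lam`, the sparsity pattern stays fixed for stretches of iterations, which is the structure the paper's alternation between static and dynamic pattern phases (and its freezing mechanism) exploits to save FLOPs.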