SequentialAttention++ for Block Sparsification: Differentiable Pruning Meets Combinatorial Optimization

📅 2024-02-27
🏛️ Neural Information Processing Systems
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large-scale neural network block-wise pruning faces a fundamental trade-off: differentiable methods lack structural guarantees, while combinatorial optimization suffers from poor scalability. Method: This paper proposes a differentiable guidance framework for structured pruning. We theoretically prove that several differentiable pruning approaches are equivalent to a class of nonconvex group-sparse regularizations—establishing, for the first time, uniqueness, group sparsity, and near-optimality of the global solution. Leveraging this insight, we design a novel paradigm integrating differentiable importance scoring with greedy combinatorial search to enable efficient block-level sparsification. Contribution/Results: Our method unifies nonconvex regularization analysis, group-sparse optimization, and differentiable attention mechanisms. It achieves state-of-the-art accuracy and efficiency on ImageNet and Criteo, balancing scalability for large models with structural interpretability.
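The paradigm described above, differentiable importance scoring guiding a greedy combinatorial search, can be sketched in a few lines. This is an illustrative toy, not the paper's algorithm: the softmax scoring and the simple top-k greedy rule are assumptions made here for concreteness.

```python
import numpy as np

def block_importance(logits):
    """Softmax over learnable per-block logits yields differentiable
    importance scores (a sketch of attention-style scoring; the
    paper's mechanism may differ in detail)."""
    e = np.exp(logits - logits.max())
    return e / e.sum()

def greedy_block_select(logits, k):
    """Greedily keep the k highest-scoring blocks: the combinatorial
    search step, guided here by the differentiable scores."""
    scores = block_importance(np.asarray(logits, dtype=float))
    keep = np.argsort(scores)[::-1][:k]
    mask = np.zeros(len(scores), dtype=bool)
    mask[keep] = True
    return mask

# Example: 6 weight blocks, keep the 3 most important
mask = greedy_block_select([2.0, -1.0, 0.5, 3.0, 0.0, 1.5], k=3)
```

In practice the logits would be trained jointly with the network so that the scores reflect each block's contribution to the loss, and the selection step would be run over blocks of a weight matrix rather than a toy vector.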

📝 Abstract
Neural network pruning is a key technique towards engineering large yet scalable, interpretable, and generalizable models. Prior work on the subject has developed largely along two orthogonal directions: (1) differentiable pruning for efficiently and accurately scoring the importance of parameters, and (2) combinatorial optimization for efficiently searching over the space of sparse models. We unite the two approaches, both theoretically and empirically, to produce a coherent framework for structured neural network pruning in which differentiable pruning guides combinatorial optimization algorithms to select the most important sparse set of parameters. Theoretically, we show how many existing differentiable pruning techniques can be understood as nonconvex regularization for group sparse optimization, and prove that for a wide class of nonconvex regularizers, the global optimum is unique, group-sparse, and provably yields an approximate solution to a sparse convex optimization problem. The resulting algorithm that we propose, SequentialAttention++, advances the state of the art in large-scale neural network block-wise pruning tasks on the ImageNet and Criteo datasets.
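The group-sparse view in the abstract can be written, under illustrative assumptions (the notation below is not the paper's), as a regularized training problem:

```latex
\min_{w \in \mathbb{R}^d} \; L(w) \;+\; \lambda \sum_{g \in \mathcal{G}} \rho\!\left(\lVert w_g \rVert_2\right)
```

where $L$ is the training loss, $\mathcal{G}$ partitions the weights into blocks $w_g$, and $\rho$ is a nonconvex penalty (e.g. $\rho(t) = t^p$ with $0 < p < 1$). The abstract's theoretical claim is that for a wide class of such $\rho$, the global optimum is unique, group-sparse, and approximately solves a sparse convex optimization problem.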
Problem

Research questions and friction points this paper is trying to address.

Uniting differentiable pruning with combinatorial optimization for structured neural network pruning
Providing theoretical guarantees for nonconvex regularization in group sparse optimization
Advancing state-of-the-art in large-scale block-wise pruning on ImageNet and Criteo
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uniting differentiable pruning with combinatorial optimization
Nonconvex regularization for group sparse optimization
Global optimum yields approximate sparse convex solution
Authors
T. Yasuda (Carnegie Mellon University, Google Research)
Kyriakos Axiotis (MIT)
Gang Fu (Amazon)
M. Bateni (Google Research)
V. Mirrokni (Google Research)

Topics: Machine Learning, Deep Learning, Semantic Network Analysis