SequentialAttention++ for Block Sparsification: Differentiable Pruning Meets Combinatorial Optimization

📅 2024-02-27

🏛️ Neural Information Processing Systems

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Large-scale neural network block-wise pruning faces a fundamental trade-off: differentiable methods lack structural guarantees, while combinatorial optimization suffers from poor scalability. Method: This paper proposes a differentiable guidance framework for structured pruning. We theoretically prove that several differentiable pruning approaches are equivalent to a class of nonconvex group-sparse regularizations—establishing, for the first time, uniqueness, group sparsity, and near-optimality of the global solution. Leveraging this insight, we design a novel paradigm integrating differentiable importance scoring with greedy combinatorial search to enable efficient block-level sparsification. Contribution/Results: Our method unifies nonconvex regularization analysis, group-sparse optimization, and differentiable attention mechanisms. It achieves state-of-the-art accuracy and efficiency on ImageNet and Criteo, balancing scalability for large models with structural interpretability.

Technology Category

Application Category

📝 Abstract

Neural network pruning is a key technique towards engineering large yet scalable, interpretable, and generalizable models. Prior work on the subject has developed largely along two orthogonal directions: (1) differentiable pruning for efficiently and accurately scoring the importance of parameters, and (2) combinatorial optimization for efficiently searching over the space of sparse models. We unite the two approaches, both theoretically and empirically, to produce a coherent framework for structured neural network pruning in which differentiable pruning guides combinatorial optimization algorithms to select the most important sparse set of parameters. Theoretically, we show how many existing differentiable pruning techniques can be understood as nonconvex regularization for group sparse optimization, and prove that for a wide class of nonconvex regularizers, the global optimum is unique, group-sparse, and provably yields an approximate solution to a sparse convex optimization problem. The resulting algorithm that we propose, SequentialAttention++, advances the state of the art in large-scale neural network block-wise pruning tasks on the ImageNet and Criteo datasets.

Problem

Research questions and friction points this paper is trying to address.

Uniting differentiable pruning with combinatorial optimization for structured neural network pruning

Providing theoretical guarantees for nonconvex regularization in group sparse optimization

Advancing state-of-the-art in large-scale block-wise pruning on ImageNet and Criteo

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uniting differentiable pruning with combinatorial optimization

Nonconvex regularization for group sparse optimization

Global optimum yields approximate sparse convex solution

🔎 Similar Papers

No similar papers found.

Authors to Follow