🤖 AI Summary
Block-wise pruning of large-scale neural networks has developed along two largely separate lines: differentiable methods, which score parameter importance efficiently, and combinatorial optimization, which searches the space of sparse models. Method: This paper proposes a framework for structured pruning in which differentiable pruning guides combinatorial optimization. We show that many existing differentiable pruning techniques can be understood as nonconvex regularization for group-sparse optimization, and prove that, for a wide class of nonconvex regularizers, the global optimum is unique, group-sparse, and yields an approximate solution to a sparse convex optimization problem. Building on this analysis, we propose SequentialAttention++, which uses differentiable importance scores to guide the combinatorial selection of the most important blocks of parameters. Contribution/Results: Our method unites differentiable pruning and combinatorial optimization, both theoretically and empirically, and advances the state of the art in large-scale block-wise pruning on the ImageNet and Criteo datasets.
📝 Abstract
Neural network pruning is a key technique towards engineering large yet scalable, interpretable, and generalizable models. Prior work on the subject has developed largely along two orthogonal directions: (1) differentiable pruning for efficiently and accurately scoring the importance of parameters, and (2) combinatorial optimization for efficiently searching over the space of sparse models. We unite the two approaches, both theoretically and empirically, to produce a coherent framework for structured neural network pruning in which differentiable pruning guides combinatorial optimization algorithms to select the most important sparse set of parameters. Theoretically, we show how many existing differentiable pruning techniques can be understood as nonconvex regularization for group sparse optimization, and prove that for a wide class of nonconvex regularizers, the global optimum is unique, group-sparse, and provably yields an approximate solution to a sparse convex optimization problem. The resulting algorithm that we propose, SequentialAttention++, advances the state of the art in large-scale neural network block-wise pruning tasks on the ImageNet and Criteo datasets.
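To make the link between differentiable masks and group-sparse regularization concrete, here is a minimal sketch of a textbook identity. It is an illustration only, not the paper's analysis. Assume a block of weights is written as beta = a * w, with a scalar mask a and a weight vector w, and that both factors receive an ordinary L2 penalty. Minimizing 0.5 * (a^2 + ||w||^2) over all such factorizations gives exactly ||beta||_2, the group-lasso penalty for that block. The names `group_norm`, `factored_penalty`, and `best_factored_penalty` are hypothetical helpers introduced for this example. The sketch yields the convex group-lasso penalty. It does not reproduce the nonconvex regularizers or the SequentialAttention++ algorithm described in the abstract.

```python
import math


def group_norm(beta):
    """Euclidean (L2) norm of one block of weights."""
    return math.sqrt(sum(b * b for b in beta))


def factored_penalty(a, beta):
    """L2 penalty 0.5 * (a^2 + ||w||^2) when the block is written as beta = a * w."""
    w = [b / a for b in beta]  # weights implied by the mask value a (requires a != 0)
    return 0.5 * (a * a + sum(x * x for x in w))


def best_factored_penalty(beta):
    """Smallest factored penalty over a > 0, using the closed form a^2 = ||beta||_2."""
    norm = group_norm(beta)
    if norm == 0.0:
        return 0.0  # a zero block can use a = 0 and w = 0 at no cost
    return factored_penalty(math.sqrt(norm), beta)


# Toy block with ||block||_2 = 5 (illustrative values, not from the paper).
block = [3.0, 4.0]

# Brute-force check: no mask value on a coarse grid beats the group norm.
grid = [0.1 * k for k in range(1, 101)]  # candidate mask values a in (0, 10]
grid_min = min(factored_penalty(a, block) for a in grid)

print(group_norm(block), best_factored_penalty(block), grid_min)
```

The closed-form minimizer balances the two factors (a^2 = ||w||^2), which is the equality case of the AM-GM inequality. Because the penalty on a block is its unsquared norm, entire blocks are driven to exactly zero together, and that is the group-sparse behavior that structured pruning relies on.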