🤖 AI Summary
Existing RNN pruning strategies suffer from inefficient hyperparameter selection, high trial-and-error costs, and a lack of early performance guarantees. Method: The authors propose a Lyapunov spectrum-based dynamical similarity metric that enables early ranking and stopping of pruned variants without full training, supporting efficient hyperparameter search. The work integrates dynamical-systems stability theory into pruning evaluation, establishing a "hyperpruning" paradigm that, under a fixed sparsity budget, automatically identifies pruned architectures matching or exceeding the accuracy of the original dense model. Contribution/Results: The method is compatible with stacked LSTM, RHN, and AWD-LSTM-MoS architectures. On PTB and WikiText-2, the resulting LS-based Hyperpruning (LSH) framework reduces search time by an order of magnitude relative to search based on full training. The selected models have fewer parameters and lower computational cost, yet achieve significantly higher accuracy than both loss-based pruned baselines and the original dense networks.
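The summary's core idea, a Lyapunov spectrum as a dynamical signature of a recurrent network, can be sketched as follows. This is a generic illustration, not the paper's exact procedure: it estimates Lyapunov exponents from a sequence of per-step hidden-state Jacobians via the standard QR re-orthonormalization method, and compares two spectra with a simple Euclidean distance (the paper's precise distance metric is not given here).

```python
import numpy as np

def lyapunov_spectrum(jacobians):
    """Estimate Lyapunov exponents from per-step Jacobians of an RNN's
    hidden-state map, using repeated QR re-orthonormalization."""
    n = jacobians[0].shape[0]
    Q = np.eye(n)
    log_growth = np.zeros(n)
    for J in jacobians:
        # Push the orthonormal frame through one step, then re-orthonormalize.
        Q, R = np.linalg.qr(J @ Q)
        log_growth += np.log(np.abs(np.diag(R)) + 1e-12)
    # Average log growth rates, sorted descending = estimated spectrum.
    return np.sort(log_growth / len(jacobians))[::-1]

def ls_distance(spec_a, spec_b):
    """Euclidean distance between two sorted Lyapunov spectra -- a plausible
    stand-in for the LS-based distance described in the summary."""
    return float(np.linalg.norm(spec_a - spec_b))
```

For a uniformly contracting map (all Jacobians `0.5 * I`), every exponent converges to `log 0.5`, which gives a quick sanity check of the estimator.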
📝 Abstract
A variety of pruning methods have been introduced for over-parameterized Recurrent Neural Networks to improve efficiency in terms of power consumption and storage utilization. These advances motivate a new paradigm, termed "hyperpruning", which seeks to identify the most suitable pruning strategy for a given network architecture and application. Unlike conventional hyperparameter search, where the optimal configuration's accuracy remains uncertain, in the context of network pruning, the accuracy of the dense model sets the target for the accuracy of the pruned one. The goal, therefore, is to discover pruned variants that match or even surpass this established accuracy. However, exhaustive search over pruning configurations is computationally expensive and lacks early performance guarantees. To address this challenge, we propose a novel Lyapunov Spectrum (LS)-based distance metric that enables early comparison between pruned and dense networks, allowing accurate prediction of post-training performance. By integrating this LS-based distance with standard hyperparameter optimization algorithms, we introduce an efficient hyperpruning framework, termed LS-based Hyperpruning (LSH). LSH reduces search time by an order of magnitude compared to conventional approaches relying on full training. Experiments on stacked LSTM and RHN architectures using the Penn Treebank dataset, and on AWD-LSTM-MoS using WikiText-2, demonstrate that under fixed training budgets and target pruning ratios, LSH consistently identifies superior pruned models. Remarkably, these pruned variants not only outperform those selected by the loss-based baseline but also exceed the performance of their dense counterpart.
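The search framework the abstract describes, ranking pruned candidates early by their LS distance to the dense network and fully training only the most promising ones, can be outlined as a generic selection loop. All names here (`estimate_spectrum`, `train_full`, `budget`) are hypothetical placeholders, not APIs from the paper:

```python
import numpy as np

def hyperprune(candidates, dense_spectrum, estimate_spectrum,
               train_full, budget=3):
    """Illustrative hyperpruning loop: score each pruned candidate by the
    distance between its (cheaply estimated, early-training) Lyapunov
    spectrum and the dense model's spectrum, then spend the full training
    budget only on the closest candidates."""
    scored = []
    for cand in candidates:
        spec = np.asarray(estimate_spectrum(cand))  # cheap early estimate
        scored.append((float(np.linalg.norm(spec - dense_spectrum)), cand))
    scored.sort(key=lambda pair: pair[0])           # closest dynamics first
    return [train_full(cand) for _, cand in scored[:budget]]
```

The order-of-magnitude speedup claimed in the abstract comes from this structure: full training cost is paid for only `budget` candidates instead of the whole search space.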