🤖 AI Summary
To address the high computational cost of CNN inference on resource-constrained devices—without relying on additional data or fine-tuning—this paper proposes HASTE, a data-free, training-free, plug-and-play dynamic convolution module. Its core innovation is to use Locality-Sensitive Hashing (LSH) to model channel-wise feature redundancy at runtime, detecting redundant input channels on the fly and compressing them together with the corresponding filter depths via dynamic structured pruning. According to the authors, HASTE is the first fully data-free and training-free method for dynamic structured pruning. Evaluated on CIFAR-10, it reduces the FLOPs of a ResNet-34 by 46.72% with only a 1.25% accuracy drop. Results on ImageNet confirm that the approach generalizes across datasets, supporting more efficient deployment on edge devices.
📝 Abstract
To reduce the computational cost of convolutional neural networks (CNNs) on resource-constrained devices, structured pruning approaches have shown promise in lowering floating-point operations (FLOPs) without substantial drops in accuracy. However, most methods require fine-tuning or specific training procedures to achieve a reasonable trade-off between retained accuracy and reduction in FLOPs, adding computational overhead and requiring training data to be available. To address this, we propose HASTE (Hashing for Tractable Efficiency), a data-free, plug-and-play convolution module that instantly reduces a network's test-time inference cost without training or fine-tuning. Our approach utilizes locality-sensitive hashing (LSH) to detect redundancies in the channel dimension of latent feature maps, compressing similar channels to reduce input and filter depth simultaneously, resulting in cheaper convolutions. We demonstrate our approach on the popular vision benchmarks CIFAR-10 and ImageNet, where we achieve a 46.72% reduction in FLOPs with only a 1.25% loss in accuracy by swapping the convolution modules in a ResNet-34 on CIFAR-10 for our HASTE module.
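To make the idea concrete, here is a minimal NumPy sketch of LSH-based channel merging, not the paper's actual implementation. It uses random-hyperplane LSH (an assumed choice of hash family) to bucket similar input channels, replaces each bucket by its mean channel, and sums the matching filter slices so that the convolution over the compressed input approximates the original. The function name `lsh_merge_channels` and all parameters are hypothetical.

```python
import numpy as np

def lsh_merge_channels(x, w, n_hyperplanes=8, seed=0):
    """Hypothetical sketch of LSH-based channel compression.

    x: input feature map, shape (C_in, H, W)
    w: conv weight, shape (C_out, C_in, k, k)
    Returns (x', w') with fewer channels; convolving w' with x'
    approximates convolving w with x when bucketed channels are similar.
    """
    c_in = x.shape[0]
    rng = np.random.default_rng(seed)
    # Random hyperplanes; the sign pattern of the projections is the hash code.
    planes = rng.standard_normal((n_hyperplanes, x[0].size))
    flat = x.reshape(c_in, -1)
    codes = (flat @ planes.T > 0).astype(np.uint8)

    # Channels with identical hash codes land in the same bucket.
    buckets = {}
    for i, code in enumerate(map(tuple, codes)):
        buckets.setdefault(code, []).append(i)

    x_new, w_new = [], []
    for idx in buckets.values():
        # Represent each bucket by the mean of its channels ...
        x_new.append(flat[idx].mean(axis=0).reshape(x.shape[1:]))
        # ... and sum the matching filter slices to preserve the output.
        w_new.append(w[:, idx].sum(axis=1))
    return np.stack(x_new), np.stack(w_new, axis=1)
```

When two channels are exact duplicates they hash to the same bucket, and the merged convolution is exactly equivalent; for merely similar channels the result is an approximation whose error shrinks as bucketed channels become more alike.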