🤖 AI Summary
To address the low computational efficiency and high memory overhead of convolution operators on RISC-V platforms, this work proposes a column-wise N:M fine-grained pruning strategy, the first hardware-friendly sparsification optimization tailored for RISC-V vector architectures. Methodologically, it fuses im2col with data packing to reduce memory-access overhead, modifies XNNPACK to enable vectorized execution of pruned sparse convolutions, and adds operator fusion and automated configuration selection. Experiments on ResNet variants demonstrate up to a 4.0× improvement in inference throughput, with ImageNet Top-1 accuracy degradation of at most 2.1% relative to the dense baselines, significantly outperforming existing sparse acceleration approaches. This work establishes a systematic optimization framework for efficient sparse inference on RISC-V edge devices.
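To make the column-wise N:M scheme concrete, here is a minimal NumPy sketch, not the paper's RISC-V/XNNPACK implementation: within every group of M consecutive entries along a weight column, only the N largest-magnitude weights are kept and the rest are zeroed. The helper name and tile shapes are illustrative assumptions.

```python
import numpy as np

def prune_nm_columnwise(weights, n=2, m=4):
    """Illustrative column-wise N:M pruning (hypothetical helper):
    in each group of m consecutive entries along every column,
    keep the n largest-magnitude weights and zero the others."""
    w = weights.copy()
    rows, cols = w.shape
    assert rows % m == 0, "column length must be a multiple of M"
    for c in range(cols):
        for r in range(0, rows, m):
            group = w[r:r + m, c]          # view into the copy
            # indices of the (m - n) smallest-magnitude entries get zeroed
            drop = np.argsort(np.abs(group))[: m - n]
            group[drop] = 0.0
    return w

# 8x2 weight tile: after 2:4 pruning, every column keeps exactly
# 2 nonzeros per group of 4, a pattern a vector unit can exploit.
w = np.arange(1.0, 17.0).reshape(8, 2)
sparse = prune_nm_columnwise(w, n=2, m=4)
```

Because the nonzero count per group is fixed, the surviving weights can be packed densely with small per-group index metadata, which is what makes N:M patterns friendly to vector hardware.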
📝 Abstract
In deep learning frameworks, weight pruning is a widely used technique for improving computational efficiency by reducing the size of large models. This is especially critical for convolutional operators, which often act as performance bottlenecks in convolutional neural networks (CNNs). However, the effectiveness of pruning heavily depends on how it is implemented, as different methods can significantly affect both computational performance and memory footprint. In this work, we propose a column-wise N:M pruning strategy applied at the tile level and modify XNNPACK to enable efficient execution of pruned models on the RISC-V vector architecture. Additionally, we propose fusing the operations of im2col and data packing to minimize redundant memory accesses and memory overhead. To further optimize performance, we incorporate AITemplate's profiling technique to identify the optimal implementation for each convolutional operator. Our proposed approach increases ResNet inference throughput by up to 4.0×, while preserving ImageNet top-1 accuracy within 2.1% of the dense baseline.
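For readers unfamiliar with the im2col step that the paper fuses with data packing, here is a naive single-channel sketch in NumPy: each kernel-sized patch of the input is unfolded into a row so that convolution becomes a plain matrix multiply. The two passes are shown separately here for clarity; the paper's contribution is precisely to avoid materializing this intermediate buffer by packing during the unfold. Function names and shapes are illustrative assumptions, not the actual XNNPACK code.

```python
import numpy as np

def im2col(x, kh, kw):
    """Naive im2col (illustrative sketch): unfold every kh x kw patch
    of a single-channel image into one row of a matrix, so that
    convolution reduces to a GEMM over that matrix."""
    h, w = x.shape
    oh, ow = h - kh + 1, w - kw + 1
    cols = np.empty((oh * ow, kh * kw), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            cols[i * ow + j] = x[i:i + kh, j:j + kw].ravel()
    return cols

x = np.arange(16.0).reshape(4, 4)   # 4x4 input image
k = np.ones((3, 3))                 # 3x3 kernel of ones
# convolution as matrix multiply: (oh*ow, kh*kw) @ (kh*kw,)
y = im2col(x, 3, 3) @ k.ravel()     # each entry is one 3x3 window sum
```

The unfolded matrix duplicates overlapping input pixels, which is the redundant memory traffic that fusing im2col with the GEMM packing routine is meant to eliminate.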