🤖 AI Summary
Exploiting weight sparsity in RRAM-based compute-in-memory (CIM) systems is challenging because unstructured sparsity disrupts the regular computational patterns of crossbar arrays.
Method: This paper proposes a column-similarity-aware bit-level weight reordering technique. Weights are stored in RRAM crossbars in two's-complement form, where bit-level sparsity and similarity between columns commonly occur. The method treats bit sparsity as a special case of bit similarity: when two columns hold identical bit values, only one is kept, and the compressed weight matrix is then mapped compactly onto operation units (OUs).
Results: Evaluated on typical neural networks, the approach achieves a 61.24% average performance improvement and 1.51×–2.52× energy savings under different sparsity ratios, with only slight overhead compared to the state-of-the-art design. It partially resolves the conflict between sparsity exploitation and the structured compute patterns that CIM requires.
📝 Abstract
Compute-in-Memory (CIM) and weight sparsity are two effective techniques for reducing data movement during Neural Network (NN) inference. However, they can hardly be employed in the same accelerator simultaneously, because CIM requires structured compute patterns, which are disrupted in sparse NNs. In this paper, we partially solve this issue by proposing a bit-level weight reordering strategy that realizes compact mapping of sparse NN weight matrices onto Resistive Random Access Memory (RRAM) based NN Accelerators (RRAM-Acc). Specifically, when weights are mapped to RRAM crossbars in two's-complement form, we observe, and can also prove mathematically, that bit-level sparsity and similarity commonly exist in the crossbars. The bit reordering method treats bit sparsity as a special case of bit similarity, retains only one column from each pair of columns that have identical bit values, and then maps the compressed weight matrices into Operation Units (OUs). The performance of our design is evaluated with typical NNs. Simulation results show a 61.24% average performance improvement and 1.51×–2.52× energy savings under different sparsity ratios, with only slight overhead compared to the state-of-the-art design.
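
The column-merging idea in the abstract can be illustrated with a small software sketch. This is not the paper's algorithm or its OU mapping. It is a minimal model under stated assumptions: weights are signed integers expanded into two's-complement bit columns, all-zero columns are dropped (sparsity as a special case of similarity), and identical columns are stored once, with their shared partial sum reused under different bit weights. The function names `bit_columns`, `merge_columns`, and `matvec` are hypothetical and chosen for illustration only.

```python
import numpy as np


def bit_columns(weights, n_bits):
    """Expand a signed integer matrix into two's-complement bit columns.

    Returns a list of (bits, output_index, scale) tuples, where scale is the
    signed power of two carried by that bit position (the MSB is negative).
    """
    w = np.asarray(weights, dtype=np.int64)
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    if w.min() < lo or w.max() > hi:
        raise ValueError("a weight does not fit in n_bits")
    unsigned = w & ((1 << n_bits) - 1)  # two's-complement bit pattern
    cols = []
    for j in range(w.shape[1]):
        for b in range(n_bits):
            bits = (unsigned[:, j] >> b) & 1
            scale = -(1 << b) if b == n_bits - 1 else (1 << b)
            cols.append((bits, j, scale))
    return cols


def merge_columns(cols):
    """Keep one copy of each distinct non-zero bit column.

    Returns the stored columns as a matrix plus a plan of
    (stored_index, output_index, scale) entries for reconstruction.
    """
    n_rows = len(cols[0][0])
    stored, index_of, plan = [], {}, []
    for bits, j, scale in cols:
        if not bits.any():
            continue  # an all-zero column contributes nothing
        key = bits.tobytes()
        if key not in index_of:
            index_of[key] = len(stored)
            stored.append(bits)
        plan.append((index_of[key], j, scale))
    if not stored:
        return np.zeros((n_rows, 0), dtype=np.int64), plan
    return np.stack(stored, axis=1), plan


def matvec(x, stored, plan, n_out):
    """Compute x @ W using only the stored (merged) bit columns."""
    partial = np.asarray(x, dtype=np.int64) @ stored  # one sum per stored column
    y = np.zeros(n_out, dtype=np.int64)
    for idx, j, scale in plan:
        y[j] += scale * partial[idx]
    return y


# Toy example: a 3x2 weight matrix with 4-bit weights (8 bit columns in total).
W = np.array([[3, -1], [0, -1], [3, -2]])
x = np.array([2, 5, 7])
stored, plan = merge_columns(bit_columns(W, n_bits=4))
y = matvec(x, stored, plan, n_out=W.shape[1])
```

In this toy case the eight bit columns collapse to three stored columns, and `y` matches `x @ W`. A hardware design would additionally have to account for OU dimensions, the indexing needed to route shared partial sums, and ADC behavior, none of which this sketch models.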