'1'-bit Count-based Sorting Unit to Reduce Link Power in DNN Accelerators

📅 2026-01-20
🤖 AI Summary
This work addresses the high interconnect power consumption in deep neural network (DNN) accelerators caused by excessive switching activity. To mitigate this, the authors propose a comparison-free hardware sorting architecture that counts the number of '1' bits in each data word and reorders operands so that fewer bits toggle across communication links. To balance energy efficiency against hardware overhead, they introduce an approximate, coarse-grained bucket grouping of the '1'-bit counts. The approximate design keeps the reduction in bit switching at 19.50%, compared with 20.42% for the precise implementation, while shrinking the sorting unit. In the convolutional neural network (CNN)-oriented setting the paper targets, the approximate sorting unit achieves up to 35.4% area reduction.
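
The reordering idea summarized above can be illustrated in software. The following is a minimal sketch, not the authors' hardware design: the helper names (`popcount`, `bit_transitions`, `popcount_sort`), the 8-bit word width, and the sample data are all assumptions made for illustration. It places each word into a bucket indexed by its '1'-bit count, so no word-to-word comparisons are needed, and then counts how many link wires toggle before and after reordering.

```python
from typing import List


def popcount(word: int) -> int:
    """Number of '1' bits in a non-negative integer."""
    return bin(word).count("1")


def bit_transitions(stream: List[int]) -> int:
    """Total wires that toggle between consecutive words sent on a link."""
    return sum(popcount(a ^ b) for a, b in zip(stream, stream[1:]))


def popcount_sort(words: List[int], width: int = 8) -> List[int]:
    """Comparison-free ordering (assumed software analogue).

    Each word is dropped into the bucket matching its '1'-bit count,
    then buckets are read out from 0 ones up to `width` ones.
    """
    buckets: List[List[int]] = [[] for _ in range(width + 1)]
    for w in words:
        buckets[popcount(w)].append(w)
    return [w for bucket in buckets for w in bucket]


# Hypothetical 8-bit operands, chosen to alternate between dense and sparse words.
words = [0xFF, 0x00, 0xFE, 0x01, 0xFF, 0x00]
reordered = popcount_sort(words)

print(bit_transitions(words))      # 38 toggles in arrival order
print(bit_transitions(reordered))  # 10 toggles after reordering
```

Ordering by '1'-bit count is a heuristic: words with similar counts tend to differ in fewer bit positions, but this is not guaranteed for every input, which is why reductions are reported as averages over workloads.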

📝 Abstract
Interconnect power consumption remains a bottleneck in Deep Neural Network (DNN) accelerators. While ordering data based on '1'-bit counts can mitigate this by reducing switching activity, practical hardware sorting implementations remain underexplored. This work proposes a hardware implementation of a comparison-free sorting unit optimized for Convolutional Neural Networks (CNNs). By leveraging approximate computing to group population counts into coarse-grained buckets, our design reduces hardware area while preserving the link power benefits of data reordering. Our approximate sorting unit achieves up to 35.4% area reduction while maintaining a 19.50% bit transition (BT) reduction, compared to 20.42% for the precise implementation.
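
To make the coarse-grained bucketing concrete, here is a small sketch under stated assumptions. The abstract does not give the bucket boundaries, so the equal-width mapping below, the function name `approx_popcount_sort`, and the parameter `num_buckets` are hypothetical choices for illustration only. Fewer buckets means less sorting state, at the cost of leaving words with nearby '1'-bit counts in arrival order.

```python
from typing import List


def popcount(word: int) -> int:
    """Number of '1' bits in a non-negative integer."""
    return bin(word).count("1")


def approx_popcount_sort(words: List[int], width: int = 8,
                         num_buckets: int = 3) -> List[int]:
    """Approximate ordering by '1'-bit count (assumed bucket layout).

    The width + 1 possible counts (0..width) are mapped onto
    `num_buckets` equal-width ranges. Words inside one bucket keep
    their arrival order. With num_buckets == width + 1 the mapping
    is the identity, which gives the precise ordering.
    """
    buckets: List[List[int]] = [[] for _ in range(num_buckets)]
    for w in words:
        index = popcount(w) * num_buckets // (width + 1)
        buckets[index].append(w)
    return [w for bucket in buckets for w in bucket]


# Hypothetical 8-bit operands with '1'-bit counts 3, 1, 8, 2, 7, 4.
words = [0x07, 0x01, 0xFF, 0x03, 0x7F, 0x0F]

coarse = approx_popcount_sort(words, num_buckets=3)
precise = approx_popcount_sort(words, num_buckets=9)

print([hex(w) for w in coarse])   # 0xff stays ahead of 0x7f: same coarse bucket
print([hex(w) for w in precise])  # fully ordered by '1'-bit count
```

With three buckets, counts 0 to 2, 3 to 5, and 6 to 8 share a bucket each, so `0xFF` and `0x7F` are not separated; the precise variant orders them.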

Problem

Research questions and friction points this paper is trying to address.

interconnect power
DNN accelerators
1-bit count
sorting unit
switching activity

Innovation

Methods, ideas, or system contributions that make the work stand out.

1-bit count sorting
approximate computing
DNN accelerator
interconnect power reduction
comparison-free sorting
Ruichi Han
Department of Electronics and Embedded Systems, KTH Royal Institute of Technology, Stockholm, Sweden
Yizhi Chen
Department of Electronics and Embedded Systems, KTH Royal Institute of Technology, Stockholm, Sweden
Tong Lei
Department of Electronics and Embedded Systems, KTH Royal Institute of Technology, Stockholm, Sweden
Jordi Altayó González
Department of Electronics and Embedded Systems, KTH Royal Institute of Technology, Stockholm, Sweden
Ahmed Hemani
KTH Royal Institute of Technology, Stockholm
VLSI design, Neural Networks, Massively Parallel Architectures, Design Automation