🤖 AI Summary
To address the performance and latency bottlenecks of digit-serial CNN hardware acceleration, this paper proposes DSLR-CNN, an accelerator built on left-to-right (LR) digit-serial arithmetic. LR arithmetic operates in most-significant-digit-first (MSDF) mode, which enables fine-grained digit-level pipelining and parallel multiply-accumulate (MAC) operations while keeping the low interconnect overhead and small area of digit-serial designs. Implemented in Verilog and synthesized in GSCL 45nm CMOS technology, DSLR-CNN incorporates low-latency LR multipliers and adders in a digit-level pipelined convolution engine. Evaluated on AlexNet, VGG-16, and ResNet-18, it achieves 4.37× to 569.11× higher peak performance (GOPS) and 3.58× to 44.75× higher peak energy efficiency (TOPS/W) than contemporary designs, along with improvements in response time, power consumption, and area efficiency.
📝 Abstract
Digit-serial arithmetic has emerged as a viable approach for designing hardware accelerators, reducing interconnections, area utilization, and power consumption. However, conventional methods suffer from performance and latency issues. To address these challenges, we propose an accelerator design using left-to-right (LR) arithmetic, which performs computations in a most-significant-digit-first (MSDF) manner and thereby enables digit-level pipelining. This leads to substantial performance improvements and reduced latency. The processing engine is designed for convolutional neural networks (CNNs) and includes low-latency LR multipliers and adders for digit-level parallelism. The proposed DSLR-CNN is implemented in Verilog and synthesized with Synopsys Design Compiler using GSCL 45nm technology. The accelerator was evaluated on the AlexNet, VGG-16, and ResNet-18 networks. Results show significant improvements across key performance evaluation metrics, including response time, peak performance, power consumption, operational intensity, area efficiency, and energy efficiency. The peak performance of the proposed design, measured in GOPS, is 4.37× to 569.11× higher than that of contemporary designs, and its peak energy efficiency (TOPS/W) is 3.58× to 44.75× higher, outperforming conventional bit-serial designs.
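As a rough illustration of why MSDF operation allows digit-level pipelining, the following Python sketch feeds one operand digit by digit, most significant first, into a serial-parallel multiplication. It is not the DSLR-CNN multiplier: the function names, the radix-2 signed-digit set {-1, 0, 1}, and the greedy digit selection are all assumptions made for this example, and the model is purely numerical rather than a description of the hardware. It only shows that each incoming digit tightens the bound on the final product, so a downstream unit could begin working before the last digit arrives.

```python
from fractions import Fraction


def to_msdf_digits(x, n):
    """Encode x (|x| < 1) as n radix-2 signed digits in {-1, 0, 1}, MSD first.

    Hypothetical helper for illustration: digit i carries weight 2**-i.
    """
    residual = Fraction(x)
    digits = []
    for i in range(1, n + 1):
        weight = Fraction(1, 2 ** i)
        if residual >= weight / 2:
            d = 1
        elif residual <= -weight / 2:
            d = -1
        else:
            d = 0
        residual -= d * weight  # the remainder stays below the current weight
        digits.append(d)
    return digits


def msdf_partial_products(x_digits, y):
    """Consume the digits of x MSD first and return the running estimate of x * y."""
    acc = Fraction(0)
    partials = []
    for i, d in enumerate(x_digits, start=1):
        acc += d * Fraction(y) * Fraction(1, 2 ** i)
        partials.append(acc)  # after i digits: |x*y - acc| < |y| * 2**-i
    return partials


x, y = Fraction(5, 8), Fraction(3, 4)
digits = to_msdf_digits(x, 4)
partials = msdf_partial_products(digits, y)
for i, p in enumerate(partials, start=1):
    print(f"after digit {i}: estimate={p}, error={abs(x * y - p)}")
```

Because the most significant digits arrive first, the running estimate is already close to the final product after the first few digits. A least-significant-digit-first scheme would not settle the high-order part of the result until the end.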