DSLR-CNN: Efficient CNN Acceleration Using Digit-Serial Left-to-Right Arithmetic

📅 2025-01-03
🏛️ IEEE Access
📈 Citations: 1
Influential: 0
🤖 AI Summary
To address energy-efficiency and latency bottlenecks in CNN hardware acceleration, this paper proposes DSLR-CNN, a domain-specific accelerator leveraging left-to-right (LR) digit-serial arithmetic. It integrates LR digit-serial computation, operating in most-significant-digit-first (MSDF) mode, into CNN accelerator design, enabling fine-grained digit-level pipelining and parallel multiply-accumulate (MAC) operations with low interconnect overhead and a small area footprint. Implemented in Verilog and synthesized in GSCL 45nm CMOS technology, DSLR-CNN incorporates custom LR multipliers and adders and a digit-level pipelined convolution engine. Evaluated on AlexNet, VGG-16, and ResNet-18, it achieves 4.37×–569.11× higher peak throughput and 3.58×–44.75× better energy efficiency (TOPS/W) than baseline accelerators, while significantly reducing inference latency, silicon area, and power consumption.
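The MSDF idea the summary describes can be sketched in a few lines: with a redundant signed digit set, an operand is produced most-significant digit first, so each arriving digit only refines the running value and a downstream unit can start consuming the result before the full word is available. The following Python sketch is illustrative only; the paper's design is Verilog hardware, and the signed-digit recoding below is an assumed textbook-style scheme, not the authors' exact circuit:

```python
def to_msdf_signed_digits(x, n):
    """Recode a fraction x in (-1, 1) into n radix-2 signed digits
    from {-1, 0, 1}, most-significant digit first (textbook-style)."""
    digits, r = [], x
    for _ in range(n):
        r *= 2
        if r >= 0.5:
            d = 1
        elif r <= -0.5:
            d = -1
        else:
            d = 0
        r -= d  # residual stays bounded: |r| <= 1
        digits.append(d)
    return digits

def msdf_partial_values(digits):
    """Successive estimates as digits arrive MSD-first: each new digit
    tightens the value by one binary place (digit-level pipelining)."""
    partial, weight, estimates = 0.0, 0.5, []
    for d in digits:
        partial += d * weight
        weight /= 2
        estimates.append(partial)
    return estimates

digits = to_msdf_signed_digits(0.40625, 8)
estimates = msdf_partial_values(digits)
```

Because the most significant digits arrive first, a subsequent stage (e.g. a MAC) can begin work after only a few digits are valid instead of waiting for the full word, which is the source of the latency reduction the paper reports.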

📝 Abstract
Digit-serial arithmetic has emerged as a viable approach for designing hardware accelerators, reducing interconnections, area utilization, and power consumption. However, conventional methods suffer from performance and latency issues. To address these challenges, we propose an accelerator design using left-to-right (LR) arithmetic, which performs computations in a most-significant-digit-first (MSDF) manner, enabling digit-level pipelining. This leads to substantial performance improvements and reduced latency. The processing engine is designed for convolutional neural networks (CNNs) and includes low-latency LR multipliers and adders for digit-level parallelism. The proposed DSLR-CNN is implemented in Verilog and synthesized with the Synopsys Design Compiler using GSCL 45nm technology; the DSLR-CNN accelerator was evaluated on AlexNet, VGG-16, and ResNet-18 networks. Results show significant improvements across key performance evaluation metrics, including response time, peak performance, power consumption, operational intensity, area efficiency, and energy efficiency. The peak performance measured in GOPS of the proposed design is $4.37\times$ to $569.11\times$ higher than contemporary designs, and it achieves $3.58\times$ to $44.75\times$ higher peak energy efficiency (TOPS/W), outperforming conventional bit-serial designs.
Problem

Research questions and friction points this paper is trying to address.

Hardware Accelerator
Convolutional Neural Networks
Efficiency Optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

DSLR-CNN
Parallel Computing
Hardware Accelerator
Malik Zohaib Nisar
Chosun University
Deep Learning, Wireless Communication, Computer Arithmetic, Hardware Accelerators
M. Ibrahim
Department of Computer Engineering, College of IT Convergence, Chosun University, Gwangju, 61452, Republic of Korea
Saeid Gorgin
Sungkyunkwan University
Applied Machine Learning, Embedded AI, Hardware Accelerators, Computer Arithmetic, FPGA
Muhammad Usman
Department of Computer Engineering, College of IT Convergence, Chosun University, Gwangju, 61452, Republic of Korea; Faculty of Informatics and Data Science, University of Regensburg, Regensburg, 93053, Germany
Jeong-A Lee
Professor, Department of Computer Engineering, Chosun University
High Performance Computing, Digital Arithmetic, Memory Architecture, Fault-tolerant Computing