🤖 AI Summary
To address the challenge of efficiently exploiting unstructured sparsity in deep neural networks (DNNs) for edge AI, this paper proposes a virtually scalable systolic array architecture. The architecture introduces a virtual expansion mechanism that dynamically adapts to diverse sparsity patterns without physical reconfiguration, enabling adaptive computation granularity scaling, from dense to highly sparse DNN workloads, within a fixed hardware footprint. Integrated with sparsity-aware dataflow scheduling, virtual mapping and reconfiguration control, and a reusable multiply-accumulate (MAC) unit in a commercial 16-nm technology, the design achieves significant hardware efficiency gains. Evaluated against conventional systolic arrays delivering identical peak throughput, the proposed architecture reduces silicon area by 37% and improves energy efficiency by 68%. These results demonstrate a compelling trade-off among generality, energy efficiency, and hardware utilization for edge AI accelerators.
📝 Abstract
Leveraging high degrees of unstructured sparsity is a promising approach to enhance the efficiency of deep neural network (DNN) accelerators, which is particularly important for emerging Edge-AI applications. We introduce VUSA, a systolic-array architecture that virtually grows based on the present sparsity to perform larger matrix multiplications with the same number of physical multiply-accumulate (MAC) units. Compared to a baseline systolic-array architecture with the same peak performance in a commercial 16-nm technology, the proposed architecture achieves savings of 37% in area and 68% in power. Moreover, the proposed architecture supports acceleration for any DNN with any degree of sparsity, including no sparsity at all. The proposed architecture is therefore application-independent, making it viable for general-purpose AI acceleration.
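The intuition behind virtual growth can be illustrated with a toy software sketch (this is an illustration of the general principle, not the paper's hardware mechanism): when weights are unstructurally sparse, zero operands need no MAC operation, so a fixed pool of physical MAC units can service a virtual array whose dense size is larger by roughly the inverse of the density.

```python
import random

def sparse_matvec(weights, x):
    """Matrix-vector product that skips zero weights, so the number of
    effective MAC operations tracks the density, not the matrix size."""
    y = [0.0] * len(weights)
    macs = 0
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            if w != 0.0:  # a zero operand consumes no MAC operation
                y[i] += w * x[j]
                macs += 1
    return y, macs

random.seed(0)
n = 64
density = 0.25  # i.e., 75% unstructured sparsity
W = [[random.random() if random.random() < density else 0.0
      for _ in range(n)] for _ in range(n)]
x = [1.0] * n

y, macs = sparse_matvec(W, x)
dense_macs = n * n
# At ~25% density, the same MAC budget covers a "virtual" array
# roughly 4x the physical size.
print(f"virtual scaling factor: {dense_macs / macs:.2f}")
```

In hardware, realizing this gain requires the sparsity-aware scheduling and virtual mapping the paper describes, since skipped operands must still be routed and synchronized across the systolic array.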