🤖 AI Summary
Conventional RISC-V SoCs face significant energy-efficiency bottlenecks when accelerating sparse and dense machine learning kernels, especially Transformer-based language models, due to limited memory bandwidth and costly data movement.
Method: This work introduces NeCTAr (Near-Cache Transformer Accelerator), a heterogeneous multi-core RISC-V SoC fabricated in the Intel 16 nm process, featuring both near-core and near-memory accelerators tailored to sparse and dense ML workloads. The design places a Transformer-specific datapath near the cache hierarchy, exploits sparsity in tensor computation, and coordinates work across the RISC-V cores.
Contribution/Results: NeCTAr demonstrates end-to-end hardware inference of ReLU-Llama, a sparse language model. The prototype runs at 400 MHz at 0.85 V and delivers 109 GOPs/W on matrix-vector multiplication, jointly improving sparse-computation efficiency and overall system energy efficiency.
📝 Abstract
This paper introduces NeCTAr (Near-Cache Transformer Accelerator), a 16 nm heterogeneous multicore RISC-V SoC for sparse and dense machine learning kernels with both near-core and near-memory accelerators. A prototype chip runs at 400 MHz at 0.85 V and performs matrix-vector multiplications with 109 GOPs/W. The effectiveness of the design is demonstrated by running inference on a sparse language model, ReLU-Llama.
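The sparsity the abstract refers to comes from ReLU activations: since ReLU zeroes many activation values, a matrix-vector product can skip the weight columns paired with zero activations. The NumPy sketch below illustrates that idea only; the function name and shapes are illustrative assumptions, not the paper's actual accelerator interface.

```python
import numpy as np

def sparse_matvec(W, x):
    """Compute W @ x while touching only columns where x is nonzero.

    Illustrative sketch of activation sparsity (a hypothetical software
    analogue, not NeCTAr's hardware datapath): zero activations contribute
    nothing to the output, so their weight columns are never read.
    """
    nz = np.flatnonzero(x)      # indices of nonzero activations
    return W[:, nz] @ x[nz]     # skip zero columns entirely

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))
x = np.maximum(rng.standard_normal(16), 0.0)  # ReLU makes many entries zero

# The sparse evaluation matches the dense product.
assert np.allclose(W @ x, sparse_matvec(W, x))
```

In hardware the same principle saves energy directly: skipped columns mean fewer memory fetches and multiply-accumulates, which is where a near-memory accelerator gains its efficiency.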