NeCTAr and RASoC: A Tale of Two Class SoCs for Language Model Inference and Robotics in Intel 16

📅 2024-08-25
🏛️ IEEE Hot Chips Symposium
📈 Citations: 0
Influential: 0
🤖 AI Summary
Conventional RISC-V SoCs face significant energy-efficiency bottlenecks when accelerating sparse and dense machine learning kernels, especially Transformer-based language models, due to memory-bandwidth limits and suboptimal data movement. Method: This work introduces NeCTAr, a heterogeneous multi-core RISC-V SoC fabricated in Intel's 16 nm process, featuring near-core and near-memory acceleration tailored to sparse and dense ML workloads. It pioneers a near-cache Transformer-specific datapath, integrates sparse tensor computation optimizations, employs cooperative scheduling across the RISC-V cores, and incorporates process-level circuit customizations. Contribution/Results: NeCTAr achieves end-to-end hardware inference of the sparse language model ReLU-Llama. The prototype operates at 400 MHz and 0.85 V, delivering 109 GOPs/W on matrix-vector multiplication, a substantial improvement in the joint optimization of sparse-computation efficiency and overall system energy efficiency.

📝 Abstract
This paper introduces NeCTAr (Near-Cache Transformer Accelerator), a 16nm heterogeneous multicore RISC-V SoC for sparse and dense machine learning kernels with both near-core and near-memory accelerators. A prototype chip runs at 400MHz at 0.85V and performs matrix-vector multiplications with 109 GOPs/W. The effectiveness of the design is demonstrated by running inference on a sparse language model, ReLU-Llama.
Problem

Research questions and friction points this paper is trying to address.

How to build an energy-efficient RISC-V SoC for language model inference.
How to integrate near-core and near-memory accelerators for sparse and dense ML kernels.
How to demonstrate end-to-end performance on the sparse language model ReLU-Llama.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Heterogeneous RISC-V SoC for ML kernels
Near-core and near-memory accelerators integration
Efficient matrix-vector multiplications at 109 GOPs/W
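The sparse-kernel idea behind these bullets can be sketched in NumPy: ReLU-Llama's ReLU activations leave many entries exactly zero, so a matrix-vector product only needs the weight columns matching nonzero activations. This is a hypothetical software illustration of activation-sparse matvec, not NeCTAr's actual accelerator datapath.

```python
import numpy as np

def sparse_matvec(W, x):
    """Matrix-vector product that skips columns where the (post-ReLU)
    activation is zero -- the kind of activation sparsity ReLU-Llama
    exposes. Illustrative sketch only, not NeCTAr's hardware."""
    nz = np.flatnonzero(x)       # indices of nonzero activations
    return W[:, nz] @ x[nz]      # touch only the columns that matter

rng = np.random.default_rng(0)
x = np.maximum(rng.standard_normal(512), 0.0)  # ReLU output: ~half zeros
W = rng.standard_normal((256, 512))
y = sparse_matvec(W, x)
assert np.allclose(y, W @ x)     # matches the dense product
```

In hardware, the same skip-on-zero idea saves memory bandwidth and energy rather than instruction count, which is why near-cache and near-memory placement of the datapath matters.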
Viansa Schmulbach
University of California, Berkeley, CA, USA
Jason Kim
PhD Candidate, Georgia Institute of Technology
Information Security · Hardware Security
Ethan Gao
University of California, Berkeley, CA, USA
Lucy Revina
University of California, Berkeley, CA, USA
Nikhil Jha
University of California, Berkeley, CA, USA
Ethan Wu
University of California, Berkeley, CA, USA
Borivoje Nikolic
University of California, Berkeley
Integrated Circuits · VLSI · Computer Architecture · Communications · Signal Processing