NeCTAr and RASoC: A Tale of Two Class SoCs for Language Model Inference and Robotics in Intel 16

📅 2024-08-25
🏛️ IEEE Hot Chips Symposium
📈 Citations: 0
Influential: 0
🤖 AI Summary
Conventional RISC-V SoCs face significant energy-efficiency bottlenecks when accelerating sparse and dense machine learning kernels, especially Transformer-based language models, due to memory-bandwidth limits and suboptimal data movement. Method: This work introduces NeCTAr, a heterogeneous multi-core RISC-V SoC fabricated in Intel's 16 nm process, featuring near-core and near-memory acceleration tailored to sparse and dense ML workloads. It pioneers a near-cache Transformer-specific datapath, integrates sparse tensor computation optimizations, employs cooperative scheduling across the RISC-V cores, and incorporates process-level circuit customizations. Contribution/Results: NeCTAr achieves end-to-end hardware inference of the sparse language model ReLU-Llama. The prototype operates at 400 MHz and 0.85 V, delivering 109 GOPs/W on matrix-vector multiplication, a substantial improvement in the joint optimization of sparse-computation efficiency and overall system energy efficiency.

📝 Abstract
This paper introduces NeCTAr (Near-Cache Transformer Accelerator), a 16nm heterogeneous multicore RISC-V SoC for sparse and dense machine learning kernels with both near-core and near-memory accelerators. A prototype chip runs at 400MHz at 0.85V and performs matrix-vector multiplications with 109 GOPs/W. The effectiveness of the design is demonstrated by running inference on a sparse language model, ReLU-Llama.
Problem

Research questions and friction points this paper is trying to address.

How to build an energy-efficient RISC-V SoC for language model inference.
How to integrate near-core and near-memory accelerators for sparse and dense ML kernels.
How to demonstrate end-to-end performance on the sparse language model ReLU-Llama.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Heterogeneous RISC-V SoC for ML kernels
Near-core and near-memory accelerators integration
Efficient matrix-vector multiplications at 109 GOPs/W
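The sparse-kernel idea behind these bullets can be sketched in NumPy: ReLU-Llama's ReLU activations leave many entries exactly zero, so a matrix-vector product only needs the weight columns matching nonzero activations. This is a hypothetical software illustration of activation-sparse matvec, not NeCTAr's actual accelerator datapath.

```python
import numpy as np

def sparse_matvec(W, x):
    """Matrix-vector product that skips columns where the (post-ReLU)
    activation is zero -- the kind of activation sparsity ReLU-Llama
    exposes. Illustrative sketch only, not NeCTAr's hardware."""
    nz = np.flatnonzero(x)       # indices of nonzero activations
    return W[:, nz] @ x[nz]      # touch only the columns that matter

rng = np.random.default_rng(0)
x = np.maximum(rng.standard_normal(512), 0.0)  # ReLU output: ~half zeros
W = rng.standard_normal((256, 512))
y = sparse_matvec(W, x)
assert np.allclose(y, W @ x)     # matches the dense product
```

In hardware, the same skip-on-zero idea saves memory bandwidth and energy rather than instruction count, which is why near-cache and near-memory placement of the datapath matters.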
Viansa Schmulbach
University of California, Berkeley, CA, USA
Jason Kim
PhD Candidate, Georgia Institute of Technology
Information Security · Hardware Security
Ethan Gao
University of California, Berkeley, CA, USA
Lucy Revina
University of California, Berkeley, CA, USA
Nikhil Jha
University of California, Berkeley, CA, USA
Ethan Wu
University of California, Berkeley, CA, USA
Borivoje Nikolic
University of California, Berkeley
Integrated Circuits · VLSI · Computer Architecture · Communications · Signal Processing