TPU-Gen: LLM-Driven Custom Tensor Processing Unit Generator

📅 2025-03-07
🤖 AI Summary
Current TPU design faces bottlenecks including heavy reliance on expert knowledge, high manual labor costs, and scarcity of domain-specific training data. To address these challenges, this paper proposes the first LLM-based automated TPU generation framework, specifically targeting systolic array architectures and enabling end-to-end hardware synthesis—from high-level specifications to synthesizable RTL. Key contributions are: (1) the first open-source, high-quality hardware-specific dataset for training and evaluation; (2) a hardware-semantic-aware RAG mechanism that substantially mitigates LLM hallucination; and (3) integrated modeling of systolic arrays, optimized approximate multiply-accumulate units, and automated hardware pipelining. Experimental results demonstrate that the generated TPUs achieve, on average, 92% reduction in area and 96% reduction in power consumption compared to manually optimized baselines, with significant improvements in energy efficiency and overall PPA (performance–power–area).
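The summary mentions systolic arrays paired with approximate multiply-accumulate units. A minimal functional sketch of that idea (not the paper's implementation; the truncation scheme and all names here are illustrative assumptions):

```python
# Hedged sketch: a tiny functional model of the multiply-accumulate work a
# systolic array performs, with an optional truncation-based approximate
# multiplier -- one common way to trade precision for area/power.

def approx_mul(a: int, b: int, drop_bits: int = 0) -> int:
    """Multiply after zeroing the low `drop_bits` bits of each operand
    (an assumed, illustrative approximation scheme)."""
    mask = ~((1 << drop_bits) - 1)
    return (a & mask) * (b & mask)

def systolic_matmul(A, B, drop_bits: int = 0):
    """Functionally equivalent to A @ B when drop_bits == 0; each output
    element is an accumulation of (possibly approximate) MACs, mirroring
    what one processing element accumulates over time."""
    n, k, m = len(A), len(A[0]), len(B[0])
    C = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0
            for t in range(k):
                acc += approx_mul(A[i][t], B[t][j], drop_bits)
            C[i][j] = acc
    return C

A = [[3, 5], [2, 7]]
B = [[1, 4], [6, 8]]
print(systolic_matmul(A, B))               # exact: [[33, 52], [44, 64]]
print(systolic_matmul(A, B, drop_bits=1))  # approximate, lower precision
```

Raising `drop_bits` shrinks the effective multiplier width, which is the kind of knob an area/power-optimized MAC design exposes.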

📝 Abstract
The increasing complexity and scale of Deep Neural Networks (DNNs) necessitate specialized tensor accelerators, such as Tensor Processing Units (TPUs), to meet various computational and energy efficiency requirements. Nevertheless, designing an optimal TPU remains challenging due to the high level of domain expertise required, considerable manual design time, and lack of high-quality, domain-specific datasets. This paper introduces TPU-Gen, the first Large Language Model (LLM) based framework designed to automate the exact and approximate TPU generation process, focusing on systolic array architectures. TPU-Gen is supported by a meticulously curated, comprehensive, and open-source dataset that covers a wide range of spatial array designs and approximate multiply-and-accumulate units, enabling design reuse, adaptation, and customization for different DNN workloads. The proposed framework leverages Retrieval-Augmented Generation (RAG) as an effective solution for building LLMs in the data-scarce hardware domain, addressing the pressing issue of hallucination. TPU-Gen transforms high-level architectural specifications into optimized low-level implementations through an effective hardware generation pipeline. Our extensive experimental evaluations demonstrate superior performance, power, and area efficiency, with average reductions in area and power of 92% and 96% relative to the manually optimized reference designs. These results set new standards for driving advancements in next-generation design automation tools powered by LLMs.
Problem

Research questions and friction points this paper is trying to address.

Automates TPU generation using LLMs for DNN workloads
Addresses lack of domain-specific datasets in hardware design
Reduces manual design time and expertise for TPU optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-driven TPU generation framework
Open-source dataset for TPU designs
RAG-enhanced hardware generation pipeline
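The RAG-enhanced pipeline grounds the LLM in existing hardware designs before generation. A hedged sketch of that retrieval step (the corpus, tokenizer, and scoring below are illustrative assumptions, not the paper's mechanism):

```python
# Hedged sketch: bag-of-words retrieval over a toy corpus of RTL snippet
# descriptions, standing in for the hardware-semantic-aware retrieval that
# selects grounding context for the LLM prompt. All file names and
# descriptions are made up for illustration.
import math
from collections import Counter

CORPUS = {
    "mac_unit.v": "approximate multiply accumulate unit with truncation",
    "systolic.v": "weight stationary systolic array of processing elements",
    "fifo.v":     "synchronous fifo buffer with valid ready handshake",
}

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 1):
    """Return the k corpus entries most similar to the query."""
    q = Counter(query.lower().split())
    ranked = sorted(CORPUS,
                    key=lambda f: cosine(q, Counter(CORPUS[f].split())),
                    reverse=True)
    return ranked[:k]

print(retrieve("generate a systolic array of processing elements"))
# -> ['systolic.v']
```

In a real pipeline the retrieved snippets would be appended to the LLM prompt, so generation is anchored to verified designs rather than produced from scratch.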
Deepak Vungarala
New Jersey Institute of Technology, Newark, NJ, USA
Mohammed E. Elbtity
University of South Carolina, Columbia, SC, USA
Sumiya Syed
New Jersey Institute of Technology, Newark, NJ, USA
Sakila Alam
New Jersey Institute of Technology, Newark, NJ, USA
Kartik Pandit
New Jersey Institute of Technology, Newark, NJ, USA
Arnob Ghosh
Assistant Professor of ECE at New Jersey Institute of Technology
Reinforcement Learning, Game Theory, Intelligent Transportation Systems, Computer Networks
Ramtin Zand
Assistant Professor, University of South Carolina
Edge Computing, Neuromorphic Computing, In-Memory Computing, Machine Learning, Processing-In-Memory
Shaahin Angizi
Assistant Professor at New Jersey Institute of Technology
In-Memory Computing, In-Sensor Computing, Memory Security, AI, Digital Design