Hardwired-Neurons Language Processing Units as General-Purpose Cognitive Substrates

📅 2025-08-22
🤖 AI Summary
To address the dual bottlenecks of high energy consumption in large language model (LLM) inference and the prohibitive photomask costs of application-specific integrated circuits (e.g., photomask set fabrication), this work proposes the Hardwired Neuron Language Processing Unit (HNLPU) architecture. Its core innovation, Metal-Embedding, physically embeds LLM weights into the 3D metal interconnect topology of a 5 nm process, fixing the weights at the hardware level. This improves weight storage density by 15× and, because most photomask layers become standardized and reusable across chips, reduces non-recurring engineering (NRE) mask costs by 112×, bringing the design into an economically viable range without sacrificing integration density or manufacturability. Experimental evaluation reports a throughput of 249,960 tokens/s and an energy efficiency of 36 tokens/J, over 1,000× higher than state-of-the-art GPUs, while reducing carbon footprint by 230× and improving overall cost-effectiveness by 8.57×.

📝 Abstract
The rapid advancement of Large Language Models (LLMs) has established language as a core general-purpose cognitive substrate, driving demand for specialized Language Processing Units (LPUs) tailored for LLM inference. To overcome the growing energy consumption of LLM inference systems, this paper proposes a Hardwired-Neurons Language Processing Unit (HNLPU), which physically hardwires LLM weight parameters into the computational fabric, achieving an improvement of several orders of magnitude in computational efficiency through extreme specialization. However, a significant challenge lies in the scale of modern LLMs: an idealized estimate puts the cost of hardwiring gpt-oss 120B at no less than $6 billion in photomask set fabrication, rendering the straightforward approach economically impractical. To address this challenge, we propose the novel Metal-Embedding methodology. Instead of embedding weights in a 2D grid of silicon device cells, Metal-Embedding embeds weight parameters into the 3D topology of metal wires. This brings two benefits: (1) a 15x increase in density, and (2) 60 out of 70 photomask layers, including all EUV photomasks, are homogeneous across chips. In total, Metal-Embedding reduces photomask cost by 112x, bringing the Non-Recurring Engineering (NRE) cost of HNLPU into an economically viable range. Experimental results show that HNLPU achieves 249,960 tokens/s (5,555x/85x that of GPU/WSE), 36 tokens/J (1,047x/283x that of GPU/WSE), 13,232 mm² total die area (29% of the inscribed rectangular area of a 300 mm wafer), and an estimated $184M NRE at 5 nm technology. Analysis shows that HNLPU achieves 8.57x cost-effectiveness and a 230x carbon footprint reduction compared to H100 clusters, under an annual weight-updating assumption.
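As a quick sanity check, the headline speedup ratios in the abstract can be inverted to recover the implied GPU and WSE baselines. This is a back-of-envelope sketch using only the figures reported above; the baseline numbers are inferred from the ratios, not taken from the paper.

```python
# Invert the reported HNLPU-vs-baseline ratios to see what throughput and
# energy efficiency they imply for the GPU and wafer-scale-engine baselines.

HNLPU_THROUGHPUT = 249_960   # tokens/s, reported
HNLPU_EFFICIENCY = 36.0      # tokens/J, reported

# reported speedup/efficiency factors over GPU and WSE
implied_gpu_tps = HNLPU_THROUGHPUT / 5_555   # implied GPU throughput
implied_wse_tps = HNLPU_THROUGHPUT / 85      # implied WSE throughput
implied_gpu_tpj = HNLPU_EFFICIENCY / 1_047   # implied GPU efficiency
implied_wse_tpj = HNLPU_EFFICIENCY / 283     # implied WSE efficiency

print(f"implied GPU baseline: {implied_gpu_tps:.1f} tokens/s, "
      f"{implied_gpu_tpj:.4f} tokens/J")
print(f"implied WSE baseline: {implied_wse_tps:.1f} tokens/s, "
      f"{implied_wse_tpj:.4f} tokens/J")
```

The ratios are internally consistent: the implied GPU baseline of roughly 45 tokens/s at ~0.034 tokens/J is three to four orders of magnitude below the HNLPU figures, matching the "over 1,000×" efficiency claim.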
Problem

Research questions and friction points this paper is trying to address.

Reducing energy consumption in LLM inference systems
Overcoming economic impracticality of hardwiring large language models
Improving computational efficiency through specialized hardware design
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hardwired-Neurons LPU physically embeds LLM weights
Metal-Embedding methodology uses 3D metal wire topology
Achieves 112x photomask cost reduction via homogeneous layers
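The hardwiring idea behind these contributions can be sketched in software terms. The toy below is an illustrative assumption, not the paper's implementation: it contrasts a neuron whose weights are frozen into the function itself (analogous to weights baked into metal interconnect, with no weight memory traffic at inference) against a conventional neuron that receives weights from mutable storage on every call.

```python
# Toy contrast (illustration only): "hardwired" vs. programmable neuron.

def make_hardwired_neuron(weights, bias):
    """Return a neuron whose weights are fixed constants of the closure."""
    def neuron(x):
        # every multiply uses a frozen constant; no weight fetches at inference
        return sum(w * xi for w, xi in zip(weights, x)) + bias
    return neuron

def programmable_neuron(weights, bias, x):
    """A conventional neuron: weights arrive from mutable storage each call."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

hardwired = make_hardwired_neuron([0.5, -1.0, 2.0], bias=0.1)
print(hardwired([1.0, 1.0, 1.0]))  # 0.5 - 1.0 + 2.0 + 0.1
```

The trade-off mirrors the paper's economics: the hardwired variant is fast and specialization-friendly but must be rebuilt to change weights, which is why amortizing the one-time (NRE) cost, here via mask-layer homogeneity, is the crux of the design.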
Yang Liu
State Key Lab of Processors, Institute of Computing Technology, CAS, China
Yi Chen
University of Science and Technology of China, China
Yongwei Zhao
Institute of Computing Technology, Chinese Academy of Sciences
Yifan Hao
State Key Lab of Processors, Institute of Computing Technology, CAS, China
Zifu Zheng
State Key Lab of Processors, Institute of Computing Technology, CAS, China
Weihao Kong
Google
Zhangmai Li
University of Science and Technology of China, China
Dongchen Jiang
State Key Lab of Processors, Institute of Computing Technology, CAS, China
Ruiyang Xia
Xidian University
Zhihong Ma
State Key Lab of Processors, Institute of Computing Technology, CAS, China
Zisheng Liu
State Key Lab of Processors, Institute of Computing Technology, CAS, China
Zhaoyong Wan
University of Science and Technology of China, China
Yunqi Lu
State Key Lab of Processors, Institute of Computing Technology, CAS, China
Ximing Liu
State Key Lab of Processors, Institute of Computing Technology, CAS, China
Hongrui Guo
State Key Lab of Processors, Institute of Computing Technology, CAS, China
Zhihao Yang
Institute of Software, CAS, China
Zhe Wang
State Key Lab of Processors, Institute of Computing Technology, CAS, China
Tianrui Ma
Institute of Computing Technology, Chinese Academy of Sciences
Mo Zou
State Key Lab of Processors, Institute of Computing Technology, CAS, China
Rui Zhang
State Key Lab of Processors, Institute of Computing Technology, CAS, China
Ling Li
Institute of Software, CAS, China
Xing Hu
State Key Lab of Processors, Institute of Computing Technology, CAS, China
Zidong Du
State Key Lab of Processors, Institute of Computing Technology, CAS, China
Zhiwei Xu
State Key Lab of Processors, Institute of Computing Technology, CAS, China
Qi Guo
State Key Lab of Processors, Institute of Computing Technology, CAS, China