Hemlet: A Heterogeneous Compute-in-Memory Chiplet Architecture for Vision Transformers with Group-Level Parallelism

📅 2025-11-19
🤖 AI Summary
Vision Transformers (ViTs) deliver state-of-the-art accuracy in visual tasks but suffer from prohibitive computational and memory demands, hindering hardware deployment. Monolithic compute-in-memory (CIM) architectures are fundamentally limited by die size, while chiplet-based CIM systems—though scalable—face severe throughput bottlenecks due to high inter-chiplet network-on-package (NoP) communication overhead. To address this, we propose Hemlet: the first heterogeneous CIM chiplet architecture tailored for ViTs. Hemlet integrates analog CIM, digital CIM, and intermediate-data processing chiplets, and introduces a novel group-level parallelism mechanism that drastically reduces NoP traffic. This co-optimization enables flexible resource scaling and efficient inter-chiplet communication, overcoming the capacity limitations of monolithic CIM while preserving high energy efficiency. Experimental results demonstrate significant throughput improvement for large-scale ViT inference without compromising accuracy or energy efficiency.

📝 Abstract
Vision Transformers (ViTs) have established new performance benchmarks in vision tasks such as image recognition and object detection. However, these advancements come with significant demands for memory and computational resources, presenting challenges for hardware deployment. Heterogeneous compute-in-memory (CIM) accelerators have emerged as a promising solution for enabling energy-efficient deployment of ViTs. Despite this potential, monolithic CIM-based designs face scalability issues due to the size limitations of a single chip. To address this challenge, emerging chiplet-based techniques offer a more scalable alternative. However, chiplet designs come with their own costs, as they introduce more expensive communication through the network-on-package (NoP) compared to the network-on-chip (NoC), which can hinder improvements in throughput. This work introduces Hemlet, a heterogeneous CIM chiplet system designed to accelerate ViT. Hemlet facilitates flexible resource scaling through the integration of heterogeneous analog CIM (ACIM), digital CIM (DCIM), and Intermediate Data Process (IDP) chiplets. To improve throughput while reducing communication ove
Problem

Research questions and friction points this paper is trying to address.

Addresses the high memory and computational demands of deploying Vision Transformers
Solves monolithic CIM accelerator scalability limitations through chiplet integration
Mitigates network-on-package communication bottlenecks in chiplet-based ViT acceleration
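To make the NoP bottleneck concrete, here is a minimal sketch of a toy latency model. It is not taken from the paper: the `Link` class, both helper functions, and every bandwidth and latency value are hypothetical placeholders, and the model assumes compute and communication do not overlap. It only illustrates why a slower, higher-latency package-level link lowers per-stage throughput when the same activations must cross it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Hypothetical interconnect model; not a parameter set from the paper."""
    bandwidth_bytes_per_s: float
    latency_s: float


def transfer_time(link: Link, num_bytes: int) -> float:
    """Time to move num_bytes: fixed link latency plus serialization time."""
    return link.latency_s + num_bytes / link.bandwidth_bytes_per_s


def stage_throughput(compute_s: float, link: Link, num_bytes: int) -> float:
    """Stages per second, assuming compute and transfer are strictly sequential."""
    return 1.0 / (compute_s + transfer_time(link, num_bytes))


# Placeholder values chosen only to show the trend; they are not measurements.
NOC = Link(bandwidth_bytes_per_s=100e9, latency_s=10e-9)   # on-chip network
NOP = Link(bandwidth_bytes_per_s=10e9, latency_s=100e-9)   # on-package network

if __name__ == "__main__":
    # Illustrative payload: 197 tokens x 768 hidden units at 1 byte per value.
    activation_bytes = 197 * 768
    compute_s = 1e-6  # assumed per-stage compute time
    for name, link in (("NoC", NOC), ("NoP", NOP)):
        rate = stage_throughput(compute_s, link, activation_bytes)
        print(f"{name}: {rate:.3e} stages/s")
```

Under these assumptions, any scheme that sends fewer bytes across the NoP link raises the NoP-side throughput toward the compute-bound limit, which is the effect the group-level parallelism is described as targeting.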
Innovation

Methods, ideas, or system contributions that make the work stand out.

Heterogeneous CIM chiplet system for ViT acceleration
Integrates analog and digital CIM with data processing chiplets
Enables flexible resource scaling through chiplet architecture
Cong Wang
The Hong Kong University of Science and Technology (Guangzhou)
Zexin Fu
The Hong Kong University of Science and Technology (Guangzhou)
Jiayi Huang
The Hong Kong University of Science and Technology (Guangzhou)
Shanshi Huang
HKUST-GZ