Enabling Efficient Hardware Acceleration of Hybrid Vision Transformer (ViT) Networks at the Edge

📅 2025-07-19
🤖 AI Summary
Hybrid Vision Transformers (hybrid ViTs) are hard to accelerate efficiently on resource-constrained edge devices because they mix many heterogeneous layer types and produce large intermediate tensors. To address this, the paper proposes a co-optimization across the hardware-scheduling stack: (1) a configurable processing element (PE) array that supports all hybrid ViT layer types; (2) temporal loop reordering within a layer, which enables hardware support for normalization and softmax and minimizes on-chip data transfers; and (3) layer fusion across inverted-bottleneck layers to reduce off-chip memory transfers. Implemented in 28 nm CMOS, the accelerator achieves a peak energy efficiency of 1.39 TOPS/W at 25.6 GMACs/s. The key contribution is a unified approach that combines a configurable compute architecture, hardware-friendly layer fusion, and loop scheduling for deploying hybrid ViTs on edge devices.

📝 Abstract
Hybrid vision transformers combine elements of conventional neural networks (NN) and vision transformers (ViT) to enable lightweight and accurate detection. However, several challenges remain for their efficient deployment on resource-constrained edge devices. The hybrid models suffer from a widely diverse set of NN layer types and large intermediate data tensors, which hamper efficient hardware acceleration. To enable their execution at the edge, this paper proposes innovations across the hardware-scheduling stack: (a) at the lowest level, a configurable PE array that supports all hybrid ViT layer types; (b) temporal loop re-ordering within one layer, which enables hardware support for normalization and softmax layers while minimizing on-chip data transfers; and (c) layer fusion across inverted bottleneck layers, a further scheduling optimization that drastically reduces off-chip memory transfers. The resulting accelerator is implemented in 28 nm CMOS and achieves a peak energy efficiency of 1.39 TOPS/W at 25.6 GMACs/s.
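To put the two headline figures in relation, the following minimal sketch converts an energy efficiency and a throughput into an implied power draw. It rests on two assumptions that the abstract does not state: that one MAC counts as two operations, and that both figures refer to the same operating point. The function name `implied_power_mw` is hypothetical.

```python
def implied_power_mw(tops_per_watt: float, gmacs_per_s: float, ops_per_mac: int = 2) -> float:
    """Power in milliwatts implied by an efficiency and a throughput figure.

    Assumption (not from the paper): one MAC is counted as `ops_per_mac` operations.
    """
    tera_ops_per_s = gmacs_per_s * ops_per_mac / 1e3  # GMAC/s -> GOPS -> TOPS
    watts = tera_ops_per_s / tops_per_watt
    return watts * 1e3


# Figures quoted in the abstract: 1.39 TOPS/W at 25.6 GMACs/s.
power_mw = implied_power_mw(tops_per_watt=1.39, gmacs_per_s=25.6)
print(f"implied power: {power_mw:.1f} mW")
```

Under those assumptions the result lands in the tens of milliwatts; a different MAC-to-operation convention would scale it proportionally.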
Problem

Research questions and friction points this paper is trying to address.

Efficient hardware acceleration for hybrid ViT networks at the edge
Diverse NN layer types hinder edge device deployment
Large intermediate data tensors limit hardware efficiency
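To illustrate why intermediate tensors become a bottleneck, the sketch below estimates activation sizes for a single inverted bottleneck block, where a pointwise expansion widens the channel dimension before it is projected back down. The layer shape, the expansion factor of 4, and the 8-bit activations are hypothetical values chosen for illustration, not taken from the paper.

```python
def inverted_bottleneck_footprint(height, width, c_in, expansion, c_out, bytes_per_elem=1):
    """Return activation sizes in bytes for a stride-1 inverted bottleneck block."""
    pixels = height * width
    return {
        "input": pixels * c_in * bytes_per_elem,
        # The expanded tensor sits between the expansion and projection layers.
        "expanded": pixels * c_in * expansion * bytes_per_elem,
        "output": pixels * c_out * bytes_per_elem,
    }


# Hypothetical layer shape with 8-bit activations (assumption, not from the paper).
sizes = inverted_bottleneck_footprint(height=56, width=56, c_in=32, expansion=4, c_out=32)
for name, size in sizes.items():
    print(f"{name:>8}: {size / 1024:.0f} KiB")
```

The expanded tensor is larger than the block's input by the expansion factor. If each layer is executed separately and this tensor does not fit in on-chip memory, it has to be written off chip and read back, which is the traffic that fusing across the block aims to avoid.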
Innovation

Methods, ideas, or system contributions that make the work stand out.

Configurable PE array supports hybrid ViT layers
Temporal loop re-ordering minimizes on-chip transfers
Layer fusion reduces off-chip memory transfers
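As an illustration of how reordering a computation can cut passes over data, the sketch below contrasts the textbook softmax statistics (one pass for the maximum, a second for the sum) with the well-known online softmax formulation, which tracks a running maximum and a rescaled running sum in a single pass. This is a generic technique shown for intuition only; the summary does not describe the paper's actual loop order, and all function names here are hypothetical.

```python
import math


def softmax_stats_two_pass(xs):
    """Reference: one pass for the maximum, a second pass for the sum."""
    m = max(xs)
    s = sum(math.exp(x - m) for x in xs)
    return m, s


def softmax_stats_online(xs):
    """Single pass: keep a running max and rescale the running sum when it changes."""
    m = float("-inf")
    s = 0.0
    for x in xs:
        m_new = max(m, x)
        # Rescale the old sum to the new maximum, then add the new term.
        s = s * math.exp(m - m_new) + math.exp(x - m_new)
        m = m_new
    return m, s


def softmax(xs):
    """Numerically stable softmax built on the single-pass statistics."""
    m, s = softmax_stats_online(xs)
    return [math.exp(x - m) / s for x in xs]


scores = [1.0, 3.0, -2.0, 0.5]  # assumed finite inputs
print(softmax(scores))
```

Both routines produce the same maximum and normalizer for finite inputs; the single-pass version reads the input once before the final normalization, which is the kind of saving a reordered schedule can offer.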