Enabling Efficient Hardware Acceleration of Hybrid Vision Transformer (ViT) Networks at the Edge

📅 2025-07-19

🤖 AI Summary

Hybrid vision transformers (hybrid ViTs) are hard to accelerate efficiently on resource-constrained edge devices because they mix heterogeneous layer types and produce large intermediate tensors. To address this, we propose optimizations across the hardware-scheduling stack: (1) a configurable processing element (PE) array that supports every hybrid ViT layer type; (2) temporal loop re-ordering within a layer, which enables hardware support for normalization and softmax while minimizing on-chip data transfers; and (3) layer fusion across inverted-bottleneck blocks, which drastically reduces off-chip memory transfers. Implemented in 28 nm CMOS, the accelerator reaches a peak energy efficiency of 1.39 TOPS/W at a throughput of 25.6 GMACs/s. Our key contribution is a single framework that combines a configurable compute array, intra-layer loop scheduling, and cross-layer fusion for deploying hybrid ViTs on edge devices.
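
The abstract credits temporal loop re-ordering within a layer with enabling hardware support for normalization and softmax. The paper's actual loop nest is not given here, so the following is only a minimal Python sketch under an assumed schedule: attention scores are produced row by row (query row outermost, reduction dimension innermost), which lets softmax statistics be accumulated as each score emerges. It swaps in the well-known online-softmax running max/sum update as one plausible way to exploit that order; the function names are hypothetical and this is not the authors' design.

```python
import math

# Hypothetical sketch (not the paper's loop nest): compute softmax(Q K^T)
# row by row. Loop order is query row (outermost), key index, then the
# reduction/feature index (innermost), so each score finishes completely and
# one whole row of scores is emitted before the next row starts.


def scores_row_major(q, k):
    """Yield (i, j, score) for score = dot(q[i], k[j]) in row-major order."""
    for i, qi in enumerate(q):
        for j, kj in enumerate(k):
            acc = 0.0
            for d in range(len(qi)):  # reduction loop innermost
                acc += qi[d] * kj[d]
            yield i, j, acc


def streamed_softmax(q, k):
    """Softmax over each score row, updating a running max and sum as scores arrive.

    Only the current row is buffered; the full len(q) x len(k) score matrix
    is never materialized before normalization.
    """
    out = []
    row, m, s = [], -math.inf, 0.0
    for _, j, x in scores_row_major(q, k):
        new_m = max(m, x)
        # Rescale the running sum to the new max (online-softmax update).
        s = s * math.exp(m - new_m) + math.exp(x - new_m)
        m = new_m
        row.append(x)
        if j == len(k) - 1:  # row complete: max and sum are final
            out.append([math.exp(v - m) / s for v in row])
            row, m, s = [], -math.inf, 0.0
    return out


def reference_softmax(q, k):
    """Conventional two-pass softmax on a fully materialized score matrix."""
    out = []
    for qi in q:
        xs = [sum(a * b for a, b in zip(qi, kj)) for kj in k]
        mx = max(xs)
        e = [math.exp(x - mx) for x in xs]
        total = sum(e)
        out.append([v / total for v in e])
    return out


if __name__ == "__main__":
    q = [[1.0, 0.0], [0.5, -1.0]]
    k = [[1.0, 2.0], [0.0, 1.0], [-1.0, 0.5]]
    for a_row, b_row in zip(streamed_softmax(q, k), reference_softmax(q, k)):
        assert all(math.isclose(a, b) for a, b in zip(a_row, b_row))
    print(streamed_softmax(q, k))
```

The design point the sketch illustrates is that the normalizer becomes available at the moment a row completes, so normalization can be applied while that row is still local instead of after a round trip through a larger buffer.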

📝 Abstract

Hybrid vision transformers combine elements of conventional neural networks (NN) and vision transformers (ViT) to enable lightweight and accurate detection. However, several challenges remain for their efficient deployment on resource-constrained edge devices. The hybrid models suffer from a widely diverse set of NN layer types and large intermediate data tensors, hampering efficient hardware acceleration. To enable their execution at the edge, this paper proposes innovations across the hardware-scheduling stack: (a) at the lowest level, a configurable PE array supports all hybrid ViT layer types; (b) temporal loop re-ordering within one layer enables hardware support for normalization and softmax layers while minimizing on-chip data transfers; (c) further scheduling optimization employs layer fusion across inverted bottleneck layers to drastically reduce off-chip memory transfers. The resulting accelerator is implemented in 28 nm CMOS and achieves a peak energy efficiency of 1.39 TOPS/W at 25.6 GMACs/s.
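
As a quick sanity check on the headline numbers, the implied power draw at the peak-efficiency operating point follows from dividing throughput by energy efficiency. This worked equation assumes the common convention that one MAC counts as two operations when quoting TOPS; the abstract does not state which convention it uses.

```latex
P \approx \frac{\text{throughput}}{\text{energy efficiency}}
  = \frac{2 \times 25.6 \times 10^{9}\ \text{op/s}}{1.39 \times 10^{12}\ \text{op/J}}
  = \frac{51.2 \times 10^{9}}{1.39 \times 10^{12}}\ \text{W}
  \approx 36.8\ \text{mW}
```

If the TOPS figure instead counts one operation per MAC, the same calculation gives roughly 18.4 mW.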

Problem

Efficient hardware acceleration for hybrid ViT networks at the edge

Diverse NN layer types hinder edge device deployment

Large intermediate data tensors limit hardware efficiency

Innovation

Configurable PE array supports all hybrid ViT layer types

Temporal loop re-ordering enables hardware normalization and softmax while minimizing on-chip transfers

Layer fusion reduces off-chip memory transfers
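
To see why fusing inverted-bottleneck layers matters for off-chip traffic, the following hypothetical first-order model counts activation elements moved to and from DRAM for one block (1x1 expand, 3x3 depthwise, 1x1 project), with and without fusion. The block structure, the stride-1 and weights-ignored assumptions, and all function names are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical first-order model of off-chip activation traffic, counted in
# tensor elements, for one inverted-bottleneck block:
#   1x1 expand (c_in -> t*c_in) -> 3x3 depthwise -> 1x1 project (t*c_in -> c_out)
# Assumptions: stride 1, weights and halo re-fetches ignored, and every
# unfused layer spills its full output to DRAM and reads it back.


def unfused_traffic(h: int, w: int, c_in: int, t: int, c_out: int) -> int:
    """Layer-by-layer execution: each layer reads its input from DRAM and
    writes its full output back to DRAM."""
    c_mid = t * c_in
    expand = h * w * c_in + h * w * c_mid     # read x, write expanded tensor
    depthwise = 2 * h * w * c_mid             # read expanded, write dw output
    project = h * w * c_mid + h * w * c_out   # read dw output, write y
    return expand + depthwise + project


def fused_traffic(h: int, w: int, c_in: int, c_out: int) -> int:
    """Fused block: the two t*c_in-channel intermediates stay on chip, so
    only the block input x and output y cross the DRAM interface."""
    return h * w * c_in + h * w * c_out


def fused_tile_buffer(tile_h: int, tile_w: int, c_in: int, t: int, k: int = 3) -> int:
    """On-chip elements of expanded activations needed to produce one
    tile_h x tile_w output tile: the tile plus a (k - 1)-pixel halo."""
    return (tile_h + k - 1) * (tile_w + k - 1) * t * c_in


if __name__ == "__main__":
    h, w, c, t = 28, 28, 64, 4
    u = unfused_traffic(h, w, c, t, c)
    f = fused_traffic(h, w, c, c)
    print(f"unfused: {u} elements, fused: {f} elements, ratio {u / f:.1f}x")
    print(f"expanded-tensor buffer per 7x7 tile: {fused_tile_buffer(7, 7, c, t)} elements")
```

Under this model, with c_in = c_out, the unfused-to-fused traffic ratio simplifies to 1 + 2t, so the saving grows with the expansion factor t; the example prints a 9.0x reduction for t = 4. The cost is an on-chip buffer for the expanded activations of each tile, which the halo term makes slightly larger than the tile itself.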