CHOSEN: Compilation to Hardware Optimization Stack for Efficient Vision Transformer Inference

📅 2024-07-17
🏛️ arXiv.org
📈 Citations: 2 (influential: 0)
🤖 AI Summary
To address the high computational and memory overhead, nonlinear-operator bottlenecks, and suboptimal hardware resource utilization of Vision Transformers (ViTs) on FPGAs, this paper proposes a software-hardware co-designed, automated compilation optimization framework. Our method introduces four key innovations: (1) a multi-core memory-bandwidth coordination architecture; (2) a low-error approximation technique for nonlinear functions; (3) a logic-resource-aware compiler design; and (4) a design-space exploration algorithm that integrates DDR multi-bank parallel memory access with automated hardware configuration search. Evaluated on the DeiT-S and DeiT-B models, our framework achieves 1.5× and 1.42× higher throughput, respectively, than state-of-the-art FPGA ViT accelerators, while also improving energy efficiency and FPGA resource utilization. This work delivers a system-level solution for efficient ViT inference on edge FPGA platforms.
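The summary mentions a low-error approximation technique for nonlinear functions, but does not reproduce the paper's scheme. As an illustration only, the sketch below compares a common tanh-based polynomial approximation of GELU (a typical ViT nonlinear operator) against the exact erf-based definition; the approximation formula and error bound here are standard, not taken from the paper.

```python
import math

def gelu_exact(x: float) -> float:
    # Reference GELU using the error function.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_approx(x: float) -> float:
    # Widely used tanh-based polynomial approximation (illustrative;
    # not the paper's own approximation scheme).
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi)
                                      * (x + 0.044715 * x ** 3)))

# Maximum absolute error over a sample of the input range.
max_err = max(abs(gelu_exact(x) - gelu_approx(x))
              for x in (i / 100.0 for i in range(-500, 501)))
```

Hardware-friendly approximations like this trade a small, bounded numerical error for a function that needs only multiplies, adds, and a cheap `tanh` unit instead of an `erf` evaluation.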

📝 Abstract
Vision Transformers (ViTs) represent a groundbreaking shift in machine learning approaches to computer vision. Unlike traditional approaches, ViTs employ the self-attention mechanism, widely used in natural language processing, to analyze image patches. Despite their advantages in modeling visual tasks, deploying ViTs on hardware platforms, notably Field-Programmable Gate Arrays (FPGAs), introduces considerable challenges. These challenges stem primarily from ViTs' non-linear calculations and high computational and memory demands. This paper introduces CHOSEN, a software-hardware co-design framework that addresses these challenges and offers an automated flow for deploying ViTs on FPGAs to maximize performance. The framework is built upon three fundamental contributions: (i) a multi-kernel design that maximizes bandwidth, mainly by exploiting multiple DDR memory banks; (ii) approximate non-linear functions that exhibit minimal accuracy degradation while making efficient use of the FPGA's available logic blocks; and (iii) an efficient compiler that maximizes the performance and memory efficiency of the computing kernels through a novel design-space exploration algorithm that finds the hardware configuration achieving optimal throughput and latency. Compared to state-of-the-art ViT accelerators, CHOSEN achieves a 1.5x and 1.42x improvement in throughput on the DeiT-S and DeiT-B models, respectively.
Problem

Research questions and friction points this paper is trying to address.

Optimizing Vision Transformer inference on FPGAs
Addressing high computational and memory demands
Automating hardware configuration for performance efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-kernel design for DDR memory optimization
Approximate non-linear functions with minimal accuracy loss
Efficient compiler for optimal hardware configuration
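The compiler's design-space exploration can be pictured as a search over kernel configurations under FPGA resource budgets. The toy sketch below is a minimal exhaustive search; the tile sizes, resource budgets, and cost/performance models are hypothetical stand-ins, not values or models from the paper.

```python
from itertools import product

# Hypothetical resource budgets for illustration only.
DSP_BUDGET = 2048
BRAM_BUDGET = 512

def resources(tm: int, tn: int) -> tuple[int, int]:
    # Toy cost model: DSPs scale with the tile's MAC array,
    # BRAM with the tile's on-chip buffers.
    return tm * tn, 2 * (tm + tn)

def throughput(tm: int, tn: int) -> float:
    # Toy performance model: larger tiles amortize memory traffic.
    return tm * tn / (tm + tn)

def explore(candidates):
    # Exhaustively score every configuration that fits the budget
    # and keep the best one (real DSE algorithms prune this space).
    best, best_tp = None, -1.0
    for tm, tn in candidates:
        dsp, bram = resources(tm, tn)
        if dsp <= DSP_BUDGET and bram <= BRAM_BUDGET:
            tp = throughput(tm, tn)
            if tp > best_tp:
                best, best_tp = (tm, tn), tp
    return best, best_tp

best_cfg, best_tp = explore(product([8, 16, 32, 64], repeat=2))
```

The point of the sketch is the structure of the problem: a feasibility check against hardware budgets plus a performance objective, which a real framework would drive with a smarter search than brute force.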
Mohammad Erfan Sadeghi
University of Southern California, Los Angeles, California, USA
Machine Learning · Deep Learning Acceleration
A. Fayyazi
University of Southern California, Los Angeles, California, USA
Suhas Somashekar
University of Southern California, Los Angeles, California, USA
M. Pedram
University of Southern California, Los Angeles, California, USA