🤖 AI Summary
To address the high computational and memory overhead, nonlinear operator bottlenecks, and suboptimal hardware resource utilization of Vision Transformers (ViTs) on FPGAs, this paper proposes CHOSEN, a software-hardware co-designed framework that automates compilation and optimization. The framework rests on three contributions: (1) a multi-kernel design that maximizes memory bandwidth by accessing multiple DDR banks in parallel; (2) low-error approximations of nonlinear functions that make efficient use of FPGA logic resources; and (3) a compiler with a design-space exploration algorithm that automatically searches for the hardware configuration. Evaluated on the DeiT-S and DeiT-B models, the framework achieves 1.5× and 1.42× higher throughput, respectively, than state-of-the-art ViT accelerators on FPGAs. It also significantly improves energy efficiency and FPGA resource utilization. This work delivers a system-level solution for efficient ViT inference on edge FPGA platforms.
📝 Abstract
Vision Transformers (ViTs) represent a groundbreaking shift in machine learning approaches to computer vision. Unlike traditional approaches, ViTs employ the self-attention mechanism, which has been widely used in natural language processing, to analyze image patches. Despite their advantages in modeling visual tasks, deploying ViTs on hardware platforms, notably Field-Programmable Gate Arrays (FPGAs), introduces considerable challenges. These challenges stem primarily from the non-linear calculations and the high computational and memory demands of ViTs. This paper introduces CHOSEN, a software-hardware co-design framework that addresses these challenges and automates ViT deployment on FPGAs to maximize performance. The framework is built upon three fundamental contributions: (1) a multi-kernel design that maximizes bandwidth, mainly by exploiting multiple DDR memory banks; (2) approximations of non-linear functions that exhibit minimal accuracy degradation and make efficient use of the available logic blocks on the FPGA; and (3) an efficient compiler that maximizes the performance and memory efficiency of the computing kernels through a novel design space exploration algorithm, which searches for the hardware configuration with the best throughput and latency. Compared to state-of-the-art ViT accelerators, CHOSEN achieves 1.5x and 1.42x higher throughput on the DeiT-S and DeiT-B models, respectively.
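To make the idea of approximating a non-linear function concrete, here is a minimal sketch of one common hardware-friendly technique: a piecewise-linear lookup table for GELU, an activation used in ViT feed-forward layers. The abstract does not say which functions CHOSEN approximates or how, so everything here is an illustrative assumption rather than the authors' method: the choice of GELU, the range `[-4, 4]`, the 32 segments, and all function names are hypothetical.

```python
import math


def gelu_exact(x: float) -> float:
    """Reference GELU, x * Phi(x), computed with the error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def build_table(lo: float, hi: float, segments: int) -> list[float]:
    """Sample GELU at segments + 1 evenly spaced breakpoints on [lo, hi]."""
    step = (hi - lo) / segments
    return [gelu_exact(lo + i * step) for i in range(segments + 1)]


def gelu_pwl(x: float, table: list[float], lo: float, hi: float) -> float:
    """Piecewise-linear GELU: interpolate inside [lo, hi], clamp to asymptotes outside."""
    if x <= lo:
        return 0.0  # GELU(x) approaches 0 for large negative x
    if x >= hi:
        return x    # GELU(x) approaches x for large positive x
    segments = len(table) - 1
    pos = (x - lo) / (hi - lo) * segments
    i = min(int(pos), segments - 1)  # index of the segment containing x
    frac = pos - i                   # position within that segment, in [0, 1)
    return table[i] + frac * (table[i + 1] - table[i])


# Hypothetical configuration, chosen only for illustration.
LO, HI, SEGMENTS = -4.0, 4.0, 32
TABLE = build_table(LO, HI, SEGMENTS)


def max_abs_error(samples: int = 2001) -> float:
    """Largest absolute deviation from exact GELU on a grid over [-6, 6]."""
    xs = [-6.0 + 12.0 * k / (samples - 1) for k in range(samples)]
    return max(abs(gelu_pwl(x, TABLE, LO, HI) - gelu_exact(x)) for x in xs)


if __name__ == "__main__":
    print(f"max |error| over [-6, 6]: {max_abs_error():.5f}")
```

In a sketch like this, the segment count is the knob that trades table storage against approximation error, which is the kind of accuracy-versus-resource trade-off the abstract alludes to. A real FPGA implementation would also use fixed-point arithmetic, which this floating-point version does not model.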