🤖 AI Summary
Deploying large convolutional neural networks such as YOLO-NAS on FPGA accelerators is hindered by incomplete compilation toolchains and limited on-chip memory, making it challenging to meet the stringent requirements of safety-critical domains like aerospace. This work extends and fully automates the compilation pipeline of the Versatile Tensor Accelerator (VTA), introducing an integrated approach that combines memory optimization strategies with CNN operator mapping techniques. Building on a stand-alone compiler designed with certification in mind, it enables complete, end-to-end compilation of YOLO-NAS for VTA, with automatic handling of models whose parameters exceed on-chip memory capacity. The framework is validated by successfully compiling a YOLO-NAS object detection model and executing it in simulation, which indicates that large CNNs can be compiled for use in safety-critical systems.
📝 Abstract
Deploying complex Convolutional Neural Networks (CNNs) on FPGA-based accelerators is a promising way forward for safety-critical domains such as aeronautics. In previous work, we explored the Versatile Tensor Accelerator (VTA) and showed its suitability for avionic applications. To that end, we developed an initial stand-alone compiler designed with certification in mind. However, that compiler has several limitations, which this paper overcomes. Our contributions consist of extending and fully automating the VTA compilation chain to allow complete CNN compilation and to support larger CNNs (those whose parameters do not fit in the on-chip memory). The effectiveness of the approach is demonstrated by the successful compilation and simulated execution of a YOLO-NAS object detection model.