🤖 AI Summary
Efficient deployment of recurrent neural networks (RNNs), particularly LSTMs, on FPGA platforms remains challenging: mainstream AI compilers such as FINN primarily support feedforward architectures, and LSTM acceleration typically requires fully manual hardware design.
Method: This work proposes a generic, end-to-end deployment flow leveraging the ONNX Scan operator to uniformly model recurrent structures. It integrates mixed-precision quantization, a customized FINN conversion pipeline, and Vitis HLS-based IP generation to automate the mapping from quantized models to synthesizable hardware IPs.
Contribution/Results: We present the first systematic support for mixed-precision LSTM deployment in FINN, significantly improving resource efficiency and design scalability. Our approach is validated on an XCZU7EV FPGA with a ConvLSTM accelerator for stock price prediction, achieving low latency and reduced resource consumption while matching or exceeding the inference accuracy of high-precision floating-point baselines.
📝 Abstract
Recurrent neural networks (RNNs), particularly LSTMs, are effective for time-series tasks such as sentiment analysis and short-term stock prediction. However, their computational complexity poses challenges for real-time deployment in resource-constrained environments. While FPGAs offer a promising platform for energy-efficient AI acceleration, existing tools mainly target feed-forward networks, and LSTM acceleration typically requires a fully custom implementation. In this paper, we address this gap by leveraging the open-source and extensible FINN framework to enable the generalised deployment of LSTMs on FPGAs. Specifically, we use the Scan operator from the Open Neural Network Exchange (ONNX) specification to model the recurrent nature of LSTM computations, enabling mixed-precision quantisation within recurrent layers and functional verification of LSTM-based models. Furthermore, we introduce custom transformations within the FINN compiler to map the quantised ONNX computation graph to hardware blocks from the FINN HLS kernel library and Vitis HLS. We validate the proposed tool-flow by training a quantised ConvLSTM model for a mid-price stock prediction task on a widely used benchmark dataset and generating a corresponding hardware IP with our flow, targeting the XCZU7EV device. We show that the quantised ConvLSTM accelerator generated by our flow achieves a balance between performance (latency) and resource consumption, while matching or exceeding the inference accuracy of state-of-the-art models at reduced precision. We believe the generalisable nature of the proposed flow will pave the way for resource-efficient RNN accelerator designs on FPGAs.
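The key modelling idea above is that ONNX Scan expresses a recurrence as a per-step body applied over a sequence, carrying state variables (for an LSTM, the hidden and cell states) from one step to the next. The following is a minimal pure-Python sketch of that Scan-style execution with a scalar LSTM cell as the body; the cell, the weight names, and the scalar sizes are illustrative simplifications, not the paper's actual FINN transformations or hardware mapping.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell(x, h, c, w):
    """One-feature LSTM cell: the Scan 'body'.
    w is a dict of scalar weights/biases for the four gates (hypothetical names)."""
    i = sigmoid(w["wi"] * x + w["ui"] * h + w["bi"])   # input gate
    f = sigmoid(w["wf"] * x + w["uf"] * h + w["bf"])   # forget gate
    g = math.tanh(w["wg"] * x + w["ug"] * h + w["bg"])  # candidate cell state
    o = sigmoid(w["wo"] * x + w["uo"] * h + w["bo"])   # output gate
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return h_new, c_new

def scan(seq, h0, c0, w):
    """Mimic ONNX Scan semantics: carry (h, c) across time steps,
    emit the hidden state at each step as the scan output."""
    h, c = h0, c0
    outputs = []
    for x in seq:           # one body invocation per sequence element
        h, c = lstm_cell(x, h, c, w)
        outputs.append(h)
    return outputs, (h, c)  # scan outputs + final carried state

# Example: run a short sequence through the recurrence.
weights = {k: 0.5 for k in
           ["wi", "ui", "bi", "wf", "uf", "bf",
            "wg", "ug", "bg", "wo", "uo", "bo"]}
outs, (h_final, c_final) = scan([1.0, -1.0, 0.5], 0.0, 0.0, weights)
```

Because the body is an ordinary subgraph, each gate computation inside it can in principle be quantised independently, which is what makes Scan a convenient vehicle for the mixed-precision quantisation described above.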