🤖 AI Summary
This work addresses the challenge of executing convolutional neural networks on memory-constrained, dedicated accelerators while meeting real-time requirements. To this end, the authors propose a formal framework for modeling convolution offloading sequences, recasting the scheduling problem as a constrained optimization task. By integrating convolution decomposition techniques, the framework constructs precise models of resource usage and latency constraints. A Python-based simulation platform is employed to systematically evaluate diverse scheduling strategies, enabling predictable and efficient utilization of accelerator resources. The proposed approach significantly enhances the execution efficiency of convolution operations, simultaneously satisfying real-time constraints and optimizing overall scheduling performance.
📝 Abstract
Convolutional neural networks (CNNs) require a large number of multiply-accumulate (MAC) operations. To meet real-time constraints, they often need to be executed on specialized accelerators composed of an on-chip memory and a processing unit. However, the on-chip memory is often insufficient to store all the data required to compute a CNN layer, so the computation must be performed in several offloading steps. We formalise such sequences of steps and apply our formalism to a state-of-the-art decomposition of convolutions. To find strategies that are optimal in terms of duration, we encode the problem as a set of constraints. A Python-based simulator allows in-depth analysis of the computed strategies.
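To make the offloading idea concrete, here is a minimal sketch of splitting one convolution layer's computation into steps whose working set fits in on-chip memory. The function name, the greedy grouping over output channels, and all sizes are illustrative assumptions, not the paper's actual model or decomposition.

```python
# Hypothetical sketch: group a layer's output channels into offloading
# steps so that each step's data (input + weights + partial output)
# fits within the on-chip memory capacity.

def offloading_steps(in_bytes, weight_bytes_per_ch, out_bytes_per_ch,
                     out_channels, mem_capacity):
    """Greedily pack output channels into steps that fit on chip."""
    steps = []
    ch = 0
    while ch < out_channels:
        group = 0
        # Add output channels while the step's working set still fits.
        while ch + group < out_channels:
            need = in_bytes + (group + 1) * (weight_bytes_per_ch
                                             + out_bytes_per_ch)
            if need > mem_capacity:
                break
            group += 1
        if group == 0:
            raise ValueError("a single output channel does not fit on chip")
        steps.append((ch, ch + group))
        ch += group
    return steps

# Example: 64 output channels, 40 KiB input, 1 KiB of weights and
# 2 KiB of output per channel, 64 KiB of on-chip memory.
print(offloading_steps(40 * 1024, 1024, 2 * 1024, 64, 64 * 1024))
```

Under these assumed sizes the layer decomposes into eight steps of eight output channels each; a duration-optimal strategy, as studied in the paper, would additionally account for the transfer and compute latency of each step rather than only its memory footprint.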