🤖 AI Summary
Deploying deep neural networks on resource-constrained microcontroller units (MCUs) is challenging due to significant hardware heterogeneity and the inefficiency of conventional neural architecture search (NAS) methods, which are time-consuming and often disregard hardware constraints. To address this, the work proposes PrototypeNAS, a zero-shot, multi-objective NAS framework that decouples architecture design from training. By integrating a novel search space, an ensemble of zero-shot proxies, and hypervolume-based subset selection, PrototypeNAS rapidly co-optimizes network topology alongside pruning and quantization configurations without any full training. Experiments across twelve datasets demonstrate that the method generates compact DNN models tailored to off-the-shelf MCUs within minutes, achieving Pareto-optimal trade-offs between accuracy and computational efficiency while matching the performance of much larger models.
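The "ensemble of zero-shot proxies" idea can be illustrated with a minimal sketch. The specific proxies PrototypeNAS uses are not named here, so the code below is a hypothetical rank-aggregation scheme: each proxy scores every candidate architecture without training, candidates are ranked per proxy, and the mean rank serves as the ensemble score.

```python
# Hypothetical sketch: combining several zero-shot proxies by rank
# aggregation. The actual proxies and aggregation used by PrototypeNAS
# may differ; this only illustrates the general idea.

def rank_scores(scores):
    """Map raw proxy scores to ranks (higher score -> higher rank)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for rank, i in enumerate(order):
        ranks[i] = rank
    return ranks

def ensemble_score(proxy_scores):
    """Average the per-proxy ranks for each candidate architecture.

    proxy_scores: list of lists, one inner list per proxy, each holding
    one raw score per candidate (aligned by candidate index).
    """
    per_proxy_ranks = [rank_scores(s) for s in proxy_scores]
    n_candidates = len(proxy_scores[0])
    n_proxies = len(proxy_scores)
    return [sum(r[i] for r in per_proxy_ranks) / n_proxies
            for i in range(n_candidates)]
```

Rank aggregation is a common way to combine proxies whose raw scores live on incompatible scales, since only the relative ordering of candidates matters.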
📝 Abstract
Enabling efficient deep neural network (DNN) inference on edge devices with different hardware constraints is a challenging task that typically requires DNN architectures to be specialized for each device separately. To avoid this huge manual effort, one can use neural architecture search (NAS). However, many existing NAS methods are resource-intensive and time-consuming because they require training many different DNNs from scratch. Furthermore, they do not take the resource constraints of the target system into account. To address these shortcomings, we propose PrototypeNAS, a zero-shot NAS method to accelerate and automate the selection, compression, and specialization of DNNs for different target microcontroller units (MCUs). We propose a novel three-step search method that decouples DNN design and specialization from DNN training for a given target platform. First, we present a novel search space that does not merely cut out smaller DNNs from a single large architecture, but instead combines the structural optimization of multiple architecture types with the optimization of their pruning and quantization configurations. Second, we explore the use of an ensemble of zero-shot proxies during optimization instead of a single one. Third, we propose hypervolume subset selection to distill from the Pareto front of the multi-objective optimization those DNN architectures that represent the most meaningful trade-offs between accuracy and FLOPs. We evaluate the effectiveness of PrototypeNAS on 12 different datasets across three different tasks: image classification, time series classification, and object detection. Our results demonstrate that PrototypeNAS identifies, within minutes, DNN models that are small enough to be deployed on off-the-shelf MCUs and still achieve accuracies comparable to those of large DNN models.
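The hypervolume subset selection step mentioned in the abstract can be sketched as follows. This is a minimal, hypothetical 2D greedy variant (the paper's exact formulation may differ), assuming both objectives, e.g. proxy error and FLOPs, are minimized and the candidate set is already a Pareto front: the dominated area w.r.t. a reference point is computed, and points are added greedily by largest hypervolume gain.

```python
# Hypothetical sketch of greedy hypervolume subset selection in 2D,
# for bi-objective minimization (e.g. error vs. FLOPs). Not the
# paper's implementation, just the general technique.

def hypervolume_2d(points, ref):
    """Area dominated by `points` w.r.t. reference point `ref`
    (both objectives minimized; points at or beyond ref contribute 0)."""
    hv, prev_y = 0.0, ref[1]
    for x, y in sorted(points):  # ascending in the first objective
        if y < prev_y and x < ref[0]:
            hv += (ref[0] - x) * (prev_y - y)
            prev_y = y
    return hv

def greedy_hvss(front, k, ref):
    """Greedily pick k points from `front` maximizing hypervolume."""
    selected, remaining = [], list(front)
    for _ in range(min(k, len(remaining))):
        best = max(remaining,
                   key=lambda p: hypervolume_2d(selected + [p], ref))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Greedy selection is a standard approximation here; exact hypervolume subset selection is expensive for large fronts, while the greedy gain-based variant keeps the step fast enough for a minutes-scale search.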