🤖 AI Summary
Existing methods struggle to simultaneously achieve high inference accuracy and end-to-end real-time performance on batteryless, intermittently powered ultra-low-resource edge devices (e.g., SRAM < 256 KB).
Method: We propose the first holistic framework jointly optimizing neural network lightweighting and intermittent execution scheduling. It integrates weight sharing, structured pruning, intermittent-aware neural architecture search (iNAS), dynamic task partitioning, and energy-aware scheduling. Unlike conventional NAS approaches—designed only for continuous execution or trading accuracy for robustness—our method co-designs model architecture and execution schedule to inherently accommodate power interruptions.
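The energy-aware scheduling idea above can be illustrated with a minimal sketch: per-layer inference tasks are packed, in order, into power cycles that each supply a fixed energy budget, reserving enough energy to checkpoint state before power is lost. All function names, energy values, and the greedy packing policy here are hypothetical illustrations, not the paper's actual algorithm.

```python
# Illustrative sketch (hypothetical, not the paper's algorithm): greedily pack
# per-layer inference tasks into intermittent power cycles. Each cycle gives a
# fixed energy budget; a checkpoint cost must be payable before any cycle ends.

def schedule_intermittent(task_energies, cycle_budget, checkpoint_cost):
    """Assign tasks (in order) to power cycles; return a list of cycles,
    where each cycle is a list of task indices."""
    cycles = [[]]
    remaining = cycle_budget
    for i, energy in enumerate(task_energies):
        if energy + checkpoint_cost > cycle_budget:
            raise ValueError(f"task {i} cannot fit in a full power cycle")
        # Reserve energy to checkpoint state before power is lost.
        if energy + checkpoint_cost > remaining:
            cycles.append([])          # power-cycle boundary
            remaining = cycle_budget
        cycles[-1].append(i)
        remaining -= energy
    return cycles

# Example: 6 layer tasks, 10 energy units per cycle, 1 unit to checkpoint.
print(schedule_intermittent([4, 3, 5, 2, 6, 1], cycle_budget=10,
                            checkpoint_cost=1))  # → [[0, 1], [2, 3], [4, 5]]
```

Fewer cycles means fewer checkpoint/restore round trips, which is one plausible proxy for the end-to-end latency objective the framework optimizes.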
Contribution/Results: Evaluated on MCU-class batteryless hardware, our framework significantly outperforms iNAS and HW-NAS baselines in accuracy while meeting strict end-to-end latency constraints. It achieves, for the first time, high-accuracy, low-latency deep inference deployment under intermittent execution at the edge.
📝 Abstract
Emerging research on edge devices and microcontroller units (MCUs) enables on-device deep learning training and inference. More recently, contemporary work has focused on making deep neural network (DNN) models runnable on batteryless, intermittently powered devices. One approach shrinks DNN models through weight sharing, pruning, and Neural Architecture Search (NAS) over search spaces optimized for specific edge devices [Cai2019OnceFA, Lin2020MCUNetTD, Lin2021MCUNetV2MP, Lin2022OnDeviceTU]. Another approach analyzes intermittent execution and designs the system accordingly, performing NAS that is aware of intermittent execution cycles and resource constraints [iNAS, HW-NAS, iLearn]. However, the former considers only continuous execution with no power loss, while the latter focuses on balancing data reuse against the costs of intermittent inference, often at low accuracy. We propose Accelerated Intermittent Deep Inference, which harnesses optimized DNN inference models specifically targeting SRAM under 256 KB and makes them schedulable and runnable under intermittent power. Our main contributions are: (1) scheduling on-device inference tasks into intermittent execution cycles while optimizing for latency; (2) a system that satisfies end-to-end latency constraints while achieving much higher accuracy than the baselines [iNAS, HW-NAS].
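The 256 KB SRAM constraint named in the abstract can be made concrete with a back-of-the-envelope feasibility check: a candidate model fits only if its resident weights plus its largest single activation buffer stay under the budget. The function name, the memory model (all weights resident in SRAM, one live activation buffer), and the layer sizes below are simplifying assumptions for illustration, not the paper's actual memory accounting.

```python
# Hedged sketch: check that a (pruned/weight-shared) model's int8 weights plus
# its peak activation buffer fit in a 256 KB SRAM budget. The memory model is
# a deliberate simplification; real deployments may keep weights in flash.

SRAM_BUDGET = 256 * 1024  # bytes

def fits_in_sram(weight_bytes_per_layer, activation_bytes_per_layer,
                 budget=SRAM_BUDGET):
    """True if total weights plus the largest activation buffer fit."""
    total_weights = sum(weight_bytes_per_layer)
    peak_activation = max(activation_bytes_per_layer)
    return total_weights + peak_activation <= budget

# Hypothetical 3-layer model: 150 KB of weights, 60 KB peak activation.
print(fits_in_sram([80 * 1024, 50 * 1024, 20 * 1024],
                   [60 * 1024, 40 * 1024, 10 * 1024]))  # → True
```

A check like this is the kind of hard constraint a hardware-aware NAS search space can enforce per candidate architecture.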