🤖 AI Summary
This work addresses the twin bottlenecks of tight memory budgets and redundant computation that arise when lightweight neural networks run continuous inference on battery-powered, always-on microcontrollers with as little as 128 KB of RAM. We propose the first inter-layer dataflow co-optimization framework tailored for temporal sensor streams. Our approach combines operator-level memory reuse, incremental state caching, and sliding-window-aware computation pruning to eliminate redundant operations and minimize the RAM footprint of streaming sliding-window inputs. The framework supports plug-and-play deployment of multiple models without recompilation. Evaluated on real MCU hardware, it reduces peak RAM usage by over 60% and cuts redundant computation by up to 90%. The implementation is fully open source, ensuring end-to-end reproducibility.
📝 Abstract
Always-on sensors are increasingly expected to run a variety of tiny neural networks and to continuously perform inference on time series of the data they sense. To meet lifetime and energy-consumption requirements when operating on battery, such hardware uses microcontrollers (MCUs) with a tiny memory budget, e.g., 128 kB of RAM. In this context, optimizing data flows across neural network layers becomes crucial. In this paper, we introduce TinyDéjàVu, a new framework and novel algorithms designed to drastically reduce the RAM footprint of inference with various tiny ML models on sensor time series on typical microcontroller hardware. We publish the implementation of TinyDéjàVu as open source, and we perform reproducible benchmarks on hardware. We show that TinyDéjàVu can save more than 60% of RAM usage and eliminate up to 90% of redundant compute on overlapping sliding-window inputs.