🤖 AI Summary
Monocular depth estimation (MDE) on ultra-low-power microcontroller units (MCUs) suffers severe accuracy degradation due to domain shift, hindering practical deployment on IoT edge devices.
Method: We propose a field-adaptive online learning framework tailored for resource-constrained edge platforms. It fuses multi-modal sensor data and introduces a memory-driven sparse parameter update mechanism (requiring only 1.2 MB RAM), enabling cloud-free on-chip fine-tuning via pseudo-labeled depth supervision. A lightweight μPyD-Net architecture is deployed on the GAP9 RISC-V AI accelerator, supporting end-to-end backpropagation-based fine-tuning.
Contribution/Results: The system autonomously labels 3,000 samples in 17.8 minutes, reducing RMSE from 4.9 m to 0.6 m while consuming ~300 mW. To our knowledge, this is the first demonstration of feasible online adaptive MDE on MCUs under extreme resource constraints (sub-2 MB RAM, no external memory/cloud), establishing a new paradigm for intelligent edge perception.
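The self-labeling idea above, where the sparse 8 × 8 depth sensor supplies pseudo-labels for fine-tuning, can be sketched as a masked loss: the training error is computed only at the few pixels where the sensor produced a reading. This is a minimal illustration, not the paper's implementation; the function and variable names are hypothetical.

```python
import numpy as np

def masked_l2_loss(pred, sparse_depth):
    """L2 loss between the predicted depth map and sparse pseudo-labels,
    averaged over valid (non-zero) sensor readings only."""
    mask = sparse_depth > 0            # assume 0 marks pixels with no reading
    diff = (pred - sparse_depth)[mask]
    return float(np.mean(diff ** 2))

# Toy example: constant 2 m prediction, two valid sensor readings.
pred = np.full((8, 8), 2.0)
labels = np.zeros((8, 8))
labels[0, 0] = 1.0
labels[4, 4] = 3.0
print(masked_l2_loss(pred, labels))  # mean of (2-1)^2 and (2-3)^2 -> 1.0
```

Averaging only over valid readings keeps the loss meaningful even though the 8 × 8 sensor covers a tiny fraction of the image.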
📝 Abstract
Monocular depth estimation (MDE) plays a crucial role in enabling spatially aware applications on Ultra-Low-Power (ULP) Internet-of-Things (IoT) platforms. However, the limited parameter count of Deep Neural Networks designed for the MDE task on IoT nodes results in severe accuracy drops when the sensor data observed in the field shifts significantly from the training dataset. To address this domain-shift problem, we present a multi-modal On-Device Learning (ODL) technique, deployed on an IoT device that integrates a GreenWaves GAP9 MicroController Unit (MCU), an 80 mW monocular camera, and an 8 × 8 pixel depth sensor, consuming ≈300 mW in total. In normal operation, this setup feeds a tiny 107 k-parameter μPyD-Net model with monocular images for inference. The depth sensor, usually deactivated to minimize energy consumption, is activated alongside the camera only to collect pseudo-labels when the system is placed in a new environment. The fine-tuning task is then performed entirely on the MCU using the newly collected data. To optimize our backpropagation-based on-device training, we introduce a novel memory-driven sparse update scheme, which reduces the fine-tuning memory to 1.2 MB, 2.2× less than a full update, while preserving accuracy (only 2% and 1.5% drops on the KITTI and NYUv2 datasets, respectively). Our in-field tests demonstrate, for the first time, that ODL for MDE can be performed in 17.8 minutes on the IoT node, reducing the root-mean-squared error from 4.9 m to 0.6 m with only 3 k self-labeled samples collected in a real-life deployment scenario.
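A memory-driven sparse update, as described above, chooses which layers to fine-tune so that the training footprint fits a RAM budget instead of updating every parameter. The sketch below is a hypothetical greedy selector under assumed per-layer memory costs and importance scores; it is not the paper's actual selection algorithm, and all layer names and numbers are illustrative.

```python
def select_layers(layers, budget_bytes):
    """Greedily keep the highest-importance layers whose combined training
    memory fits the budget; all other layers stay frozen.
    Each entry in `layers` is (name, parameter_count, importance_score).
    Cost model (an assumption): 2 fp32 buffers per parameter
    (gradient + optimizer state), i.e. 8 bytes per parameter."""
    chosen, used = [], 0
    for name, n_params, _score in sorted(layers, key=lambda l: -l[2]):
        cost = 2 * 4 * n_params
        if used + cost <= budget_bytes:
            chosen.append(name)
            used += cost
    return chosen, used

# Illustrative layer table for a small encoder-decoder network.
layers = [
    ("enc1", 20_000, 0.2),
    ("enc2", 40_000, 0.5),
    ("dec1", 30_000, 0.9),
    ("dec2", 17_000, 0.8),
]

chosen, used = select_layers(layers, budget_bytes=500_000)
print(chosen, used)  # ['dec1', 'dec2'] 376000
```

Under the 500 kB toy budget only the two decoder layers are selected; the encoders stay frozen, which mirrors how a sparse update trades a small accuracy drop for a much smaller training memory footprint.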