🤖 AI Summary
To address the challenges of streaming inference and dynamic scene adaptation on edge devices, this paper proposes a low-overhead, high-accuracy online fine-tuning framework. Conventional fine-tuning struggles to simultaneously achieve energy efficiency, real-time responsiveness, and inference accuracy. To overcome this, we introduce a novel inter-tuning (cross-iteration) and intra-tuning (within-iteration) co-optimization mechanism, integrated with lightweight techniques including computational graph pruning, gradient sparsification, memory reuse, and adaptive step-size scheduling. Experimental evaluation demonstrates that, compared to state-of-the-art instantaneous online learning baselines, our approach reduces average fine-tuning latency by 64%, cuts energy consumption by 52%, and improves inference accuracy by 1.75 percentage points—significantly enhancing the overall efficacy of online learning at the edge.
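One of the lightweight techniques named above, gradient sparsification, can be illustrated with a minimal top-k sketch: only the largest-magnitude fraction of gradient entries is kept per step, so fewer weights are updated and less memory traffic is incurred. The function below is a hypothetical illustration of the general technique, not the paper's actual implementation; the name `sparsify_gradient` and the `keep_ratio` parameter are assumptions.

```python
def sparsify_gradient(grad, keep_ratio=0.1):
    """Top-k gradient sparsification: keep only the largest-magnitude
    `keep_ratio` fraction of gradient entries and zero out the rest.

    `grad` is a flat list of gradient values. Ties at the threshold may
    keep slightly more than k entries; acceptable for a sketch.
    """
    k = max(1, int(len(grad) * keep_ratio))
    # Threshold = magnitude of the k-th largest entry.
    threshold = sorted((abs(g) for g in grad), reverse=True)[k - 1]
    return [g if abs(g) >= threshold else 0.0 for g in grad]

# Keep the top 50% of entries by magnitude; the two small gradients are zeroed.
print(sparsify_gradient([0.9, -0.01, 0.02, -1.5], keep_ratio=0.5))
# → [0.9, 0.0, 0.0, -1.5]
```

In a fine-tuning loop, the sparsified gradient would replace the dense one before the weight update, trading a small amount of accuracy for reduced computation and energy.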
📝 Abstract
Emerging applications, such as robot-assisted eldercare and object recognition, generally employ deep neural networks (DNNs) and naturally require: i) handling streaming inference requests and ii) adapting to possible deployment scenario changes. Online model fine-tuning is widely adopted to satisfy these needs. However, an inappropriate fine-tuning scheme could involve significant energy consumption, making it challenging to deploy on edge devices. In this paper, we propose EdgeOL, an edge online learning framework that optimizes inference accuracy, fine-tuning execution time, and energy efficiency through both inter-tuning and intra-tuning optimizations. Experimental results show that, on average, EdgeOL reduces overall fine-tuning execution time by 64%, energy consumption by 52%, and improves average inference accuracy by 1.75% over the immediate online learning strategy.
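The baseline that EdgeOL is compared against, immediate online learning, fine-tunes on every arriving batch; an inter-tuning optimization instead decides *when* fine-tuning is worthwhile. A minimal sketch of one such trigger policy is below: accumulate streaming samples and fire a fine-tuning round only when a moving-average confidence signal drops below a threshold. This is an assumed, illustrative policy, not EdgeOL's actual trigger mechanism; the class name `FinetuneTrigger` and its `window`/`threshold` parameters are hypothetical.

```python
from collections import deque

class FinetuneTrigger:
    """Trigger a fine-tuning round only when the model's moving-average
    prediction confidence over the last `window` samples falls below
    `threshold`, rather than fine-tuning on every arriving batch."""

    def __init__(self, window=50, threshold=0.8):
        self.confidences = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, confidence):
        """Record one sample's confidence; return True if a
        fine-tuning round should run now."""
        self.confidences.append(confidence)
        avg = sum(self.confidences) / len(self.confidences)
        return avg < self.threshold

trigger = FinetuneTrigger(window=3, threshold=0.8)
print(trigger.observe(0.95))  # → False (avg 0.95)
print(trigger.observe(0.70))  # → False (avg 0.825)
print(trigger.observe(0.60))  # → True  (avg 0.75, below 0.8)
```

Deferring fine-tuning this way is one plausible route to the latency and energy savings the abstract reports: idle periods avoid training work entirely while accuracy remains acceptable.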