🤖 AI Summary
To address CPU resource imbalance and inflexible model selection under dynamic workloads in real-time AI applications on edge devices, this paper proposes a feedback-driven adaptive lightweight model switching framework leveraging multi-core CPU utilization. The method introduces: (1) a novel dynamic scheduling mechanism governed by real-time per-core CPU usage; (2) an ε-greedy online decision policy to ensure fair resource allocation and system resilience; and (3) integration of lightweight detection models—including YOLO-Nano and MobileNet-SSD—to achieve Pareto-optimal trade-offs between accuracy and inference efficiency. Evaluated in a traffic monitoring scenario, the framework reduces average CPU overload rate by 47%, decreases inference latency variance by 63%, and maintains detection accuracy above 92%.
📝 Abstract
The widespread adoption of machine learning on edge devices such as mobile phones, laptops, and IoT devices has enabled real-time AI applications in resource-constrained environments. Existing solutions for managing computational resources often focus narrowly on accuracy or energy efficiency and fail to adapt dynamically to varying workloads. Furthermore, existing systems lack robust mechanisms to adaptively balance CPU utilization, leading to inefficiencies in resource-constrained scenarios such as real-time traffic monitoring. To address these limitations, we propose a self-adaptive approach that optimizes CPU utilization and resource management on edge devices. Our approach, EdgeMLBalancer, balances between models through dynamic switching, guided by real-time CPU usage monitoring across processor cores. Tested on real-time traffic data, the approach switches between object detection models based on CPU usage, ensuring efficient resource utilization. It leverages an epsilon-greedy strategy that promotes fairness and prevents resource starvation, maintaining system robustness. Our evaluation demonstrates significant improvements in balancing computational efficiency and accuracy, highlighting the approach's ability to adapt seamlessly to varying workloads. This work lays the groundwork for further advancements in self-adaptation for resource-constrained environments.
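The switching idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The class name `EpsilonGreedySwitcher`, the use of the mean per-core load as the cost signal, the running-average update, and the default `epsilon` value are all assumptions made for illustration, since the abstract does not specify them.

```python
import random
from typing import Dict, List


class EpsilonGreedySwitcher:
    """Hypothetical epsilon-greedy selector over detection models.

    Assumption: the cost of a model is the mean per-core CPU usage (in
    percent) observed while it ran; lower is better.
    """

    def __init__(self, models: List[str], epsilon: float = 0.1, seed: int = 0):
        self.models = list(models)
        self.epsilon = epsilon
        self.rng = random.Random(seed)  # seeded for reproducibility
        self.avg_cpu: Dict[str, float] = {m: 0.0 for m in self.models}
        self.counts: Dict[str, int] = {m: 0 for m in self.models}

    def choose(self) -> str:
        # Try every model once so no model is starved of measurements.
        untried = [m for m in self.models if self.counts[m] == 0]
        if untried:
            return untried[0]
        # Explore: with probability epsilon, pick a model uniformly at random.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.models)
        # Exploit: pick the model with the lowest average CPU load so far.
        return min(self.models, key=lambda m: self.avg_cpu[m])

    def update(self, model: str, per_core_usage: List[float]) -> None:
        # Collapse per-core readings into one load figure (an assumption).
        load = sum(per_core_usage) / len(per_core_usage)
        self.counts[model] += 1
        # Incremental running mean of the observed load for this model.
        self.avg_cpu[model] += (load - self.avg_cpu[model]) / self.counts[model]
```

In a loop, one would call `choose()`, run the selected model on the next frame, and pass the observed per-core readings to `update()`. On a real device those readings could come from, for example, `psutil.cpu_percent(percpu=True)`; here they are supplied by the caller so the sketch has no external dependencies. The exploration step is what keeps rarely selected models from being permanently excluded.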