🤖 AI Summary
To address the inference-efficiency bottleneck of deep learning models in resource-constrained edge computing environments, this paper proposes a dynamic, adaptive model-partitioning framework. The framework performs real-time, resource-aware partitioning at layer granularity and supports runtime reconfiguration, overcoming the limitations of conventional static partitioning and fixed deployment strategies. It integrates lightweight resource monitoring, latency-aware partitioning decisions, and device-edge collaborative scheduling, and achieves cross-platform compatibility via ONNX and TensorRT. Experimental evaluation on edge devices, including Raspberry Pi and Jetson Nano, demonstrates up to a 78% reduction in on-device inference latency and a 414% increase in throughput, significantly outperforming baseline approaches. These results validate both the effectiveness and the generalizability of the proposed method across heterogeneous edge platforms.
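The core idea of latency-aware partitioning can be illustrated with a minimal sketch. Note that the summary does not give the paper's actual decision algorithm; the function below, its name, and the cost model (per-layer compute latency plus activation-transfer latency at the split point) are illustrative assumptions, not AMP4EC's implementation.

```python
def best_partition(device_ms, edge_ms, act_bytes, bandwidth_bps):
    """Pick the layer index k minimizing estimated end-to-end latency.

    Hypothetical cost model: layers [0, k) run on the end device,
    layers [k, n) on the edge server, and the activation crossing the
    split is sent over the network.

    device_ms[i] / edge_ms[i]: per-layer latency on each side (ms).
    act_bytes[k]: bytes of the tensor crossing split k (act_bytes[0]
    is the raw input, act_bytes[n] the final output).
    bandwidth_bps: measured link bandwidth in bits per second.
    Returns (split_index, estimated_latency_ms).
    """
    n = len(device_ms)
    best = (0, float("inf"))
    for k in range(n + 1):  # k = 0: all on edge; k = n: all on device
        transfer_ms = act_bytes[k] * 8 / bandwidth_bps * 1000
        total = sum(device_ms[:k]) + transfer_ms + sum(edge_ms[k:])
        if total < best[1]:
            best = (k, total)
    return best
```

Re-running this search whenever the resource monitor reports a significant change in bandwidth or device load is one plausible way to realize the runtime reconfiguration the framework describes.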
📝 Abstract
Edge computing enables efficient deep learning inference in resource-constrained environments. In this paper, we propose AMP4EC, an adaptive model partitioning framework that optimizes inference by dynamically partitioning deep learning models based on real-time resource availability. Our approach achieves a latency reduction of up to 78% and a throughput improvement of 414% compared to baseline methods.