AMP4EC: Adaptive Model Partitioning Framework for Efficient Deep Learning Inference in Edge Computing Environments

📅 2025-04-01
🤖 AI Summary
To address the inference-efficiency bottleneck of deep learning models in resource-constrained edge computing environments, this paper proposes a dynamic, adaptive model partitioning framework. The framework partitions models at layer granularity based on real-time resource availability and supports runtime reconfiguration, overcoming the limitations of conventional static partitioning and fixed deployment strategies. It integrates lightweight resource monitoring, latency-aware partitioning decisions, and device-edge collaborative scheduling, and achieves cross-platform compatibility via ONNX and TensorRT. Experimental evaluation on edge devices, including Raspberry Pi and Jetson Nano, demonstrates up to a 78% reduction in on-device inference latency and a 414% increase in throughput relative to baseline approaches. These results support both the effectiveness of the proposed method and its generalizability across heterogeneous edge platforms.
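
As a rough illustration of what resource-aware, layer-granularity partitioning can look like, the sketch below splits a model's layers into contiguous groups sized in proportion to each device's available capacity. It is a generic heuristic written for this summary, not the AMP4EC algorithm itself: the function name `partition_layers`, the per-layer cost values, and the capacity scores are all hypothetical.

```python
from itertools import accumulate


def partition_layers(layer_costs, capacities):
    """Split layers into contiguous partitions, one per device.

    layer_costs: hypothetical per-layer compute cost estimates.
    capacities:  hypothetical per-device capacity scores (e.g. derived
                 from free CPU and memory reported by a resource monitor).
    Returns a list of layer-index lists, one list per device.
    """
    if not layer_costs or not capacities:
        raise ValueError("need at least one layer and one device")

    total_cost = sum(layer_costs)
    total_cap = sum(capacities)

    # Cumulative cost at which each device's share of the model ends.
    targets = list(accumulate(c / total_cap * total_cost for c in capacities))

    partitions = [[] for _ in capacities]
    running = 0.0
    device = 0
    for idx, cost in enumerate(layer_costs):
        # Move to the next device once the current share is used up.
        while device < len(capacities) - 1 and running >= targets[device]:
            device += 1
        partitions[device].append(idx)
        running += cost
    return partitions


# Two equally capable devices split four equal-cost layers evenly.
print(partition_layers([1, 1, 1, 1], [1, 1]))  # [[0, 1], [2, 3]]

# If the first device later reports three times the capacity of the
# second, calling the function again shifts more layers onto it.
print(partition_layers([1, 1, 1, 1], [3, 1]))  # [[0, 1, 2], [3]]
```

Runtime reconfiguration in this sketch amounts to calling the function again whenever the monitored capacities change. A real system would also need to weigh the cost of moving intermediate activations between devices, which this example ignores.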

📝 Abstract
Edge computing enables efficient deep learning inference in resource-constrained environments. In this paper, we propose AMP4EC, an adaptive model partitioning framework that optimizes inference by dynamically partitioning deep learning models based on real-time resource availability. Our approach achieves a latency reduction of up to 78% and a throughput improvement of 414% compared to baseline methods.
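
To put the reported percentages in multiplicative terms, the following is a small worked calculation. It assumes that both figures are measured relative to the baseline value and that "improvement of 414%" means an increase by 414% of the baseline; the symbols $L$ and $T$ for latency and throughput are introduced here only for illustration.

```latex
\begin{align*}
L_{\text{AMP4EC}} &= (1 - 0.78)\, L_{\text{base}} = 0.22\, L_{\text{base}} \\
T_{\text{AMP4EC}} &= (1 + 4.14)\, T_{\text{base}} = 5.14\, T_{\text{base}}
\end{align*}
```

Under these assumptions, the best-case latency is a little over one fifth of the baseline and the best-case throughput is roughly five times the baseline. Both are "up to" figures, so they need not come from the same configuration.
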
Problem

Research questions and friction points this paper is trying to address.

Optimizes deep learning inference in edge computing
Dynamically partitions models for resource efficiency
Reduces latency and improves throughput significantly
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive partitioning of deep learning models
Dynamic optimization based on resource availability
Significant latency reduction and throughput improvement
Guilin Zhang
Department of Engineering Management and Systems Engineering, George Washington University, USA
Wulan Guo
Department of Engineering Management and Systems Engineering, George Washington University, USA
Ziqi Tan
Department of Engineering Management and Systems Engineering, George Washington University, USA
Hailong Jiang
Computer Science, Youngstown State University
Fault tolerant, HPC system, Compiler, Code Intelligence