Environment-Aware Dynamic Pruning for Pipelined Edge Inference

📅 2025-03-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Edge inference systems face poor model adaptability and high reconfiguration overhead due to resource constraints and dynamic environmental changes. To address this, we propose a runtime environment-aware dynamic pruning framework. Our approach features: (1) a novel post-deployment robust pruning–aware training strategy, enabling models to sustain accuracy under varying pruning configurations after deployment; and (2) a node-level adaptive pruning decision algorithm guided by real-time bottleneck monitoring, supporting “prune-on-demand” model slicing and load-aware balancing within distributed inference pipelines. This framework overcomes the limitations of conventional offline static pruning and costly runtime model reconfiguration. Evaluated on a Raspberry Pi 4B cluster, our method achieves a 1.5× improvement in inference throughput and a 3× increase in SLO compliance rate, while preserving model accuracy with no statistically significant degradation.
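The paper does not ship code, but the "prune-on-demand model slicing" idea can be illustrated with a minimal sketch: given a layer's weight matrix, drop the output channels with the smallest L2 norm at whatever ratio the runtime requests. All names here are hypothetical, and the paper's actual slicing scheme may differ.

```python
import math

def channel_mask(weights, ratio):
    """Keep-mask over output channels (rows), dropping the `ratio`
    fraction with the smallest L2 norm. Illustrative helper only."""
    norms = [math.sqrt(sum(w * w for w in row)) for row in weights]
    n_drop = int(len(weights) * ratio)
    order = sorted(range(len(weights)), key=lambda i: norms[i])  # ascending norm
    dropped = set(order[:n_drop])
    return [i not in dropped for i in range(len(weights))]

def prune_on_demand(weights, ratio):
    """Zero out dropped channels instead of deleting them, so the node
    can re-prune from the original weights at a different ratio later."""
    mask = channel_mask(weights, ratio)
    return [row if keep else [0.0] * len(row)
            for row, keep in zip(weights, mask)]
```

Because the original weights are left untouched, a node can move between pruning levels without reloading the model, which is the property the framework relies on to avoid costly runtime reconfiguration.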

📝 Abstract
IoT and edge-based inference systems require unique solutions to overcome resource limitations and unpredictable environments. In this paper, we propose an environment-aware dynamic pruning system that handles the unpredictability of edge inference pipelines. While traditional pruning approaches can reduce model footprint and compute requirements, they are often performed only once, offline, and are not designed to react to transient or post-deployment device conditions. Similarly, existing pipeline placement strategies may incur high overhead if reconfigured at runtime, limiting their responsiveness. Our approach allows slices of a model, already placed on a distributed pipeline, to be pruned ad hoc as a means of load balancing. To support this capability, we introduce two key components: (1) novel training strategies that endow models with robustness to post-deployment pruning, and (2) an adaptive algorithm that determines the optimal pruning level for each node based on monitored bottlenecks. In real-world experiments on a Raspberry Pi 4B cluster running camera-trap workloads, our method achieves a 1.5× speedup and a 3× improvement in service-level objective (SLO) attainment, all while maintaining high accuracy.
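The abstract's second component, the node-level adaptive pruning decision, can be pictured as a simple feedback loop: prune the bottleneck stage harder when the pipeline misses its latency budget, and relax pruning when there is slack. The sketch below is an illustrative assumption, not the paper's algorithm; thresholds, step sizes, and caps are made up.

```python
def adjust_pruning(stage_latencies, ratios, slo_budget,
                   step=0.05, max_ratio=0.8):
    """One control step over per-stage pruning ratios.
    `stage_latencies` are monitored per-node latencies (ms);
    all constants are illustrative, not from the paper."""
    # Pipeline throughput is limited by the slowest stage.
    bottleneck = max(range(len(stage_latencies)),
                     key=lambda i: stage_latencies[i])
    new_ratios = list(ratios)
    if stage_latencies[bottleneck] > slo_budget:
        # Missing the budget: prune the bottleneck harder.
        new_ratios[bottleneck] = min(max_ratio, ratios[bottleneck] + step)
    else:
        # Slack: back off pruning on the fastest stage to recover accuracy.
        fastest = min(range(len(stage_latencies)),
                      key=lambda i: stage_latencies[i])
        new_ratios[fastest] = max(0.0, ratios[fastest] - step)
    return new_ratios
```

Run periodically against monitored latencies, a loop like this shifts compute away from whichever node currently gates the pipeline, which is the load-aware balancing behavior the summary describes.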
Problem

Research questions and friction points this paper is trying to address.

Dynamic pruning for edge inference pipelines
Adaptive pruning based on runtime conditions
Improving SLO attainment and inference throughput in IoT deployments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic pruning adapts to edge environment changes.
Training ensures model robustness to post-deployment pruning.
Adaptive algorithm optimizes pruning for each node.
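The second bullet, robustness to post-deployment pruning, is typically obtained by exposing the model to varying pruning configurations during training. The toy sketch below trains a linear model through a freshly sampled random mask each step, so the learned weights remain usable at many pruning levels; every detail (model, sampling scheme, hyperparameters) is an illustrative assumption, not the paper's training strategy.

```python
import random

def masked_forward(x, weights, mask):
    """Toy one-layer model: pruned (masked-out) weights contribute nothing."""
    return sum(w * xi for w, xi, keep in zip(weights, x, mask) if keep)

def train_pruning_aware(data, n_features, steps=500, lr=0.05,
                        max_ratio=0.5, seed=0):
    """Each step samples a random pruning ratio and trains through the
    corresponding mask (squared-error gradient descent), so the model
    sees many pruning configurations before deployment."""
    rng = random.Random(seed)
    weights = [0.0] * n_features
    for _ in range(steps):
        ratio = rng.uniform(0.0, max_ratio)
        dropped = set(rng.sample(range(n_features),
                                 int(n_features * ratio)))
        mask = [i not in dropped for i in range(n_features)]
        x, y = rng.choice(data)
        err = masked_forward(x, weights, mask) - y
        for i in range(n_features):
            if mask[i]:  # only surviving weights receive gradient
                weights[i] -= lr * err * x[i]
    return weights
```

The key structural point is that the mask changes every step, so no single weight becomes indispensable; the paper's actual strategy presumably applies the same idea at the level of structured channel pruning in a deep network.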