🤖 AI Summary
To address the energy-efficiency–latency trade-off in deploying CNNs on resource-constrained devices (e.g., mobile and autonomous driving platforms), this paper proposes a co-optimization framework integrating layer skipping with dynamic voltage and frequency scaling (DVFS). We introduce proportional layer skipping (PLS), a novel mechanism that models the layer-skipping ratio as a continuous, tunable parameter, jointly optimized with CPU frequency. This enables fine-grained, hardware-aware inference acceleration. Further, we formulate a tri-objective optimization framework balancing inference latency, energy consumption, and model accuracy, overcoming the limitations of conventional single-objective model compression. Evaluations of ResNet-152 on CIFAR-10 demonstrate that our method reduces energy consumption by 42.7% and computational operations by 38.5%, while maintaining bounded inference latency and incurring only a 0.9% accuracy drop relative to the baseline.
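The core idea of PLS, as described above, is that the skip ratio is a continuous knob rather than an on/off decision per layer. A minimal sketch of one plausible realization, not the paper's exact policy: given a skip ratio, spread the skipped residual blocks evenly over the network depth and return the indices of the blocks that should still execute (function and selection scheme are illustrative assumptions).

```python
def select_layers(num_layers, skip_ratio):
    """Choose which residual blocks to execute for a given continuous skip ratio.

    skip_ratio lies in [0, 1); skipped blocks are spread uniformly over the
    depth. This selection policy is an illustrative assumption, not the
    paper's exact mechanism.
    """
    n_skip = round(num_layers * skip_ratio)
    if n_skip == 0:
        return list(range(num_layers))
    step = num_layers / n_skip
    # mark every ~step-th block for skipping (bypassed via the identity path)
    skipped = {int(i * step) for i in range(n_skip)}
    return [i for i in range(num_layers) if i not in skipped]
```

Because the ratio is continuous, an outer optimizer can search over (skip_ratio, CPU frequency) pairs jointly instead of treating compression and DVFS as separate stages.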
📝 Abstract
The energy consumption of Convolutional Neural Networks (CNNs) is a critical factor in deploying deep learning models on resource-limited equipment such as mobile devices and autonomous vehicles. We propose an approach combining Proportional Layer Skipping (PLS) and Frequency Scaling (FS). Layer skipping reduces computational complexity by selectively bypassing network layers, while frequency scaling adjusts processor frequency to optimize energy use under latency constraints. Experiments with PLS and FS on ResNet-152 and the CIFAR-10 dataset demonstrate significant reductions in computational demands and energy consumption with minimal accuracy loss. This study offers practical solutions for improving real-time processing in resource-limited settings and provides insights into balancing computational efficiency and model performance.
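The frequency-scaling side can be illustrated with a simplified DVFS model: since dynamic power grows super-linearly with frequency (roughly f³ when voltage scales with frequency), the lowest frequency that still meets the latency deadline minimizes energy. The sketch below assumes this simplified model and a hypothetical `pick_frequency` helper; it is not the paper's optimizer.

```python
def pick_frequency(ops, deadline_s, freq_levels_hz, ops_per_cycle=1.0):
    """Pick the lowest CPU frequency that still meets the latency deadline.

    Simplified model (assumption): latency = ops / (f * ops_per_cycle).
    Under the usual DVFS power model (P ~ f^3 with voltage scaling),
    the lowest feasible frequency minimizes energy for the inference.
    """
    for f in sorted(freq_levels_hz):
        if ops / (f * ops_per_cycle) <= deadline_s:
            return f
    # deadline infeasible at any level: fall back to the fastest setting
    return max(freq_levels_hz)
```

In the combined scheme, layer skipping first lowers `ops`, which in turn lets the governor settle on a lower frequency for the same deadline, compounding the energy savings.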