🤖 AI Summary
Existing early-exit networks (EENs) suffer from inefficient exit design on resource-constrained devices, where energy efficiency and accuracy are difficult to balance. Method: we propose a hardware-aware multi-exit neural architecture search (NAS) framework that incorporates hardware constraints (e.g., multiply-accumulate operations, MACs) directly into the search over exit-branch structures. The framework jointly optimizes exit placement, branch depth, layer types, and adaptive early-exit thresholds, supporting variable-depth, heterogeneous exit branches and dynamic multi-exit inference. Results: on CIFAR-10, CIFAR-100, and SVHN, the method achieves higher accuracy than state-of-the-art EENs at equal or lower average MACs, demonstrating the benefit of hardware-driven joint optimization of architecture and exit policy.
📝 Abstract
Early-exit networks are an effective way to reduce the overall energy consumption and latency of deep learning models by adapting computation to the complexity of each input. By incorporating intermediate exit branches into the architecture, they spend less computation on simpler samples, which is particularly beneficial for resource-constrained devices where energy consumption is critical. However, designing early-exit networks is a challenging and time-consuming process because efficiency and accuracy must be balanced. Recent works have applied Neural Architecture Search (NAS) to design more efficient early-exit networks, reducing average latency and improving accuracy by determining the best number and positions of exit branches in the architecture. Another important factor in the efficiency and accuracy of early-exit networks is the depth and the types of layers in the exit branches themselves. In this paper, we use hardware-aware NAS to strengthen the exit branches, considering both accuracy and efficiency during optimization. Our evaluation on the CIFAR-10, CIFAR-100, and SVHN datasets demonstrates that our proposed framework, which searches over varying depths and layer types for exit branches along with adaptive threshold tuning, designs early-exit networks that achieve higher accuracy at the same or lower average number of MACs than state-of-the-art approaches.
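The early-exit inference scheme the abstract describes (propagate the input through successive exit branches and stop at the first branch whose confidence clears its threshold) can be sketched as follows. This is a minimal illustration, not the paper's searched architecture: the exit heads, weights, and thresholds below are hypothetical toy values.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over a 1-D logit vector.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def early_exit_infer(x, exit_heads, thresholds):
    """Run exit branches in order; return at the first branch whose
    top-class softmax confidence meets its threshold (the final
    branch always exits)."""
    for i, (head, tau) in enumerate(zip(exit_heads, thresholds)):
        probs = softmax(head(x))
        if probs.max() >= tau or i == len(exit_heads) - 1:
            return int(probs.argmax()), i  # (predicted class, exit index)

# Toy example: two linear exit heads with hypothetical weights.
heads = [
    lambda x: x @ np.array([[2.0, 0.1], [0.1, 2.0]]),  # shallow head, less confident
    lambda x: x @ np.array([[5.0, 0.0], [0.0, 5.0]]),  # deeper head, more confident
]
x = np.array([1.0, 0.0])
pred_strict, exit_strict = early_exit_infer(x, heads, [0.90, 0.0])  # falls through to the last exit
pred_loose, exit_loose = early_exit_infer(x, heads, [0.80, 0.0])    # exits at the first branch
```

Lowering a branch's threshold lets more samples exit early, trading accuracy for fewer average MACs; tuning these thresholds jointly with the branch structures is the co-optimization the paper targets.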