🤖 AI Summary
To address the high inference overhead of large language models on resource-constrained devices and the heterogeneous difficulty of real-world samples, this paper systematically investigates early exiting mechanisms in NLP and proposes a comprehensive taxonomy and critical survey framework for NLP-oriented early exiting. The framework covers confidence-thresholding, gating networks, reinforcement learning–driven, and knowledge distillation–enhanced multi-exit architectures, along with collaborative training paradigms. Key contributions are: (1) uncovering synergistic gains across inference acceleration, adversarial robustness, and energy reduction; (2) establishing a unified evaluation perspective that quantifies the gaps of existing methods in generalizability, theoretical interpretability, and deployment feasibility; and (3) providing a systematic development roadmap and practical guidelines for efficient, adaptive NLP models.
📝 Abstract
Deep Neural Networks (DNNs) have grown increasingly large in order to achieve state-of-the-art performance across a wide range of tasks. However, their high computational requirements make them less suitable for resource-constrained applications. Moreover, real-world datasets often consist of a mixture of easy and complex samples, necessitating adaptive inference mechanisms that account for sample difficulty. Early exit strategies offer a promising solution by enabling adaptive inference: simpler samples are classified using the initial layers of the DNN, thereby accelerating the overall inference process. By attaching classifiers at different layers, early exit methods not only reduce inference latency but also improve model robustness against adversarial attacks. This paper presents a comprehensive survey of early exit methods and their applications in NLP.
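The confidence-thresholding mechanism the abstract describes can be sketched in a few lines: run the network layer by layer, score the intermediate representation with the classifier attached at that depth, and stop as soon as the prediction is confident enough. The sketch below is illustrative only; `early_exit_predict` and the toy scalar layers/classifiers are hypothetical stand-ins, not any particular paper's implementation.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def early_exit_predict(x, layers, classifiers, threshold=0.9):
    """Run `layers` sequentially; after each, the classifier attached at
    that depth scores the hidden state.  If the top softmax probability
    exceeds `threshold`, return immediately (an early exit).  Otherwise
    fall through to the prediction of the final classifier."""
    h = x
    for depth, (layer, clf) in enumerate(zip(layers, classifiers), start=1):
        h = layer(h)
        probs = softmax(clf(h))
        conf = max(probs)
        if conf >= threshold:          # easy sample: exit here
            return probs.index(conf), depth
    return probs.index(conf), depth    # hard sample: used the full network

# Toy stand-ins: each "layer" transforms a scalar hidden state and each
# "classifier" maps it to two-class logits (purely for illustration).
layers = [lambda h: h + 1.0, lambda h: h * 2.0]
classifiers = [lambda h: [h, -h], lambda h: [h, -h]]

easy = early_exit_predict(5.0, layers, classifiers)   # confident at depth 1
hard = early_exit_predict(-0.1, layers, classifiers)  # needs both layers
```

In practice the exits are small classification heads trained on intermediate transformer representations, and the threshold trades accuracy against latency: a lower threshold exits earlier on average but risks misclassifying samples that only look easy.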