Enhancing Layer Attention Efficiency through Pruning Redundant Retrievals

📅 2025-03-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address inter-layer feature redundancy in deep neural networks—where adjacent layers exhibit highly similar attention weights, degrading representational capacity and training efficiency—this paper proposes a differentiable, quantifiable layer-wise sparsity training paradigm. We introduce a novel KL-divergence-based metric to quantify inter-layer redundancy and design an Enhanced Beta Quantile Mapping (EBQM) mechanism to enable stable, differentiable skipping of redundant layers. Building upon this, we propose the Efficient Layer Attention (ELA) architecture. Experiments on image classification and object detection benchmarks demonstrate that ELA improves model accuracy, reduces training time by 30%, and maintains numerical stability and generalization performance without compromising inference fidelity.

📝 Abstract
Growing evidence suggests that layer attention mechanisms, which enhance interaction among layers in deep neural networks, have significantly advanced network architectures. However, existing layer attention methods suffer from redundancy, as attention weights learned by adjacent layers often become highly similar. This redundancy causes multiple layers to extract nearly identical features, reducing the model's representational capacity and increasing training time. To address this issue, we propose a novel approach to quantify redundancy by leveraging the Kullback-Leibler (KL) divergence between adjacent layers. Additionally, we introduce an Enhanced Beta Quantile Mapping (EBQM) method that accurately identifies and skips redundant layers, thereby maintaining model stability. Our proposed Efficient Layer Attention (ELA) architecture improves both training efficiency and overall performance, achieving a 30% reduction in training time while enhancing performance in tasks such as image classification and object detection.
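The abstract's core redundancy measure is the KL divergence between the attention distributions of adjacent layers. The paper's exact formulation isn't reproduced here, but a minimal sketch of the idea, assuming per-layer attention weights are treated as discrete distributions and a hypothetical skip threshold, might look like:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions (eps avoids log 0)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

def redundant_layers(attn_weights, threshold=0.05):
    """Flag layer l as redundant when its attention distribution is
    nearly identical (low KL) to layer l-1. Threshold is illustrative,
    not from the paper."""
    flags = [False]  # the first layer has no predecessor, so it is kept
    for prev, curr in zip(attn_weights, attn_weights[1:]):
        flags.append(kl_divergence(curr, prev) < threshold)
    return flags

# Toy example: layer 1 almost duplicates layer 0's attention pattern.
attn = [
    [0.70, 0.20, 0.10],
    [0.69, 0.21, 0.10],  # near-duplicate -> flagged redundant
    [0.10, 0.30, 0.60],  # distinct pattern -> kept
]
print(redundant_layers(attn))  # -> [False, True, False]
```

In the actual method the skip decision is made differentiable via EBQM rather than a hard threshold; this sketch only illustrates the KL-based redundancy score itself.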
Problem

Research questions and friction points this paper is trying to address.

Redundancy in layer attention: adjacent layers learn near-identical attention weights
Redundant layers extract duplicate features, hurting training efficiency and representational capacity
No existing quantifiable, stable mechanism for identifying and pruning redundant layers
Innovation

Methods, ideas, or system contributions that make the work stand out.

Quantify redundancy using KL divergence
Introduce EBQM to skip redundant layers
ELA architecture reduces training time
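The contributions above culminate in skipping layers flagged as redundant during the forward pass. The paper's EBQM performs this differentiably; the toy sketch below shows only the inference-time effect, with hypothetical layer callables standing in for real attention layers:

```python
import numpy as np

def forward_with_skips(x, layers, skip_flags):
    """Apply each layer in order, bypassing those flagged redundant.
    Illustrative hard skipping only; ELA's EBQM makes this decision
    differentiable during training."""
    for layer, skip in zip(layers, skip_flags):
        if not skip:
            x = layer(x)
    return x

# Hypothetical stand-in layers (real ones would be attention blocks).
double = lambda v: v * 2
inc = lambda v: v + 1

out = forward_with_skips(np.array([1.0, 2.0]),
                         [double, inc, double],
                         [False, True, False])
print(out)  # inc is skipped: [4.0, 8.0]
```

Skipping a layer removes its parameters from both the forward and backward pass, which is where the reported 30% training-time reduction comes from.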
Hanze Li
Glasgow College, University of Electronic Science and Technology of China
Xiande Huang
DAIL Tech
Artificial Intelligence · Machine Learning · Trustworthy AI · Medical Agents · Reinforcement Learning