SAP: Syntactic Attention Pruning for Transformer-based Language Models

πŸ“… 2025-12-22
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the dual challenges of performance degradation and limited interpretability in attention head pruning for Transformer models. We propose a retraining-free pruning method jointly guided by syntactic structure and attention. Our core contribution is the first joint modeling of syntactic dependency structures and multi-layer attention distributions, enabling a contribution-based candidate head filtering (CF) mechanism that precisely identifies and retains functionally critical attention heads with a high density of strong attention values, without requiring fine-tuning. Experiments across multiple benchmark tasks demonstrate that our method significantly outperforms existing retraining-free pruning approaches: it improves key head retention by 12.6-28.3%, reduces accuracy degradation by 37-51%, and exhibits strong generalization and enhanced behavioral interpretability.

πŸ“ Abstract
This paper introduces Syntactic Attention Pruning (SAP), a novel method for effectively pruning attention heads in Transformer models. Unlike conventional approaches that rely solely on mathematical analysis of model weights and activations, SAP incorporates both the syntactic structure and attention patterns of sentences to guide the pruning process. By leveraging these linguistic features, SAP not only achieves performance comparable to state-of-the-art methods but also enhances the interpretability of model behavior. To further improve robustness, we propose Candidate Filtering (CF), a mechanism that prioritizes heads based on their contribution to model performance, mitigating degradation during pruning. Experimental results indicate that SAP effectively preserves critical heads with a high density of strong attention values, outperforming existing head pruning strategies in retrain-free settings. These findings position SAP as a promising foundation for a new direction in model compression research, offering high flexibility for pruning across all Transformer-based language models.
Problem

Research questions and friction points this paper is trying to address.

Prunes attention heads in Transformer models using syntactic structure
Enhances interpretability and performance comparable to state-of-the-art methods
Improves robustness with Candidate Filtering to mitigate pruning degradation
Innovation

Methods, ideas, or system contributions that make the work stand out.

SAP prunes attention heads using syntactic structure
Candidate Filtering prioritizes heads by performance contribution
Method preserves critical heads with strong attention values
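The combination described above (scoring heads by how well their attention aligns with syntactic structure, then filtering candidates to keep the top contributors) might be sketched as follows. This is an illustrative reconstruction, not the paper's actual algorithm: the scoring metric, the `keep_ratio` parameter, and the function names are assumptions made for the example.

```python
import numpy as np

def syntactic_score(attn, dep_edges):
    """Fraction of a head's attention mass that falls on
    syntactically dependent token pairs (illustrative metric,
    not the paper's exact formulation)."""
    mask = np.zeros_like(attn)
    for i, j in dep_edges:
        mask[i, j] = mask[j, i] = 1.0
    return float((attn * mask).sum() / attn.sum())

def candidate_filter(head_attns, dep_edges, keep_ratio=0.5):
    """Rank heads by syntactic score and retain the top fraction,
    pruning the rest without any retraining."""
    scores = [syntactic_score(a, dep_edges) for a in head_attns]
    k = max(1, int(len(head_attns) * keep_ratio))
    keep = sorted(range(len(scores)),
                  key=lambda h: scores[h], reverse=True)[:k]
    return sorted(keep), scores

# Toy example: two heads over a 4-token sentence with one
# dependency edge (token 0 -> token 1).
uniform_head = np.full((4, 4), 0.25)          # attends everywhere equally
syntax_head = np.zeros((4, 4))                # attends along the dependency
syntax_head[0, 1] = syntax_head[1, 0] = 1.0
syntax_head[2, 2] = syntax_head[3, 3] = 1.0

kept, scores = candidate_filter([uniform_head, syntax_head],
                                dep_edges=[(0, 1)], keep_ratio=0.5)
```

Under this toy criterion, the head whose attention mass concentrates on the dependency-linked pair scores higher and survives filtering, matching the intuition that SAP preserves heads with a high density of strong, syntactically meaningful attention values.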
πŸ”Ž Similar Papers
No similar papers found.