🤖 AI Summary
Large language models (LLMs) are difficult to deploy in domain-specific settings such as healthcare and law: their parameter counts are excessive, and existing pruning methods generalize poorly across domains while incurring high computational overhead.
Method: This paper proposes an efficient domain-adaptive pruning framework that integrates a propagation-aware pruning criterion (Foresight Mask) with a partial-parameter surgical optimization algorithm (Partial Brain Surgeon), enabling one-shot conversion of general pretrained LLMs into sparse, domain-specialized expert models. Leveraging LoRA-based fine-tuning, the framework achieves lightweight compression and domain adaptation simultaneously.
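The paper's Foresight Mask is propagation-aware, but the basic mechanics of one-shot pruning at a target sparsity can be illustrated with a generic magnitude-based mask. The sketch below is a simplified stand-in, not the paper's criterion; all function and variable names are illustrative:

```python
import numpy as np

def magnitude_mask(weight: np.ndarray, sparsity: float) -> np.ndarray:
    """Return a binary mask that keeps the largest-magnitude entries.

    NOTE: magnitude is a placeholder saliency score; EfficientXpert's
    Foresight Mask scores weights by their propagated effect instead.
    """
    k = int(weight.size * sparsity)  # number of weights to prune
    if k == 0:
        return np.ones_like(weight, dtype=bool)
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(np.abs(weight).ravel(), k - 1)[k - 1]
    return np.abs(weight) > threshold

rng = np.random.default_rng(0)
W = rng.normal(size=(128, 128))          # a dense weight matrix
mask = magnitude_mask(W, sparsity=0.40)  # prune ~40% of weights
W_pruned = W * mask                      # sparse, domain-agnostic baseline
```

In the actual framework, a mask like this would be computed once from a propagation-aware criterion and then held fixed while LoRA adapters are updated (the Partial Brain Surgeon step), so that compression and domain adaptation happen in a single pass.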
Contribution/Results: Experiments demonstrate that, at 40% sparsity, the pruned models retain up to 98% of the original performance on healthcare and legal benchmarks, surpassing state-of-the-art pruning approaches. Crucially, this work is the first to empirically demonstrate the critical impact of domain-specific structural shifts on pruning efficacy.
📝 Abstract
The rapid advancement of large language models (LLMs) has increased the demand for domain-specialized variants in areas such as law, healthcare, and finance. However, their large size remains a barrier to deployment in resource-constrained environments, and existing compression methods either generalize poorly across domains or incur high overhead. In this work, we propose **EfficientXpert**, a lightweight domain-pruning framework that combines a propagation-aware pruning criterion (Foresight Mask) with an efficient adapter-update algorithm (Partial Brain Surgeon). Integrated into the LoRA fine-tuning process, EfficientXpert enables a one-step transformation of general pretrained models into sparse, domain-adapted experts. Across health and legal tasks, it retains up to 98% of dense-model performance at 40% sparsity, outperforming state-of-the-art methods. Further analysis reveals substantial domain-dependent structural shifts that degrade the effectiveness of general pruning masks, underscoring the need for adaptive, domain-aware pruning strategies.