EfficientXpert: Efficient Domain Adaptation for Large Language Models via Propagation-Aware Pruning

📅 2025-11-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) face challenges in domain-specific deployment, particularly in healthcare and law, because of their excessive parameter counts and because existing pruning methods generalize poorly across domains and incur high computational overhead. Method: This paper proposes an efficient domain-adaptive pruning framework that combines a propagation-aware pruning criterion (Foresight Mask) with a partial-parameter surgical optimization algorithm (Partial Brain Surgeon), enabling one-shot conversion of general pretrained LLMs into sparse, domain-specialized expert models. By leveraging LoRA-based fine-tuning, the framework achieves lightweight compression and domain adaptation simultaneously. Contribution/Results: Experiments demonstrate that, at 40% sparsity, the pruned models retain 98% of the original performance on healthcare and legal benchmarks, surpassing state-of-the-art pruning approaches. Crucially, this work is the first to empirically reveal the critical impact of domain-specific structural changes on pruning efficacy.

📝 Abstract
The rapid advancement of large language models (LLMs) has increased the demand for domain-specialized variants in areas such as law, healthcare, and finance. However, their large size remains a barrier to deployment in resource-constrained environments, and existing compression methods either generalize poorly across domains or incur high overhead. In this work, we propose EfficientXpert, a lightweight domain-pruning framework that combines a propagation-aware pruning criterion (Foresight Mask) with an efficient adapter-update algorithm (Partial Brain Surgeon). Integrated into the LoRA fine-tuning process, EfficientXpert enables a one-step transformation of general pretrained models into sparse, domain-adapted experts. Across health and legal tasks, it retains up to 98% of dense-model performance at 40% sparsity, outperforming state-of-the-art methods. Further analysis reveals substantial domain-dependent structural shifts that degrade the effectiveness of general pruning masks, underscoring the need for adaptive, domain-aware pruning strategies tailored to each domain.
Problem

Research questions and friction points this paper is trying to address.

Addresses large LLM size barriers in resource-constrained domain deployments
Solves poor cross-domain generalization in existing compression methods
Overcomes domain-dependent structural shifts degrading general pruning effectiveness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses propagation-aware pruning for domain adaptation
Integrates pruning with LoRA fine-tuning process
Achieves high sparsity while retaining dense performance
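To make the idea above concrete, here is a minimal, hedged sketch of the general pattern the paper builds on: one-shot magnitude pruning of a weight matrix to a target sparsity, followed by a LoRA-style low-rank update on top of the frozen sparse base. This is a generic illustration, not the paper's Foresight Mask or Partial Brain Surgeon algorithms; the function names and the rank-4 adapter are illustrative assumptions.

```python
import numpy as np

def one_shot_prune(weights, sparsity=0.4):
    """Generic one-shot pruning: zero out the lowest-magnitude
    fraction of entries and return the pruned matrix plus its mask.
    (Illustrative stand-in; the paper uses a propagation-aware criterion.)
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy(), np.ones_like(weights, dtype=bool)
    # k-th smallest magnitude serves as the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask, mask

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))            # stand-in for a dense layer
W_sparse, mask = one_shot_prune(W, sparsity=0.4)

# LoRA-style rank-4 adapter: the effective weight is the frozen
# sparse base plus a trainable low-rank correction A @ B.
A = rng.normal(scale=0.01, size=(64, 4))
B = rng.normal(scale=0.01, size=(4, 64))
W_eff = W_sparse + A @ B
```

In this setup only the small adapter matrices would be updated during fine-tuning, so the base model stays sparse while the adapter absorbs domain-specific adjustment, mirroring the lightweight compression-plus-adaptation combination described above.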
Songlin Zhao (University of California, Berkeley)
Michael Pitts (San Francisco State University)
Zhuwei Qin (San Francisco State University)
Deep Learning Acceleration · Efficient Machine Learning · Edge Computing · Interpretable Deep Learning