EfficientXpert: Efficient Domain Adaptation for Large Language Models via Propagation-Aware Pruning

📅 2025-11-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) face challenges in domain-specific deployment, particularly in healthcare and law, because of their excessive parameter counts and because existing pruning methods generalize poorly across domains and incur high computational overhead. Method: This paper proposes an efficient domain-adaptive pruning framework that combines a propagation-aware pruning criterion (Foresight Mask) with a partial-parameter surgical optimization algorithm (Partial Brain Surgeon), enabling one-shot conversion of general pretrained LLMs into sparse, domain-specialized expert models. By leveraging LoRA-based fine-tuning, the framework achieves lightweight compression and domain adaptation simultaneously. Contribution/Results: Experiments demonstrate that, at 40% sparsity, the pruned models retain 98% of the original performance on healthcare and legal benchmarks, surpassing state-of-the-art pruning approaches. Crucially, this work is the first to empirically reveal the critical impact of domain-specific structural changes on pruning efficacy.

📝 Abstract
The rapid advancement of large language models (LLMs) has increased the demand for domain-specialized variants in areas such as law, healthcare, and finance. However, their large size remains a barrier to deployment in resource-constrained environments, and existing compression methods either generalize poorly across domains or incur high overhead. In this work, we propose EfficientXpert, a lightweight domain-pruning framework that combines a propagation-aware pruning criterion (Foresight Mask) with an efficient adapter-update algorithm (Partial Brain Surgeon). Integrated into the LoRA fine-tuning process, EfficientXpert enables a one-step transformation of general pretrained models into sparse, domain-adapted experts. Across health and legal tasks, it retains up to 98% of dense-model performance at 40% sparsity, outperforming state-of-the-art methods. Further analysis reveals substantial domain-dependent structural shifts that degrade the effectiveness of general pruning masks, underscoring the need for adaptive, domain-aware pruning strategies tailored to each domain.
Problem

Research questions and friction points this paper is trying to address.

Addresses large LLM size barriers in resource-constrained domain deployments
Solves poor cross-domain generalization in existing compression methods
Overcomes domain-dependent structural shifts degrading general pruning effectiveness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses propagation-aware pruning for domain adaptation
Integrates pruning with LoRA fine-tuning process
Achieves high sparsity while retaining dense performance
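To make the idea above concrete, here is a minimal, hedged sketch of the general pattern the paper builds on: one-shot magnitude pruning of a weight matrix to a target sparsity, followed by a LoRA-style low-rank update on top of the frozen sparse base. This is a generic illustration, not the paper's Foresight Mask or Partial Brain Surgeon algorithms; the function names and the rank-4 adapter are illustrative assumptions.

```python
import numpy as np

def one_shot_prune(weights, sparsity=0.4):
    """Generic one-shot pruning: zero out the lowest-magnitude
    fraction of entries and return the pruned matrix plus its mask.
    (Illustrative stand-in; the paper uses a propagation-aware criterion.)
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy(), np.ones_like(weights, dtype=bool)
    # k-th smallest magnitude serves as the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask, mask

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))            # stand-in for a dense layer
W_sparse, mask = one_shot_prune(W, sparsity=0.4)

# LoRA-style rank-4 adapter: the effective weight is the frozen
# sparse base plus a trainable low-rank correction A @ B.
A = rng.normal(scale=0.01, size=(64, 4))
B = rng.normal(scale=0.01, size=(4, 64))
W_eff = W_sparse + A @ B
```

In this setup only the small adapter matrices would be updated during fine-tuning, so the base model stays sparse while the adapter absorbs domain-specific adjustment, mirroring the lightweight compression-plus-adaptation combination described above.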
Songlin Zhao (University of California, Berkeley)
Michael Pitts (San Francisco State University)
Zhuwei Qin (San Francisco State University)
Deep Learning Acceleration · Efficient Machine Learning · Edge Computing · Interpretable Deep Learning