How to Complete Domain Tuning while Keeping General Ability in LLM: Adaptive Layer-wise and Element-wise Regularization

📅 2025-01-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address catastrophic forgetting during fine-tuning of large language models (LLMs), this paper proposes a method that preserves general-purpose capabilities while enabling domain adaptation. It introduces two key ideas: (1) element-wise estimation of parameter importance, identifying which parameters encode knowledge critical for general ability; and (2) layer-wise regularization coefficients that account for the differing contributions of individual layers and dynamically balance the two objectives. Training combines a regularization loss, which protects important parameters, with a cross-entropy loss, which adapts the model to the target domain. The approach is validated on GPT-J and LLaMA-3 across scientific, medical, and physics tasks, where it mitigates forgetting while improving domain adaptability. Compared with previous methods, it is roughly 20 times faster and requires only 10%-15% of the storage, making it a practical route to efficient domain adaptation of LLMs.
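To make the dual-objective optimization described in the summary concrete, here is a minimal PyTorch-style sketch, assuming a precomputed element-wise importance tensor per parameter, a frozen copy of the pretrained weights, and a simple per-layer coefficient table. The names (`importance`, `anchor_params`, `layer_coeff`, `reg_strength`) and the quadratic penalty form are illustrative assumptions, not the authors' released implementation.

```python
import torch

def dual_objective_loss(model, batch, importance, anchor_params,
                        layer_coeff, reg_strength=1.0):
    """Cross-entropy on the domain batch plus an importance-weighted penalty
    that keeps parameters close to their pretrained values.

    importance:    dict name -> tensor, element-wise importance (assumed precomputed)
    anchor_params: dict name -> tensor, frozen copy of the pretrained weights
    layer_coeff:   dict name -> float, per-layer weighting of the regularizer
    """
    # (1) Domain-adaptation objective: standard next-token cross-entropy.
    outputs = model(**batch)
    ce_loss = outputs.loss  # assumes a HuggingFace-style CausalLM that returns .loss

    # (2) General-knowledge retention: quadratic drift penalty, weighted
    # element-wise by importance and scaled per layer.
    reg_loss = 0.0
    for name, param in model.named_parameters():
        if name in importance:
            drift = param - anchor_params[name]
            reg_loss = reg_loss + layer_coeff[name] * (importance[name] * drift.pow(2)).sum()

    return ce_loss + reg_strength * reg_loss
```

In this form the per-layer coefficients control how strongly each layer is anchored to its pretrained state, which is where the paper's layer-adaptive scheduling would plug in.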

📝 Abstract
Large Language Models (LLMs) exhibit strong general-purpose language capabilities. However, fine-tuning these models on domain-specific tasks often leads to catastrophic forgetting, where the model overwrites or loses essential knowledge acquired during pretraining. This phenomenon significantly limits the broader applicability of LLMs. To address this challenge, we propose a novel approach to compute the element-wise importance of model parameters crucial for preserving general knowledge during fine-tuning. Our method utilizes a dual-objective optimization strategy: (1) regularization loss to retain the parameters crucial for general knowledge; (2) cross-entropy loss to adapt to domain-specific tasks. Additionally, we introduce layer-wise coefficients to account for the varying contributions of different layers, dynamically balancing the dual-objective optimization. Extensive experiments on scientific, medical, and physical tasks using GPT-J and LLaMA-3 demonstrate that our approach mitigates catastrophic forgetting while enhancing model adaptability. Compared to previous methods, our solution is approximately 20 times faster and requires only 10%-15% of the storage, highlighting its practical efficiency. The code will be released.
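The abstract does not spell out how the element-wise importance is computed. A common proxy, used by Fisher-information-style methods such as EWC, is the squared gradient of the general-domain loss accumulated over a small calibration set; the paper's exact estimator may differ, so the sketch below is only an assumed baseline that would feed the loss sketch above.

```python
import torch

def estimate_elementwise_importance(model, general_loader, device="cuda"):
    """Accumulate squared gradients of the general-domain loss as an
    element-wise importance score (a Fisher-style approximation; the
    paper's own estimator may be different)."""
    model.eval()
    importance = {n: torch.zeros_like(p) for n, p in model.named_parameters()
                  if p.requires_grad}
    num_batches = 0
    for batch in general_loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        model.zero_grad()
        loss = model(**batch).loss  # general-domain language-modeling loss (HF-style model assumed)
        loss.backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                importance[n] += p.grad.detach().pow(2)
        num_batches += 1
    # Normalize by the number of batches so the scale is comparable across runs.
    return {n: imp / max(num_batches, 1) for n, imp in importance.items()}
```

Parameters with large accumulated squared gradients on general-domain data would then receive a stronger penalty during domain fine-tuning, discouraging drift in exactly the elements that matter most for general ability.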
Problem

Research questions and friction points this paper is trying to address.

Catastrophic Forgetting
Domain-specific Knowledge
General Knowledge Retention
Innovation

Methods, ideas, or system contributions that make the work stand out.

Domain Adaptation
Catastrophic Forgetting
Parameter Protection
👥 Authors
Shezheng Song (NUDT)
Hao Xu (NUDT)
Jun Ma (NUDT)
Shasha Li (NUDT)
Long Peng (China Electric Power Research Institute)
Qian Wan (CCNU)
Xiaodong Liu (NUDT)
Jie Yu (NUDT)