DATA: Decomposed Attention-based Task Adaptation for Rehearsal-Free Continual Learning

📅 2025-02-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address catastrophic forgetting in continual learning for large language models (LLMs), this paper proposes DATA, a rehearsal-free, decomposition-based attention task-adaptation framework. The method decouples task-specific and task-shared knowledge with cooperating high-rank and low-rank LoRA adapters. It further introduces an attention-driven learnable weight-generation module and a task-relevance-aware dynamic allocation strategy, augmented with stochastic parameter restoration during training, to jointly improve knowledge retention and model plasticity. Evaluated on three mainstream continual learning benchmarks, the approach achieves state-of-the-art performance, significantly mitigating forgetting while improving both rapid adaptation to new tasks and long-term knowledge preservation.
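The stochastic parameter restoration mentioned above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function name `stochastic_restore`, the element-wise masking scheme, and the `restore_prob` parameter are all assumptions about how such a mechanism might look.

```python
import torch

def stochastic_restore(params, snapshot, restore_prob=0.05):
    """Hypothetical sketch of stochastic parameter restoration:
    each adapter parameter element is independently reset, with
    probability `restore_prob`, to its value from a snapshot taken
    before training on the current task, nudging the model back
    toward previously learned knowledge."""
    with torch.no_grad():
        for p, old in zip(params, snapshot):
            mask = torch.rand_like(p) < restore_prob  # elements to restore
            p.copy_(torch.where(mask, old, p))
```

A small `restore_prob` would leave most parameters free to adapt to the new task while occasionally pulling a random subset back, which matches the paper's stated goal of balancing plasticity against forgetting.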

📝 Abstract
Continual learning (CL) is essential for Large Language Models (LLMs) to adapt to evolving real-world demands, yet LLMs are susceptible to catastrophic forgetting (CF). While traditional CF solutions rely on expensive data rehearsal, recent rehearsal-free methods employ model-based and regularization-based strategies to address this issue. However, these approaches often neglect the model's plasticity, which is crucial to achieving optimal performance on newly learned tasks. Consequently, a key challenge in CL is striking a balance between preserving plasticity and mitigating CF. To tackle this challenge, we propose the $\textbf{D}$ecomposed $\textbf{A}$ttention-based $\textbf{T}$ask $\textbf{A}$daptation (DATA), which explicitly decouples and learns both task-specific and task-shared knowledge using high-rank and low-rank task adapters (e.g., LoRAs). For new tasks, DATA dynamically adjusts the weights of adapters of different ranks based on their relevance and distinction from previous tasks, allowing the model to acquire new task-specific skills while effectively retaining previously learned knowledge. Specifically, we implement a decomposed component weighting strategy comprising learnable components that collectively generate attention-based weights, allowing the model to integrate and utilize diverse knowledge from each DATA. Extensive experiments on three widely used benchmarks demonstrate that our proposed method achieves state-of-the-art performance. Notably, our approach significantly enhances model plasticity and mitigates CF by extending learnable components and employing stochastic restoration during training iterations.
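The core idea described in the abstract — high-rank and low-rank LoRA adapters whose contributions are mixed by attention-based weights — could look roughly like the sketch below. This is an illustrative reconstruction under stated assumptions, not the paper's code: the class name, the single gating layer used to produce the two attention weights, and the rank values are all hypothetical.

```python
import torch
import torch.nn as nn

class DecomposedLoRALinear(nn.Module):
    """Hypothetical sketch: a frozen linear layer augmented with a
    low-rank (task-shared) and a high-rank (task-specific) LoRA branch,
    mixed by input-conditioned attention weights."""

    def __init__(self, d_in, d_out, r_low=4, r_high=32, alpha=1.0):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        self.base.weight.requires_grad_(False)  # frozen pretrained weight
        self.scale = alpha
        # Standard LoRA parameterization: down-projection A, up-projection B
        # (B initialized to zero so each branch starts as a no-op)
        self.A_low = nn.Parameter(torch.randn(d_in, r_low) * 0.01)
        self.B_low = nn.Parameter(torch.zeros(r_low, d_out))
        self.A_high = nn.Parameter(torch.randn(d_in, r_high) * 0.01)
        self.B_high = nn.Parameter(torch.zeros(r_high, d_out))
        # Learnable component that scores the two branches per input token
        self.gate = nn.Linear(d_in, 2)

    def forward(self, x):
        w = torch.softmax(self.gate(x), dim=-1)       # (..., 2) attention weights
        delta_low = (x @ self.A_low) @ self.B_low     # task-shared update
        delta_high = (x @ self.A_high) @ self.B_high  # task-specific update
        delta = w[..., :1] * delta_low + w[..., 1:] * delta_high
        return self.base(x) + self.scale * delta
```

Because the gate conditions on the input, the mixing weights can shift per token toward the shared or the task-specific branch, which is one plausible reading of the "task-relevance-aware dynamic allocation" the abstract describes.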
Problem

Research questions and friction points this paper is trying to address.

Balancing plasticity and catastrophic forgetting
Adapting large models without data rehearsal
Enhancing task-specific and shared knowledge learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decomposed Attention-based Task Adaptation
High-rank and low-rank task adapters
Learnable components weighting strategy
Huanxuan Liao
Institute of Automation, Chinese Academy of Sciences
Natural Language Processing · Large Language Model · Long Context Modeling
Shizhu He
The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
Yupu Hao
The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
Jun Zhao
The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
Kang Liu
The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China