🤖 AI Summary
To address catastrophic forgetting and high computational overhead in continual learning for large language models (LLMs), this paper proposes SPARC, a subspace-aware lightweight prompt tuning framework. With the LLM's parameters frozen, it constrains prompt optimization to a low-dimensional subspace identified by PCA, reportedly the first such integration, so that pretraining knowledge is fully retained while only 0.04% of the model's parameters are trained. Combining the method with LoRA lets users trade accuracy against training cost. On SuperGLUE, the PCA-plus-LoRA variant reports 100% pretraining knowledge retention, accuracy above the baseline, and state-of-the-art performance while tuning only 1% of the parameters, along with substantial training-efficiency gains. The core innovation is the unification of subspace geometric constraints with prompt tuning, which yields both efficient adaptation and strong knowledge stability.
📝 Abstract
We propose SPARC, a lightweight continual learning framework for large language models (LLMs) that enables efficient task adaptation through prompt tuning in a lower-dimensional space. By leveraging principal component analysis (PCA), we identify a compact subspace of the training data. Optimizing prompts in this lower-dimensional space enhances training efficiency, as it focuses updates on the most relevant features while reducing computational overhead. Furthermore, since the model's internal structure remains unaltered, the extensive knowledge gained from pretraining is fully preserved, ensuring that previously learned information is not compromised during adaptation. Our method achieves high knowledge retention in both task-incremental and domain-incremental continual learning setups while fine-tuning only 0.04% of the model's parameters. Additionally, by integrating LoRA, we enhance adaptability to computational constraints, allowing for a tradeoff between accuracy and training cost. Experiments on the SuperGLUE benchmark demonstrate that our PCA-based prompt tuning combined with LoRA maintains full knowledge retention while improving accuracy, utilizing only 1% of the model's parameters. These results establish our approach as a scalable and resource-efficient solution for continual learning in LLMs.
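The idea of optimizing prompts inside a PCA subspace can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the dimensions, the random stand-in for task-data embeddings, the helper names `pca_basis` and `expand_prompt`, and the quadratic `toy_loss` that replaces the frozen LLM's task loss are all assumptions made for illustration. The sketch only shows the mechanics: the PCA basis is computed once and frozen, and the low-dimensional coefficients `Z` are the only values that get updated.

```python
import numpy as np

def pca_basis(X, k):
    """Return (mean, basis); basis has shape (k, d) with orthonormal rows."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def expand_prompt(Z, mean, basis):
    """Map low-dimensional coefficients (n_tokens, k) to embeddings (n_tokens, d)."""
    return Z @ basis + mean

rng = np.random.default_rng(0)
d, k, n_samples, n_tokens = 64, 8, 200, 10   # hypothetical sizes
X = rng.normal(size=(n_samples, d))          # stand-in for task-data embeddings
mean, basis = pca_basis(X, k)                # computed once, then kept fixed

# Toy objective standing in for the frozen model's task loss (assumption).
target = rng.normal(size=(n_tokens, d))

def toy_loss(P):
    return 0.5 * np.sum((P - target) ** 2)

Z = np.zeros((n_tokens, k))                  # the only trainable parameters
initial_loss = toy_loss(expand_prompt(Z, mean, basis))

lr = 0.5
for _ in range(20):
    P = expand_prompt(Z, mean, basis)
    grad_P = P - target                      # gradient w.r.t. the full prompt
    grad_Z = grad_P @ basis.T                # chain rule through P = Z @ basis + mean
    Z -= lr * grad_Z

final_loss = toy_loss(expand_prompt(Z, mean, basis))

print(Z.size, n_tokens * d)                  # prints: 80 640
print(initial_loss, final_loss)
```

In this toy setting the subspace parameterization trains 80 values instead of the 640 a full-dimensional soft prompt of the same length would need. The loss cannot reach zero because the prompt is confined to the span of the basis, which mirrors the accuracy and cost trade-off the abstract describes.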