🤖 AI Summary
To address task interference when merging multi-task LoRA adapters, this paper proposes a tensorized clustering method co-optimized at the text and parameter levels. First, input samples are clustered at the text level within the embedding space, and a dedicated LoRA adapter is trained for each task cluster. Second, a tensorized LoRA architecture is jointly decomposed via Canonical Polyadic (CP) decomposition to explicitly disentangle task-specific and shared factors at the knowledge level. This approach mitigates cross-task interference, improving downstream task accuracy by 1.4% on Phi-3 and 2.3% on Mistral-7B, significantly outperforming SVD-based baselines. The core innovation is the integration of input-representation clustering with tensor decomposition, enabling the first dual-granularity (text- and parameter-level) knowledge disentanglement in multi-task LoRA merging.
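The text-level step described above (cluster training samples in an embedding space, then train one LoRA adapter per cluster) can be sketched with a toy k-means routine. This is an illustrative stand-in, not the paper's implementation: the embedding model, distance metric, and number of clusters are all assumptions here.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Toy k-means over sample embeddings X of shape (n, d).

    Stand-in for the paper's text-level clustering: each returned
    cluster would get its own dedicated LoRA adapter. The embedding
    model and cluster count k are unspecified assumptions.
    """
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assign each sample to its nearest centroid (squared Euclidean).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # Recompute centroids; keep the old one if a cluster empties.
        for c in range(k):
            if (labels == c).any():
                centers[c] = X[labels == c].mean(0)
    return labels, centers
```

In the method's terms, samples whose embeddings land in the same cluster share an input format, so a specialized adapter is fine-tuned on each cluster's data.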
📝 Abstract
Despite the success of the monolithic dense paradigm of large language models (LLMs), LoRA adapters offer an efficient alternative by fine-tuning small task-specific modules and merging them with the base model. However, in multi-task settings, merging LoRA adapters trained on heterogeneous sources frequently causes *task interference*, degrading downstream performance. To address this, we propose a tensorized clustered LoRA (TC-LoRA) library that targets task interference at both the *text level* and the *parameter level*. At the text level, we cluster the training samples in the embedding space to capture input-format similarities, then train a specialized LoRA adapter for each cluster. At the parameter level, we introduce a joint Canonical Polyadic (CP) decomposition that disentangles task-specific and shared factors across LoRA adapters. This joint factorization preserves essential knowledge while reducing cross-task interference. We conduct extensive experiments on out-of-domain zero-shot and skill-composition tasks, including reasoning, question answering, and coding. Compared to strong SVD-based baselines, TC-LoRA achieves +1.4% accuracy on Phi-3 and +2.3% on Mistral-7B, demonstrating the effectiveness of TC-LoRA in LLM adaptation.
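The parameter-level step can be illustrated with a minimal CP decomposition via alternating least squares (ALS). This is a sketch under assumptions, not TC-LoRA's actual factorization: it stacks the T adapters' weight updates into a 3-way tensor of shape (T, d_out, d_in) and factors it as a sum of rank-one terms, so the mode-0 factor carries task-specific coefficients while the other two modes are shared across tasks.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product: (I, R) x (J, R) -> (I*J, R)."""
    R = U.shape[1]
    return (U[:, None, :] * V[None, :, :]).reshape(-1, R)

def cp_als(X, rank, n_iter=300, seed=0):
    """Rank-R CP decomposition of a 3-way tensor X via ALS.

    Returns factors A (I, R), B (J, R), C (K, R) such that
    X[i, j, k] ~= sum_r A[i, r] * B[j, r] * C[k, r].
    For X built by stacking T LoRA updates (I = tasks), A holds
    task-specific mixing weights; B and C are shared factors.
    """
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    # Mode-n unfoldings of X (row-major layout).
    X0 = X.reshape(I, J * K)
    X1 = X.transpose(1, 0, 2).reshape(J, I * K)
    X2 = X.transpose(2, 0, 1).reshape(K, I * J)
    for _ in range(n_iter):
        # Each update solves the normal equations of a linear LS problem.
        A = X0 @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = X1 @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = X2 @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C
```

Joint factorization across all adapters is what lets shared structure live in the common factors while per-task variation is confined to the task mode, which is the mechanism the abstract credits for reducing cross-task interference.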