🤖 AI Summary
To address task interference when merging multi-task LoRA adapters, this paper proposes a tensorized clustering method co-optimized at the text and parameter levels. First, input samples are clustered at the text level within the embedding space, and a dedicated LoRA adapter is trained for each task cluster. Second, a tensorized LoRA architecture is jointly decomposed via Canonical Polyadic (CP) decomposition to explicitly disentangle task-specific and shared factors at the knowledge level. This approach mitigates cross-task interference, improving downstream task accuracy by 1.4% on Phi-3 and 2.3% on Mistral-7B, significantly outperforming SVD-based baselines. The core innovation is the integration of input-representation clustering with tensor decomposition, enabling the first dual-granularity (text- and parameter-level) knowledge disentanglement in multi-task LoRA merging.
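The text-level step described above (cluster training samples in an embedding space, then train one LoRA adapter per cluster) can be sketched with a toy k-means routine. This is an illustrative stand-in, not the paper's implementation: the embedding model, distance metric, and number of clusters are all assumptions here.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Toy k-means over sample embeddings X of shape (n, d).

    Stand-in for the paper's text-level clustering: each returned
    cluster would get its own dedicated LoRA adapter. The embedding
    model and cluster count k are unspecified assumptions.
    """
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assign each sample to its nearest centroid (squared Euclidean).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # Recompute centroids; keep the old one if a cluster empties.
        for c in range(k):
            if (labels == c).any():
                centers[c] = X[labels == c].mean(0)
    return labels, centers
```

In the method's terms, samples whose embeddings land in the same cluster share an input format, so a specialized adapter is fine-tuned on each cluster's data.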
📝 Abstract
Despite the success of the monolithic dense paradigm of large language models (LLMs), LoRA adapters offer an efficient alternative by fine-tuning small task-specific modules and merging them with the base model. However, in multi-task settings, merging LoRA adapters trained on heterogeneous sources frequently causes *task interference*, degrading downstream performance. To address this, we propose a tensorized clustered LoRA (TC-LoRA) library that targets task interference at both the *text level* and the *parameter level*. At the text level, we cluster the training samples in the embedding space to capture input-format similarities, then train a specialized LoRA adapter for each cluster. At the parameter level, we introduce a joint Canonical Polyadic (CP) decomposition that disentangles task-specific and shared factors across LoRA adapters. This joint factorization preserves essential knowledge while reducing cross-task interference. We conduct extensive experiments on out-of-domain zero-shot and skill-composition tasks, including reasoning, question answering, and coding. Compared to strong SVD-based baselines, TC-LoRA achieves +1.4% accuracy on Phi-3 and +2.3% on Mistral-7B, demonstrating the effectiveness of TC-LoRA in LLM adaptation.
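The parameter-level step can be illustrated with a minimal CP decomposition via alternating least squares (ALS). This is a sketch under assumptions, not TC-LoRA's actual factorization: it stacks the T adapters' weight updates into a 3-way tensor of shape (T, d_out, d_in) and factors it as a sum of rank-one terms, so the mode-0 factor carries task-specific coefficients while the other two modes are shared across tasks.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product: (I, R) x (J, R) -> (I*J, R)."""
    R = U.shape[1]
    return (U[:, None, :] * V[None, :, :]).reshape(-1, R)

def cp_als(X, rank, n_iter=300, seed=0):
    """Rank-R CP decomposition of a 3-way tensor X via ALS.

    Returns factors A (I, R), B (J, R), C (K, R) such that
    X[i, j, k] ~= sum_r A[i, r] * B[j, r] * C[k, r].
    For X built by stacking T LoRA updates (I = tasks), A holds
    task-specific mixing weights; B and C are shared factors.
    """
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    # Mode-n unfoldings of X (row-major layout).
    X0 = X.reshape(I, J * K)
    X1 = X.transpose(1, 0, 2).reshape(J, I * K)
    X2 = X.transpose(2, 0, 1).reshape(K, I * J)
    for _ in range(n_iter):
        # Each update solves the normal equations of a linear LS problem.
        A = X0 @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = X1 @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = X2 @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C
```

Joint factorization across all adapters is what lets shared structure live in the common factors while per-task variation is confined to the task mode, which is the mechanism the abstract credits for reducing cross-task interference.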