Compress then Merge: From Multiple LoRAs into One Low-Rank Adapter

📅 2026-06-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Merging multiple task-specific LoRA adapters into a single low-rank adapter is difficult: existing pipelines merge in the full parameter space and compress afterwards, which may destroy the low-rank structure. This work proposes a “Compress-then-Merge” (CtM) paradigm: it first constructs shared r-dimensional subspaces from the LoRA weights alone, projects each adapter into these subspaces to obtain r×r core coordinates, and then applies standard merging rules in that reduced space. The merged adapter is rank-r by construction, so no post-hoc truncated SVD is needed. Evaluated across multiple models and tasks, CtM consistently outperforms existing single-LoRA-output baselines and narrows the performance gap to full-parameter merging methods.

📝 Abstract

Low-rank adaptation (LoRA) enables parameter-efficient specialization of foundation models, but the proliferation of task-specific adapters fragments capabilities across many adapters, complicating reuse and deployment. We study the problem of merging $T$ LoRAs into a single rank-$r$ LoRA, thereby preserving the benefits of low-rank structure. Existing Merge-then-Compress pipelines treat the rank constraint as an afterthought: they merge adapters in the full parameter space, then compress the merged result to rank $r$ via truncated SVD. However, full-parameter merging may destroy the low-rank structure, making it difficult for subsequent compression to recover an effective rank-$r$ LoRA. We propose Compress-then-Merge (CtM), a reversed pipeline that enforces the rank-$r$ bottleneck before merging: CtM computes shared $r$-dimensional subspaces using only the LoRA weights to capture cross-adapter common structure, projects each adapter into the shared subspaces to obtain $r\times r$ coordinates, and then applies standard merging rules in this reduced space. CtM guarantees a rank-$r$ LoRA by construction, avoiding post-hoc truncation, and enables efficient computation in the core space spanned by concatenated LoRA factors. Experiments across multiple models and tasks show that CtM consistently outperforms existing single-LoRA-output baselines while narrowing the performance gap to full-parameter merging methods.
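The two pipelines contrasted in the abstract can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the paper's implementation: it assumes the shared subspaces are the top-$r$ singular directions of the concatenated LoRA factors, and it uses plain averaging as the "standard merging rule". The function names (`shared_basis`, `compress_then_merge`, `merge_then_compress`) and the toy dimensions are hypothetical.

```python
import numpy as np


def shared_basis(mats, r):
    """Top-r left singular vectors of the column-wise concatenation of `mats`.

    Assumption: this is one plausible way to build a shared r-dimensional
    subspace from LoRA weights only; the paper's construction may differ.
    """
    U, _, _ = np.linalg.svd(np.concatenate(mats, axis=1), full_matrices=False)
    return U[:, :r]


def compress_then_merge(loras, r):
    """Sketch of Compress-then-Merge.

    loras: list of (B_t, A_t) with B_t of shape (d_out, r_t) and
           A_t of shape (r_t, d_in), so the update is B_t @ A_t.
    Returns (B, A) with B of shape (d_out, r) and A of shape (r, d_in).
    """
    U = shared_basis([B for B, _ in loras], r)      # (d_out, r) output-side basis
    V = shared_basis([A.T for _, A in loras], r)    # (d_in, r) input-side basis
    # r x r coordinates of each adapter, U^T (B_t A_t) V, computed
    # without ever forming the full d_out x d_in update.
    cores = [(U.T @ B) @ (A @ V) for B, A in loras]
    # Assumption: simple averaging stands in for the merging rule.
    core = np.mean(cores, axis=0)
    # The product (U @ core) @ V.T has rank at most r by construction.
    return U @ core, V.T


def merge_then_compress(loras, r):
    """Baseline sketch: average full updates, then truncate with SVD."""
    delta = np.mean([B @ A for B, A in loras], axis=0)  # full d_out x d_in matrix
    U, s, Vt = np.linalg.svd(delta, full_matrices=False)
    return U[:, :r] * s[:r], Vt[:r]


# Toy usage with random adapters (hypothetical sizes).
rng = np.random.default_rng(0)
d_out, d_in, r, T = 16, 12, 4, 3
toy_loras = [
    (rng.normal(size=(d_out, r)), rng.normal(size=(r, d_in))) for _ in range(T)
]
B_ctm, A_ctm = compress_then_merge(toy_loras, r)
B_mtc, A_mtc = merge_then_compress(toy_loras, r)
print(B_ctm.shape, A_ctm.shape, np.linalg.matrix_rank(B_ctm @ A_ctm))
```

In this sketch, the expensive step of Compress-then-Merge is an SVD of the concatenated factors (of width $\sum_t r_t$) rather than of a full $d_{out} \times d_{in}$ update, and all merging happens on $r \times r$ matrices. Random toy adapters say nothing about which pipeline yields a better merged model; that comparison is the subject of the paper's experiments.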

Problem

Research questions and friction points this paper is trying to address.

LoRA

model merging

low-rank adaptation

parameter-efficient fine-tuning

adapter compression

Innovation

Methods, ideas, or system contributions that make the work stand out.

LoRA merging

low-rank adaptation

parameter-efficient fine-tuning