AI Summary
The proliferation of task-specific LoRA adapters fragments capabilities, and existing Merge-then-Compress pipelines, which merge in the full parameter space and then truncate to rank r, can destroy the low-rank structure they are meant to preserve. This work proposes a "Compress-then-Merge" (CtM) paradigm: it first computes shared r-dimensional subspaces from the LoRA weights alone, projects each adapter onto these subspaces to obtain r×r core coordinates, and then applies standard merging rules in that reduced space. The merged adapter is rank r by construction, so no post-hoc truncated SVD is needed. Across multiple models and tasks, CtM consistently outperforms existing single-LoRA-output baselines and narrows the performance gap to full-parameter merging methods.
Abstract
Low-rank adaptation (LoRA) enables parameter-efficient specialization of foundation models, but the proliferation of task-specific adapters fragments capabilities across many adapters, complicating reuse and deployment. We study the problem of merging $T$ LoRAs into a single rank-$r$ LoRA, thereby preserving the benefits of low-rank structure. Existing Merge-then-Compress pipelines treat the rank constraint as an afterthought: they merge adapters in the full parameter space, then compress the merged result to rank $r$ via truncated SVD. However, full-parameter merging may destroy the low-rank structure, making it difficult for subsequent compression to recover an effective rank-$r$ LoRA. We propose Compress-then-Merge (CtM), a reversed pipeline that enforces the rank-$r$ bottleneck before merging: CtM computes shared $r$-dimensional subspaces using only the LoRA weights to capture cross-adapter common structure, projects each adapter into the shared subspaces to obtain $r\times r$ coordinates, and then applies standard merging rules in this reduced space. CtM guarantees a rank-$r$ LoRA by construction, avoiding post-hoc truncation, and enables efficient computation in the core space spanned by concatenated LoRA factors. Experiments across multiple models and tasks show that CtM consistently outperforms existing single-LoRA-output baselines while narrowing the performance gap to full-parameter merging methods.
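To make the two pipelines concrete, here is a minimal NumPy sketch of both for a single weight matrix. The abstract does not say how the shared subspaces are computed or which merging rule is used, so the choices below are assumptions made for illustration: the shared bases are taken as the top-$r$ singular vectors of the concatenated LoRA factors, and the merging rule is a weighted average. The function names (`shared_basis`, `compress_then_merge`, `merge_then_compress`) are hypothetical and are not taken from the paper.

```python
import numpy as np


def shared_basis(mat, r, side):
    """Top-r left ("left") or right ("right") singular vectors of mat."""
    U, _, Vt = np.linalg.svd(mat, full_matrices=False)
    return U[:, :r] if side == "left" else Vt[:r, :].T


def compress_then_merge(Bs, As, r, weights=None):
    """Sketch of CtM for T adapters with B_t (d_out x r) and A_t (r x d_in).

    Assumption: shared subspaces come from an SVD of the concatenated
    factors, and merging is a weighted average of the r x r cores.
    """
    T = len(Bs)
    weights = np.full(T, 1.0 / T) if weights is None else np.asarray(weights, dtype=float)

    # Shared subspaces, computed from the LoRA factors only.
    U = shared_basis(np.hstack(Bs), r, "left")   # (d_out, r)
    V = shared_basis(np.vstack(As), r, "right")  # (d_in, r)

    # r x r coordinates of each adapter: U^T (B_t A_t) V, computed as
    # (U^T B_t)(A_t V) so the full d_out x d_in update is never formed.
    cores = [(U.T @ B) @ (A @ V) for B, A in zip(Bs, As)]

    # Any standard merging rule could be applied here; this one averages.
    merged_core = sum(w * M for w, M in zip(weights, cores))

    # Rank <= r by construction: delta_W = (U @ merged_core) @ V^T.
    return U @ merged_core, V.T


def merge_then_compress(Bs, As, r):
    """Baseline sketch: average full updates, then truncate with an SVD."""
    delta = sum(B @ A for B, A in zip(Bs, As)) / len(Bs)
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    return U[:, :r] * S[:r], Vt[:r, :]


rng = np.random.default_rng(0)
d_out, d_in, r, T = 16, 12, 4, 3
Bs = [rng.standard_normal((d_out, r)) for _ in range(T)]
As = [rng.standard_normal((r, d_in)) for _ in range(T)]

B_ctm, A_ctm = compress_then_merge(Bs, As, r)
B_mtc, A_mtc = merge_then_compress(Bs, As, r)
print(B_ctm.shape, A_ctm.shape)  # factor shapes of the merged rank-r LoRA
```

In this sketch the merged update is `B_ctm @ A_ctm`, whose rank cannot exceed $r$ because it factors through an $r \times r$ core. One sanity check on the construction: when all adapters are identical, the shared subspaces span that adapter's column and row spaces, so the merged update reproduces it exactly.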