Compress then Merge: From Multiple LoRAs into One Low-Rank Adapter

📅 2026-06-02
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
Existing methods struggle to merge multiple task-specific LoRA adapters into a single low-rank adapter without fragmenting capabilities or violating the low-rank structure. This work proposes a “Compress-then-Merge” (CtM) paradigm: it first constructs shared r-dimensional subspaces from the LoRA weights alone and projects each adapter onto them, then applies standard merging rules in the resulting r×r core coordinate space. Because the rank-r bottleneck is imposed before merging, the output is a rank-r LoRA by construction, with no post-hoc truncated SVD. Evaluated across multiple models and tasks, CtM consistently outperforms existing single-LoRA-output baselines and narrows the performance gap to full-parameter merging methods.
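The Compress-then-Merge steps summarized above can be sketched in NumPy. The exact subspace construction is not specified here, so this sketch assumes one plausible choice: the shared output-side basis U is the top-r eigenvectors of Σ_t ΔW_t ΔW_tᵀ (with ΔW_t = B_t A_t), and the input-side basis V is defined the same way on the transposed updates. Both are computed from a QR factorization of the concatenated LoRA factors, so no full d_out × d_in matrix is formed. Plain weighted averaging stands in for "standard merging rules", and the names `shared_subspace` and `compress_then_merge` are hypothetical.

```python
import numpy as np


def shared_subspace(left, right, r):
    """Orthonormal d x r basis maximizing sum_t ||U^T (L_t @ R_t)||_F^2.

    Hypothetical helper (an assumption, not the paper's definition): the top-r
    eigenvectors of sum_t (L_t R_t)(L_t R_t)^T, computed in the core space of
    the concatenated left factors. Assumes d >= sum of the LoRA ranks.
    """
    Q, S = np.linalg.qr(np.hstack(left))   # reduced QR: Q is d x k, S is k x k
    k = S.shape[0]
    G = np.zeros((k, k))                   # block-diagonal Gram of right factors
    start = 0
    for L, R in zip(left, right):
        width = L.shape[1]
        G[start:start + width, start:start + width] = R @ R.T
        start += width
    K = S @ G @ S.T                        # sum_t L_t R_t R_t^T L_t^T == Q K Q^T
    _, eigvecs = np.linalg.eigh(K)         # eigenvalues in ascending order
    top = eigvecs[:, ::-1][:, :r]          # keep the r leading eigenvectors
    return Q @ top                         # orthonormal columns


def compress_then_merge(Bs, As, r, weights=None):
    """Merge T LoRAs (delta W_t = B_t @ A_t) into one rank-r LoRA (B, A)."""
    T = len(Bs)
    if weights is None:
        weights = [1.0 / T] * T            # plain averaging as the merge rule (assumption)
    U = shared_subspace(Bs, As, r)                                  # d_out x r
    V = shared_subspace([A.T for A in As], [B.T for B in Bs], r)    # d_in x r
    # r x r coordinates of each adapter, U^T B_t A_t V, computed factor by factor
    cores = [(U.T @ B) @ (A @ V) for B, A in zip(Bs, As)]
    C = sum(w * core for w, core in zip(weights, cores))           # merge in core space
    return U @ C, V.T                      # B: d_out x r, A: r x d_in


rng = np.random.default_rng(0)
d_out, d_in, r, T = 64, 48, 4, 3
Bs = [rng.standard_normal((d_out, r)) for _ in range(T)]
As = [rng.standard_normal((r, d_in)) for _ in range(T)]
B, A = compress_then_merge(Bs, As, r)
print(B.shape, A.shape)                    # (64, 4) (4, 48)
```

In this sketch only the concatenated factors (d × T·r) and small T·r × T·r core matrices are decomposed, which is one way to realize the efficiency that comes from working in the space spanned by concatenated LoRA factors. The merged update U C Vᵀ has rank at most r by construction, so no truncation step follows the merge.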
📝 Abstract
Low-rank adaptation (LoRA) enables parameter-efficient specialization of foundation models, but the proliferation of task-specific adapters fragments capabilities across many adapters, complicating reuse and deployment. We study the problem of merging $T$ LoRAs into a single rank-$r$ LoRA, thereby preserving the benefits of low-rank structure. Existing Merge-then-Compress pipelines treat the rank constraint as an afterthought: they merge adapters in the full parameter space, then compress the merged result to rank $r$ via truncated SVD. However, full-parameter merging may destroy the low-rank structure, making it difficult for subsequent compression to recover an effective rank-$r$ LoRA. We propose Compress-then-Merge (CtM), a reversed pipeline that enforces the rank-$r$ bottleneck before merging: CtM computes shared $r$-dimensional subspaces using only the LoRA weights to capture cross-adapter common structure, projects each adapter into the shared subspaces to obtain $r\times r$ coordinates, and then applies standard merging rules in this reduced space. CtM guarantees a rank-$r$ LoRA by construction, avoiding post-hoc truncation, and enables efficient computation in the core space spanned by concatenated LoRA factors. Experiments across multiple models and tasks show that CtM consistently outperforms existing single-LoRA-output baselines while narrowing the performance gap to full-parameter merging methods.
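For contrast, the Merge-then-Compress pipeline that the abstract critiques can be sketched as follows. This is a minimal illustration, assuming plain weighted averaging as the full-parameter merge rule; `merge_then_compress` is a hypothetical name, not a function from any library.

```python
import numpy as np


def merge_then_compress(Bs, As, r, weights=None):
    """Baseline: merge full updates, then truncate to rank r via SVD."""
    T = len(Bs)
    if weights is None:
        weights = [1.0 / T] * T            # plain averaging as the merge rule (assumption)
    # Materializes the full d_out x d_in merged update; its rank can reach T * r
    delta = sum(w * (B @ A) for w, B, A in zip(weights, Bs, As))
    U, s, Vt = np.linalg.svd(delta, full_matrices=False)
    B_new = U[:, :r] * s[:r]               # fold the top-r singular values into B
    A_new = Vt[:r, :]                      # r x d_in
    return B_new, A_new, delta


rng = np.random.default_rng(0)
d_out, d_in, r, T = 64, 48, 4, 3
Bs = [rng.standard_normal((d_out, r)) for _ in range(T)]
As = [rng.standard_normal((r, d_in)) for _ in range(T)]
B, A, delta = merge_then_compress(Bs, As, r)
rel_err = np.linalg.norm(delta - B @ A) / np.linalg.norm(delta)
print(np.linalg.matrix_rank(delta), f"{rel_err:.3f}")   # rank is generically T * r = 12
```

Truncated SVD gives the best rank-r approximation of the merged matrix in Frobenius norm (Eckart–Young), but that matrix generically has rank up to T·r, so everything outside its top r directions is discarded after merging. This is the post-hoc loss the abstract attributes to treating the rank constraint as an afterthought.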
Problem

Research questions and friction points this paper is trying to address.

LoRA
model merging
low-rank adaptation
parameter-efficient fine-tuning
adapter compression
Innovation

Methods, ideas, or system contributions that make the work stand out.

LoRA merging
low-rank adaptation
parameter-efficient fine-tuning
subspace projection
model compression
Authors
Zhengbao He
Institute of Image Processing and Pattern Recognition, School of Automation and Intelligent Sensing, Shanghai Jiao Tong University, Shanghai, China
Ruiqi Ding
Institute of Image Processing and Pattern Recognition, School of Automation and Intelligent Sensing, Shanghai Jiao Tong University, Shanghai, China
Zhehao Huang
Institute of Image Processing and Pattern Recognition, School of Automation and Intelligent Sensing, Shanghai Jiao Tong University, Shanghai, China
Ruikai Yang
Institute of Image Processing and Pattern Recognition, School of Automation and Intelligent Sensing, Shanghai Jiao Tong University, Shanghai, China
Tao Li
Shanghai Jiao Tong University
machine learning, optimization
Xiaolin Huang
Professor, Shanghai Jiao Tong University
machine learning, kernel method, deep neural network training, piecewise linear model