STAR: Spectral Truncation and Rescale for Model Merging

📅 2025-02-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Model merging enables efficient construction of multi-task models, yet task performance often degrades as the number of merged models increases. To address this, the authors propose STAR, a method that combines spectral-space truncation with nuclear-norm-preserving parameter rescaling. Specifically, STAR applies singular value decomposition (SVD) to each model's weight matrices, truncates the components with small singular values to suppress inter-model conflicts, and automatically rescales the remaining components to preserve the nuclear norm of the original matrix. The method needs no additional inference on the original training data and no further fine-tuning, and it is robust to hyperparameter choice. STAR works across varying model sizes: when merging 12 models on Flan-T5 it can outperform baselines by 4.2%, and it is evaluated on diverse NLP tasks. The implementation is publicly available.

📝 Abstract

Model merging is an efficient way of obtaining a multi-task model from several pretrained models without further fine-tuning, and it has gained attention in various domains, including natural language processing (NLP). Despite this efficiency, a key challenge in model merging is the seemingly inevitable decrease in task performance as the number of models increases. In this paper, we propose $\mathbf{S}$pectral $\mathbf{T}$runcation $\mathbf{A}$nd $\mathbf{R}$escale (STAR), which aims to mitigate "merging conflicts" by truncating small components in the respective spectral spaces, followed by an automatic parameter rescaling scheme that retains the nuclear norm of the original matrix. STAR requires no additional inference on original training data and is robust to hyperparameter choice. We demonstrate the effectiveness of STAR through extensive model merging cases on diverse NLP tasks. Specifically, STAR works robustly across varying model sizes, and can outperform baselines by 4.2% when merging 12 models on Flan-T5. Our code is publicly available at https://github.com/IBM/STAR.
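The truncate-then-rescale idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository linked above for that). The function names `truncate_and_rescale` and `merge`, the `keep_ratio` parameter, the rule of keeping the smallest rank that covers a given fraction of the nuclear norm, and the plain averaging used to combine matrices are all assumptions made for illustration only.

```python
import numpy as np


def truncate_and_rescale(matrix, keep_ratio=0.5):
    """Drop small singular components, then rescale to keep the nuclear norm.

    `keep_ratio` is a hypothetical knob: the fraction of the nuclear norm
    (sum of singular values) that the retained components must cover.
    """
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    total = s.sum()  # nuclear norm of the input matrix
    if total == 0.0:
        return np.zeros_like(matrix)

    # Smallest rank whose leading singular values cover `keep_ratio` of the total.
    cumulative = np.cumsum(s) / total
    rank = min(int(np.searchsorted(cumulative, keep_ratio)) + 1, len(s))

    # Rescale the kept singular values so they sum to the original nuclear norm.
    kept = s[:rank] * (total / s[:rank].sum())
    return (u[:, :rank] * kept) @ vt[:rank]


def merge(matrices, keep_ratio=0.5):
    """Average the processed matrices (simple averaging is an assumption here)."""
    processed = [truncate_and_rescale(m, keep_ratio) for m in matrices]
    return np.mean(processed, axis=0)


# Toy usage with random stand-ins for per-model weight matrices.
rng = np.random.default_rng(0)
toy_matrices = [rng.standard_normal((6, 4)) for _ in range(3)]
merged = merge(toy_matrices)
```

Because the kept singular values are multiplied by a single factor, the output has lower rank than the input but the same nuclear norm, which is the property the rescaling step is described as preserving.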

Problem

Research questions and friction points this paper is trying to address.

Mitigates merging conflicts in model merging

Improves task performance with multiple models

Applies spectral truncation and rescaling techniques

Innovation

Methods, ideas, or system contributions that make the work stand out.

Spectral Truncation technique

Automatic parameter rescaling

Robust multi-model merging
