Decom-Renorm-Merge: Model Merging on the Right Space Improves Multitasking

📅 2025-05-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional model merging assumes isomorphic weight matrices across fine-tuned models, overlooking neuronal functional heterogeneity and thereby impairing effective knowledge integration. To address this, we propose a three-stage framework: decomposition, weight-space renormalization, and merging. We first establish that weight-space renormalization is essential for constructing a mergeable joint representation space. Then, we design an SVD-based weight alignment mechanism that enables robust fusion of heterogeneous fine-tuned models within a unified latent space. Our method is compatible with both full-parameter fine-tuning and adapter-based approaches (e.g., LoRA). Extensive experiments on ViT, DeBERTa, T5, and Llama3.1-8B demonstrate consistent superiority over state-of-the-art merging methods, yielding average accuracy gains of 1.8–3.2 percentage points. Moreover, it significantly improves cross-task generalization and inference efficiency of merged models.

📝 Abstract
In the era of large-scale training, model merging has evolved into a tool for creating multitasking models efficiently. It enables the knowledge of multiple models to be fused without the heavy computation required by traditional multitask learning. Existing merging methods often assume that entries at identical positions in weight matrices serve the same function, enabling straightforward entry-wise comparison and merging. However, this assumption overlooks the complexity of finetuned neural networks, where neurons may develop distinct feature compositions, making direct entry-wise merging problematic. We present Decom-Renorm-Merge (DRM), a simple yet effective approach that leverages Singular Value Decomposition to decompose and coordinate weight matrices into an aligned joint space, where entry-wise merging becomes possible. We showcase the effectiveness of DRM across various settings, ranging from smaller encoder-based models such as ViT and DeBERTa, to encoder-decoder models such as T5, and larger decoder-based models such as Llama3.1-8B. Our experimental results show that DRM outperforms several state-of-the-art merging techniques in both full finetuning and low-rank adaptation settings. Moreover, our analysis reveals renormalization to be the crucial component for creating a robust and even joint space for merging, contributing significantly to the method's performance.
Problem

Research questions and friction points this paper is trying to address.

Existing merging methods assume entries at identical weight-matrix positions serve the same function, overlooking neuron-level heterogeneity
Direct entry-wise merging of finetuned networks suffers from misalignment between models
An aligned joint space is needed before entry-wise model fusion can work reliably
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Singular Value Decomposition for alignment
Renormalizes weights for robust joint space
Applies to diverse model architectures effectively
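The decompose-renormalize-merge pipeline above can be illustrated with a minimal NumPy sketch. Note this is a hypothetical reconstruction of the general idea, not the paper's exact algorithm: the joint space here is built from the SVD of the concatenated task deltas, and the row-wise L2 renormalization is illustrative only (`drm_merge_sketch` and its details are assumptions, not names from the paper).

```python
import numpy as np

def drm_merge_sketch(deltas, rank):
    """Hypothetical sketch of a Decom-Renorm-Merge-style pipeline:
    project task-specific weight deltas into a shared SVD basis,
    renormalize there, merge entry-wise, and map back."""
    # Decompose: build a joint basis from the concatenated deltas
    # (an assumption; the paper's exact construction may differ).
    stacked = np.concatenate(deltas, axis=1)           # (d, k * n_tasks)
    U, _, _ = np.linalg.svd(stacked, full_matrices=False)
    U_r = U[:, :rank]                                  # shared basis (d, rank)

    # Express each task delta in the joint space.
    coords = [U_r.T @ d for d in deltas]               # each (rank, k)

    # Renormalize: scale each basis direction so tasks contribute
    # evenly before merging (row-wise L2 norm, illustrative choice).
    normed = []
    for c in coords:
        norms = np.linalg.norm(c, axis=1, keepdims=True) + 1e-8
        normed.append(c / norms)

    # Merge entry-wise in the aligned space, then reconstruct.
    merged_coords = np.mean(normed, axis=0)            # (rank, k)
    return U_r @ merged_coords                         # (d, k)

# Toy usage: merge two random 8x8 "task deltas".
rng = np.random.default_rng(0)
d1 = rng.normal(size=(8, 8))
d2 = rng.normal(size=(8, 8))
merged = drm_merge_sketch([d1, d2], rank=4)
print(merged.shape)  # (8, 8)
```

In practice the deltas would be differences between finetuned and pretrained weight matrices (or LoRA updates), and the merged result would be added back onto the pretrained weights.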