Merging Smarter, Generalizing Better: Enhancing Model Merging on OOD Data

πŸ“… 2025-06-10
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Existing multi-task model merging methods achieve strong in-distribution (ID) performance but suffer from severe out-of-distribution (OOD) generalization deficits. To address this, the authors propose Layer-wise Pruning of Task Vectors (LwPTV), the first model merging framework to introduce interpretable layer-granularity pruning. LwPTV models task-specific parameter deviations as task vectors, computes layer-wise saliency scores to identify redundant parameters, and applies sparse masks that revert pruned layers to the pre-trained backbone weights. Crucially, it preserves ID accuracy without degradation while substantially enhancing OOD robustness. Moreover, LwPTV is modular and compatible with 12 state-of-the-art merging methods in a plug-and-play manner. Evaluated across multiple OOD benchmarks, LwPTV achieves an average accuracy improvement of 8.2% over baseline merging approaches.
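The pipeline the summary describes can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the per-layer saliency score used here (mean absolute deviation of the task vector) and the `keep_ratio` threshold are assumptions standing in for the paper's actual scoring rule, and models are represented as plain dicts of arrays rather than real network weights.

```python
import numpy as np

def layerwise_prune_task_vector(pretrained, finetuned, keep_ratio=0.5):
    """Prune a task vector layer by layer (illustrative sketch).

    `pretrained` and `finetuned` map layer names to weight arrays.
    The saliency score (mean |delta| per layer) is a hypothetical
    stand-in for the paper's actual measure of parameter redundancy.
    """
    # Task vector: deviation of the fine-tuned weights from the backbone.
    task_vector = {k: finetuned[k] - pretrained[k] for k in pretrained}
    # Per-layer saliency: average magnitude of the deviation.
    saliency = {k: float(np.abs(v).mean()) for k, v in task_vector.items()}
    # Keep the top-`keep_ratio` fraction of layers; zeroing the rest is
    # equivalent to keeping the pre-trained weights at those layers.
    n_keep = max(1, int(round(keep_ratio * len(saliency))))
    kept = set(sorted(saliency, key=saliency.get, reverse=True)[:n_keep])
    return {k: (v if k in kept else np.zeros_like(v))
            for k, v in task_vector.items()}

def merge(pretrained, pruned_task_vectors, lam=1.0):
    """Task-arithmetic-style merge: average the pruned task vectors
    and add them back onto the pre-trained backbone."""
    merged = {k: v.copy() for k, v in pretrained.items()}
    for tv in pruned_task_vectors:
        for k in merged:
            merged[k] += lam * tv[k] / len(pruned_task_vectors)
    return merged
```

Because pruning only zeroes entries of the task vectors, the same masking step can be slotted in front of any merging rule that consumes task vectors, which is what makes the approach plug-and-play.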

πŸ“ Abstract
Multi-task learning (MTL) concurrently trains a model on diverse task datasets to exploit common features, thereby improving overall performance across the tasks. Recent studies have dedicated efforts to merging multiple independently trained model parameters into a unified model for MTL, circumventing the need for training data and expanding the range of scenarios where MTL applies. However, current model merging approaches predominantly concentrate on enhancing performance on in-domain (ID) datasets, often overlooking their efficacy on out-of-domain (OOD) datasets. In this work, we propose LwPTV (Layer-wise Pruning of Task Vectors), which builds a saliency score measuring the redundancy of parameters in task vectors. With this design, our method obtains a mask vector for each task and performs layer-wise pruning on the task vectors, keeping only the pre-trained model parameters at the pruned layers of the merged model. Owing to its flexibility, our method can be seamlessly integrated with most existing model merging methods to improve their performance on OOD tasks. Extensive experiments demonstrate that applying our method yields substantial improvements in OOD performance while preserving ability on ID tasks.
Problem

Research questions and friction points this paper is trying to address.

Improving model merging for out-of-domain data generalization
Reducing parameter redundancy in task vectors via pruning
Enhancing OOD performance without compromising in-domain tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Layer-wise pruning for task vectors
Saliency score measures parameter redundancy
Seamless integration with existing merging methods
πŸ”Ž Similar Papers
No similar papers found.
Bingjie Zhang
College of Artificial Intelligence, Jilin University
Hongkang Li
University of Pennsylvania
Machine Learning · Deep learning theory · Graph neural network
Changlong Shi
College of Artificial Intelligence, Jilin University
Guowei Rong
College of Applied Science, Taiyuan University of Science and Technology
He Zhao
CSIRO’s Data61
Dongsheng Wang
College of Computer Science and Software Engineering, Shenzhen University
Dandan Guo
College of Artificial Intelligence, Jilin University
Meng Wang
Rensselaer Polytechnic Institute