Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic

📅 2024-08-24
🏛️ arXiv.org
📈 Citations: 7
Influential: 1
🤖 AI Summary
To address the task interference and performance degradation caused by global merging of multiple finetuned models, this paper proposes a two-step sparse merging approach, Localize-and-Stitch: first, localization identifies tiny sparse regions (about 1% of the total parameters) in each finetuned model that contain the essential skills for its downstream task; then, stitching arithmetically reintegrates only these regions into the pretrained backbone. The localized regions serve as compact, interpretable representations of the finetuned tasks, enabling flexible and continual skill composition with minimal storage and computational overhead. Evaluated on vision and language benchmarks, the approach outperforms existing model merging methods under different data availability scenarios, while preserving pretrained knowledge and facilitating model compression.

📝 Abstract
Model merging offers an effective strategy to combine the strengths of multiple finetuned models into a unified model that preserves the specialized capabilities of each. Existing methods merge models in a global manner, performing arithmetic operations across all model parameters. However, such global merging often leads to task interference, degrading the performance of the merged model. In this work, we introduce Localize-and-Stitch, a novel approach that merges models in a localized way. Our algorithm works in two steps: i) Localization: identify tiny (1% of the total parameters) localized regions in the finetuned models containing essential skills for the downstream tasks, and ii) Stitching: reintegrate only these essential regions back into the pretrained model for task synergy. We demonstrate that our approach effectively locates sparse regions responsible for finetuned performance, and the localized regions can be treated as compact and interpretable representations of the finetuned models (tasks). Empirically, we evaluate our method on various vision and language benchmarks, showing that it outperforms existing model merging methods under different data availability scenarios. Beyond strong empirical performance, our algorithm also facilitates model compression and preserves pretrained knowledge, enabling flexible and continual skill composition from multiple finetuned models with minimal storage and computational overhead. Our code is available at https://github.com/uiuctml/Localize-and-Stitch.
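The two steps described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it uses a magnitude-based top-k mask on the task vector as a stand-in for the paper's localization procedure, and it naively sums overlapping localized regions from multiple tasks; the function names and the 1% budget handling are illustrative assumptions.

```python
import numpy as np

def localize(pretrained, finetuned, keep_ratio=0.01):
    """Step 1 (Localization): find a sparse region encoding the task.

    Proxy for the paper's procedure: keep the top ~1% of task-vector
    entries (finetuned minus pretrained weights) by magnitude.
    """
    task_vector = {n: finetuned[n] - pretrained[n] for n in pretrained}
    flat = np.concatenate([np.abs(v).ravel() for v in task_vector.values()])
    k = max(1, int(keep_ratio * flat.size))
    threshold = np.partition(flat, -k)[-k]  # k-th largest magnitude
    masks = {n: np.abs(v) >= threshold for n, v in task_vector.items()}
    return task_vector, masks

def stitch(pretrained, localized):
    """Step 2 (Stitching): graft only the localized regions back onto
    the pretrained backbone (overlaps are summed here for simplicity)."""
    merged = {n: v.copy() for n, v in pretrained.items()}
    for task_vector, masks in localized:
        for n in merged:
            merged[n] = merged[n] + masks[n] * task_vector[n]
    return merged
```

Because only the masked entries differ from the backbone, each task can be stored as a sparse delta, which is what makes the continual, low-overhead skill composition described above possible.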
Problem

Research questions and friction points this paper is trying to address.

Model Fusion
Efficiency
Resource Consumption
Innovation

Methods, ideas, or system contributions that make the work stand out.

Localize-and-Stitch
Model Fusion
Task-specific Adaptation
Yifei He
University of Illinois Urbana-Champaign
Yuzheng Hu
University of Illinois Urbana-Champaign
Yong Lin
Princeton University
Formal Math Reasoning · LLM Post-training
Tong Zhang
University of Illinois Urbana-Champaign
Han Zhao
University of Illinois Urbana-Champaign