MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation

📅 2024-06-11
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
🤖 AI Summary
This work addresses the performance trade-offs arising from task objective conflicts in multi-task model merging. We propose MAP, a low-overhead algorithm that efficiently generates Pareto-optimal scaling coefficient sets. To our knowledge, this is the first work to incorporate an amortized Pareto front into model merging. We introduce a quadratic surrogate modeling strategy that obviates fine-tuning and repeated evaluation. Building upon this, we derive two lightweight variants: Bayesian MAP for few-task settings and Nested MAP for many-task scenarios. Evaluated on CV and NLP multi-task benchmarks, MAP accurately constructs the Pareto front while reducing evaluation cost by over an order of magnitude compared to baselines. Our core contributions are: (1) the first fine-tuning-free merging framework enabling interpretable and customizable multi-objective trade-offs; and (2) an efficient, theoretically grounded paradigm for Pareto front estimation with strong practical applicability.

📝 Abstract
Model merging has emerged as an effective approach to combine multiple single-task models into a multitask model. This process typically involves computing a weighted average of the model parameters without any additional training. Existing model-merging methods focus on enhancing average task accuracy. However, interference and conflicts between the objectives of different tasks can lead to trade-offs during the merging process. In real-world applications, a set of solutions with various trade-offs can be more informative, helping practitioners make decisions based on diverse preferences. In this paper, we introduce a novel and low-compute algorithm, Model Merging with Amortized Pareto Front (MAP). MAP efficiently identifies a Pareto set of scaling coefficients for merging multiple models, reflecting the trade-offs involved. It amortizes the substantial computational cost of evaluations needed to estimate the Pareto front by using quadratic approximation surrogate models derived from a pre-selected set of scaling coefficients. Experimental results on vision and natural language processing tasks demonstrate that MAP can accurately identify the Pareto front, providing practitioners with flexible solutions to balance competing task objectives. We also introduce Bayesian MAP for scenarios with a relatively low number of tasks and Nested MAP for situations with a high number of tasks, further reducing the computational cost of evaluation.
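The abstract's core pipeline — weighted-average merging, quadratic surrogate fitting over scaling coefficients, and non-dominated filtering — can be sketched as follows. This is an illustrative simplification, not the authors' implementation: the function names (`merge_models`, `fit_quadratic_surrogate`, `pareto_front`) and the flat-vector parameter representation are assumptions, and the paper's actual surrogate parameterization and sampling scheme may differ.

```python
import numpy as np

def merge_models(base, task_vectors, coeffs):
    """Weighted-average merge: theta = theta_base + sum_i c_i * tau_i,
    where tau_i is a task vector (fine-tuned params minus base params)."""
    merged = base.copy()
    for c, tau in zip(coeffs, task_vectors):
        merged += c * tau
    return merged

def fit_quadratic_surrogate(coeff_samples, metric_values):
    """Fit metric(c) ~ quadratic in c via least squares, from a
    pre-selected set of evaluated scaling coefficients."""
    C = np.asarray(coeff_samples)              # (n_samples, n_tasks)
    n, k = C.shape
    # Design matrix: upper-triangular quadratic terms, linear terms, intercept.
    quad = np.stack([np.outer(c, c)[np.triu_indices(k)] for c in C])
    X = np.hstack([quad, C, np.ones((n, 1))])
    w, *_ = np.linalg.lstsq(X, np.asarray(metric_values), rcond=None)

    def surrogate(c):
        c = np.asarray(c)
        q = np.outer(c, c)[np.triu_indices(k)]
        return np.concatenate([q, c, [1.0]]) @ w
    return surrogate

def pareto_front(points):
    """Indices of non-dominated points (higher is better on every task)."""
    pts = np.asarray(points)
    keep = []
    for i, p in enumerate(pts):
        dominated = np.any(np.all(pts >= p, axis=1) & np.any(pts > p, axis=1))
        if not dominated:
            keep.append(i)
    return keep
```

In this sketch, the expensive model evaluations happen only once, at the sampled coefficients; candidate coefficient grids are then scored cheaply through the fitted per-task surrogates, and `pareto_front` filters the surrogate predictions to the trade-off set — which is what amortizes the Pareto-front estimation cost.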
Problem

Research questions and friction points this paper is trying to address.

Efficiently merge multiple single-task models into a multitask model
Address trade-offs between conflicting task objectives during merging
Reduce computational cost of estimating Pareto front in merging
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses quadratic approximation for Pareto front estimation
Amortizes computational cost with surrogate models
Introduces Bayesian and Nested MAP variants
👥 Authors
Lu Li — University of Pennsylvania
Tianyu Zhang — MILA
Zhiqi Bu — Research Scientist, FAIR (Deep Learning, Differential Privacy, Statistics, Optimization Algorithms)
Suyuchen Wang — Université de Montréal / Mila (NLP, LLM, VLM, Deep Learning)
Huan He — University of Pennsylvania
Jie Fu — HKUST
Yonghui Wu — University of Florida
Jiang Bian — University of Florida
Yong Chen — University of Pennsylvania
Y. Bengio — MILA