🤖 AI Summary
This work addresses the performance trade-offs that arise from conflicting task objectives in multi-task model merging. We propose MAP, a low-overhead algorithm that efficiently generates Pareto-optimal sets of scaling coefficients. To our knowledge, this is the first work to incorporate an amortized Pareto front into model merging. We introduce a quadratic surrogate modeling strategy that requires no additional training and amortizes the cost of repeatedly evaluating merged models. Building on this, we derive two lightweight variants: Bayesian MAP for settings with few tasks and Nested MAP for settings with many tasks. Evaluated on CV and NLP multi-task benchmarks, MAP accurately constructs the Pareto front while reducing evaluation cost by over an order of magnitude compared to baselines. Our core contributions are: (1) the first fine-tuning-free merging framework that enables interpretable and customizable multi-objective trade-offs; and (2) an efficient, theoretically grounded paradigm for Pareto front estimation with strong practical applicability.
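The surrogate idea summarized above can be illustrated with a minimal sketch. It assumes, as a simplification, that each task metric is modeled as a full quadratic in the scaling coefficients and fitted by ordinary least squares, and that the Pareto set is read off a dense grid of candidate coefficients with a plain non-dominance check. `quadratic_features`, `fit_surrogates`, `predict`, `pareto_mask`, and `toy_metrics` are hypothetical names chosen for illustration; this is not the authors' implementation, and it does not cover the Bayesian or Nested variants.

```python
import itertools

import numpy as np


def quadratic_features(c):
    """Map scaling coefficients c of shape [n, k] to full quadratic features:
    a constant, each c_i, and each product c_i * c_j with i <= j."""
    n, k = c.shape
    cols = [np.ones(n)]
    cols += [c[:, i] for i in range(k)]
    cols += [c[:, i] * c[:, j] for i in range(k) for j in range(i, k)]
    return np.stack(cols, axis=1)


def fit_surrogates(coeffs, metrics):
    """Fit one quadratic surrogate per task by ordinary least squares.

    coeffs:  [n, k] scaling coefficients at which merged models were evaluated
    metrics: [n, T] measured score of each merged model on each of T tasks
    """
    X = quadratic_features(coeffs)
    W, *_ = np.linalg.lstsq(X, metrics, rcond=None)
    return W  # [num_features, T]


def predict(W, coeffs):
    """Surrogate-predicted task scores for new scaling coefficients."""
    return quadratic_features(coeffs) @ W


def pareto_mask(scores):
    """Boolean mask of rows not dominated by any other row (higher is better)."""
    mask = np.ones(len(scores), dtype=bool)
    for i, s in enumerate(scores):
        # Row j dominates row i if it is >= on every task and > on at least one.
        dominates = np.all(scores >= s, axis=1) & np.any(scores > s, axis=1)
        mask[i] = not dominates.any()
    return mask


# Toy usage with two tasks. These hypothetical "true" metrics stand in for
# evaluating a merged model; in practice each call would be a real evaluation.
def toy_metrics(c):
    t1 = 1 - (c[:, 0] - 0.8) ** 2 - 0.5 * c[:, 1] ** 2
    t2 = 1 - (c[:, 1] - 0.8) ** 2 - 0.5 * c[:, 0] ** 2
    return np.stack([t1, t2], axis=1)


rng = np.random.default_rng(0)
sampled = rng.uniform(0.0, 1.0, size=(20, 2))      # pre-selected coefficients
W = fit_surrogates(sampled, toy_metrics(sampled))  # only 20 "real" evaluations

grid = np.array(list(itertools.product(np.linspace(0.0, 1.0, 21), repeat=2)))
pred = predict(W, grid)                            # 441 cheap surrogate queries
front = grid[pareto_mask(pred)]                    # estimated Pareto set
```

The point of the sketch is the split in cost: only the 20 sampled coefficient vectors require evaluating a merged model, while all 441 grid points are scored by the surrogates. Because the toy metrics are exactly quadratic, the fit here is exact; with real models the surrogates would only approximate the measured metrics.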
📝 Abstract
Model merging has emerged as an effective approach to combining multiple single-task models into a multi-task model. This process typically involves computing a weighted average of the model parameters without any additional training. Existing model-merging methods focus on improving average task accuracy. However, interference and conflicts between the objectives of different tasks can lead to trade-offs during merging. In real-world applications, a set of solutions with various trade-offs can be more informative, helping practitioners make decisions based on diverse preferences. In this paper, we introduce a novel, low-compute algorithm, Model Merging with Amortized Pareto Front (MAP). MAP efficiently identifies a Pareto set of scaling coefficients for merging multiple models, reflecting the trade-offs involved. It amortizes the substantial computational cost of the evaluations needed to estimate the Pareto front by using quadratic approximation surrogate models derived from a pre-selected set of scaling coefficients. Experimental results on vision and natural language processing tasks demonstrate that MAP can accurately identify the Pareto front, providing practitioners with flexible solutions for balancing competing task objectives. We also introduce Bayesian MAP for scenarios with a relatively small number of tasks and Nested MAP for scenarios with a large number of tasks, both of which further reduce the computational cost of evaluation.
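The merging step described above, a weighted combination of model parameters controlled by per-task scaling coefficients, can be sketched as follows. This assumes the task-arithmetic form θ_merged = θ_0 + Σ_t c_t (θ_t - θ_0), where θ_0 is a shared pre-trained model; the abstract itself only says "weighted average", so the exact parameterization MAP uses is an assumption here. `merge` and the toy parameter dicts are hypothetical, with plain NumPy arrays standing in for a framework's state dict.

```python
import numpy as np


def merge(pretrained, finetuned, coeffs):
    """Merge task-specific models into one parameter dict (task-arithmetic form).

    pretrained: dict mapping parameter name -> array for the shared base model
    finetuned:  list of dicts with the same keys, one per task
    coeffs:     one scaling coefficient per task
    """
    if len(finetuned) != len(coeffs):
        raise ValueError("expected one scaling coefficient per task")
    merged = {}
    for name, base in pretrained.items():
        # Task vector for task t is (theta_t - theta_0); scale by c_t and sum.
        delta = sum(c * (ft[name] - base) for c, ft in zip(coeffs, finetuned))
        merged[name] = base + delta
    return merged


# Toy usage: when the coefficients sum to 1, the result equals the weighted
# average sum_t c_t * theta_t of the task-specific parameters.
base = {"w": np.array([1.0, 1.0, 1.0])}
task_a = {"w": np.array([3.0, 1.0, 1.0])}
task_b = {"w": np.array([1.0, 1.0, 5.0])}
merged = merge(base, [task_a, task_b], [0.25, 0.75])
```

Each coefficient vector (c_1, ..., c_T) defines one merged model, which is why the Pareto search described in the abstract runs over scaling coefficients rather than over the parameters themselves.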