Merge to Mix: Mixing Datasets via Model Merging

📅 2025-05-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Dataset mixing for large language model fine-tuning typically relies on labor-intensive trial and error and incurs substantial computational overhead. Method: the paper proposes a zero-shot dataset-composition selection method that uses model merging as a proxy evaluator, specifically weighted averaging and task-vector fusion, to predict the downstream performance of candidate dataset mixtures while jointly optimizing the mixture weights. Unlike conventional heuristic strategies that require repeated full fine-tuning, the approach eliminates fine-tuning during mixture evaluation. Contribution/Results: experiments across multiple benchmarks show that the method significantly outperforms existing dataset-selection techniques, achieving comparable or better final fine-tuned model performance while cutting computational cost by roughly 70% in GPU-hours, enabling efficient, scalable dataset-composition design.

📝 Abstract
Mixing datasets for fine-tuning large models (LMs) has become critical for maximizing performance on downstream tasks. However, composing effective dataset mixtures typically relies on heuristics and trial-and-error, often requiring multiple fine-tuning runs to achieve the desired outcome. We propose a novel method, Merge to Mix, that accelerates composing dataset mixtures through model merging. Model merging is a recent technique that combines the abilities of multiple individually fine-tuned LMs into a single LM by using a few simple arithmetic operations. Our key insight is that merging models individually fine-tuned on each dataset in a mixture can effectively serve as a surrogate for a model fine-tuned on the entire mixture. Merge to Mix leverages this insight to accelerate selecting dataset mixtures without requiring full fine-tuning on each candidate mixture. Our experiments demonstrate that Merge to Mix surpasses state-of-the-art methods in dataset selection for fine-tuning LMs.
Problem

Research questions and friction points this paper is trying to address.

Accelerating dataset mixture selection for fine-tuning large models
Reducing reliance on heuristics and trial-and-error in dataset composition
Enhancing model performance without full fine-tuning on each mixture
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses model merging to mix datasets efficiently
Combines individually fine-tuned models arithmetically
Accelerates dataset selection without full fine-tuning
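The merging idea described above can be illustrated with a minimal sketch: models fine-tuned on individual datasets are combined arithmetically (here via task vectors, i.e. weight deltas from the base model, scaled by mixture weights) to approximate a model fine-tuned on the whole mixture. The helper names and the plain-dict weight representation are illustrative assumptions, not the paper's implementation; real models would use framework-specific parameter tensors.

```python
# Hypothetical sketch of model merging as a mixture surrogate.
# Parameters are represented as plain per-layer lists of floats.

def task_vector(finetuned, base):
    """Element-wise difference between fine-tuned and base weights."""
    return {k: [f - b for f, b in zip(finetuned[k], base[k])]
            for k in base}

def merge(base, finetuned_models, mixture_weights):
    """Add the weighted sum of task vectors back onto the base model."""
    merged = {k: list(v) for k, v in base.items()}
    for model, w in zip(finetuned_models, mixture_weights):
        tv = task_vector(model, base)
        for k in merged:
            merged[k] = [m + w * t for m, t in zip(merged[k], tv[k])]
    return merged

# Toy example: one layer with two weights, two models each
# fine-tuned on a different dataset.
base = {"layer": [1.0, 1.0]}
ft_a = {"layer": [2.0, 1.0]}   # fine-tuned on dataset A
ft_b = {"layer": [1.0, 3.0]}   # fine-tuned on dataset B

surrogate = merge(base, [ft_a, ft_b], mixture_weights=[0.5, 0.5])
# surrogate["layer"] -> [1.5, 2.0]
```

Evaluating such a merged surrogate on a validation set would then stand in for fully fine-tuning on the candidate mixture, which is where the claimed compute savings come from.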
👤 Authors
Zhixu Silvia Tao (Princeton University)
Kasper Vinken (Fujitsu Research of America)
Hao-Wei Yeh (Fujitsu Limited)
Avi Cooper (Fujitsu Research of America)
Xavier Boix (MIT)