🤖 AI Summary
Dataset mixing for large language model fine-tuning typically relies on labor-intensive trial-and-error and incurs substantial computational overhead.
Method: This paper proposes a zero-shot dataset composition selection method that leverages model merging as a proxy evaluator: weighted averaging and task vector fusion are used to predict the downstream performance of candidate dataset mixtures while jointly optimizing the mixture weights. Unlike conventional heuristic strategies that require repeated full fine-tuning, the approach needs no fine-tuning on any candidate mixture during evaluation.
Contribution/Results: Experiments across multiple benchmarks demonstrate that our method significantly outperforms existing dataset selection techniques. It achieves comparable or improved final fine-tuned model performance while reducing computational cost by approximately 70% in GPU-hours, enabling efficient, scalable dataset composition design.
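To make the merging-as-surrogate idea concrete, here is a minimal Python sketch of one plausible reading of the summary above. It is not the authors' implementation: the parameter dictionaries, the `evaluate_on_target` callback, and the specific fusion rule (a weighted sum of task vectors added to the base weights) are all assumptions made for illustration.

```python
# Illustrative sketch of model merging as a surrogate evaluator (hypothetical API).
import numpy as np

def task_vector(finetuned: dict, base: dict) -> dict:
    """Task vector: fine-tuned weights minus base weights, per parameter tensor."""
    return {name: finetuned[name] - base[name] for name in base}

def merge(base: dict, task_vectors: list, weights: list) -> dict:
    """Fuse task vectors into the base model with a weighted sum
    (one possible reading of 'weighted averaging and task vector fusion')."""
    merged = {name: param.copy() for name, param in base.items()}
    for tv, w in zip(task_vectors, weights):
        for name in merged:
            merged[name] += w * tv[name]
    return merged

def surrogate_score(base, per_dataset_models, mixture_weights, evaluate_on_target):
    """Score a candidate mixture without fine-tuning on it: merge the models
    that were fine-tuned on each individual dataset, then evaluate the merge."""
    tvs = [task_vector(m, base) for m in per_dataset_models]
    return evaluate_on_target(merge(base, tvs, mixture_weights))
```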
📝 Abstract
Mixing datasets for fine-tuning large models (LMs) has become critical for maximizing performance on downstream tasks. However, composing effective dataset mixtures typically relies on heuristics and trial-and-error, often requiring multiple fine-tuning runs to achieve the desired outcome. We propose a novel method, $\textit{Merge to Mix}$, that accelerates composing dataset mixtures through model merging. Model merging is a recent technique that combines the abilities of multiple individually fine-tuned LMs into a single LM by using a few simple arithmetic operations. Our key insight is that merging models individually fine-tuned on each dataset in a mixture can effectively serve as a surrogate for a model fine-tuned on the entire mixture. Merge to Mix leverages this insight to accelerate selecting dataset mixtures without requiring full fine-tuning on each candidate mixture. Our experiments demonstrate that Merge to Mix surpasses state-of-the-art methods in dataset selection for fine-tuning LMs.
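Given such a surrogate, the selection loop described in the abstract (score every candidate mixture via merging, then fine-tune only on the winner) might look like the following sketch. It assumes the hypothetical `surrogate_score` helper from the earlier snippet plus hypothetical `fine_tune` and `combine_datasets` functions; none of these names come from the paper.

```python
# Hypothetical selection loop: surrogate-evaluate every candidate mixture,
# then pay the full fine-tuning cost only once, on the best-scoring mixture.
def select_and_finetune(base, per_dataset_models, candidate_mixtures,
                        datasets, evaluate_on_target, fine_tune, combine_datasets):
    best_mixture = max(
        candidate_mixtures,
        key=lambda w: surrogate_score(base, per_dataset_models, w, evaluate_on_target),
    )
    # Only the winning mixture is actually fine-tuned on.
    mixed_data = combine_datasets(datasets, best_mixture)
    return fine_tune(base, mixed_data), best_mixture
```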