🤖 AI Summary
Dataset mixing for large language model fine-tuning typically relies on labor-intensive trial-and-error and incurs substantial computational overhead.
Method: This paper proposes a zero-shot dataset composition selection method that leverages model merging as a proxy evaluator: weighted averaging and task vector fusion are used to predict the downstream performance of candidate dataset mixtures while jointly optimizing the mixture weights. Unlike conventional heuristic strategies that require repeated full fine-tuning, the approach needs no fine-tuning on any candidate mixture during evaluation.
Contribution/Results: Experiments across multiple benchmarks demonstrate that our method significantly outperforms existing dataset selection techniques. It achieves comparable or improved final fine-tuned model performance while reducing computational cost by approximately 70% in GPU-hours, enabling efficient, scalable dataset composition design.
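To make the merging-as-surrogate idea concrete, here is a minimal Python sketch of one plausible reading of the summary above. It is not the authors' implementation: the parameter dictionaries, the `evaluate_on_target` callback, and the specific fusion rule (a weighted sum of task vectors added to the base weights) are all assumptions made for illustration.

```python
# Illustrative sketch of model merging as a surrogate evaluator (hypothetical API).
import numpy as np

def task_vector(finetuned: dict, base: dict) -> dict:
    """Task vector: fine-tuned weights minus base weights, per parameter tensor."""
    return {name: finetuned[name] - base[name] for name in base}

def merge(base: dict, task_vectors: list, weights: list) -> dict:
    """Fuse task vectors into the base model with a weighted sum
    (one possible reading of 'weighted averaging and task vector fusion')."""
    merged = {name: param.copy() for name, param in base.items()}
    for tv, w in zip(task_vectors, weights):
        for name in merged:
            merged[name] += w * tv[name]
    return merged

def surrogate_score(base, per_dataset_models, mixture_weights, evaluate_on_target):
    """Score a candidate mixture without fine-tuning on it: merge the models
    that were fine-tuned on each individual dataset, then evaluate the merge."""
    tvs = [task_vector(m, base) for m in per_dataset_models]
    return evaluate_on_target(merge(base, tvs, mixture_weights))
```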
📝 Abstract
Mixing datasets for fine-tuning large models (LMs) has become critical for maximizing performance on downstream tasks. However, composing effective dataset mixtures typically relies on heuristics and trial-and-error, often requiring multiple fine-tuning runs to achieve the desired outcome. We propose a novel method, $\textit{Merge to Mix}$, that accelerates composing dataset mixtures through model merging. Model merging is a recent technique that combines the abilities of multiple individually fine-tuned LMs into a single LM by using a few simple arithmetic operations. Our key insight is that merging models individually fine-tuned on each dataset in a mixture can effectively serve as a surrogate for a model fine-tuned on the entire mixture. Merge to Mix leverages this insight to accelerate selecting dataset mixtures without requiring full fine-tuning on each candidate mixture. Our experiments demonstrate that Merge to Mix surpasses state-of-the-art methods in dataset selection for fine-tuning LMs.
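Given such a surrogate, the selection loop described in the abstract (score every candidate mixture via merging, then fine-tune only on the winner) might look like the following sketch. It assumes the hypothetical `surrogate_score` helper from the earlier snippet plus hypothetical `fine_tune` and `combine_datasets` functions; none of these names come from the paper.

```python
# Hypothetical selection loop: surrogate-evaluate every candidate mixture,
# then pay the full fine-tuning cost only once, on the best-scoring mixture.
def select_and_finetune(base, per_dataset_models, candidate_mixtures,
                        datasets, evaluate_on_target, fine_tune, combine_datasets):
    best_mixture = max(
        candidate_mixtures,
        key=lambda w: surrogate_score(base, per_dataset_models, w, evaluate_on_target),
    )
    # Only the winning mixture is actually fine-tuned on.
    mixed_data = combine_datasets(datasets, best_mixture)
    return fine_tune(base, mixed_data), best_mixture
```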