TANDEM: Bi-Level Data Mixture Optimization with Twin Networks

📅 2026-06-02
🤖 AI Summary
This work addresses the challenge of optimizing multi-domain data mixture ratios for large language model training, a problem that existing methods solve inefficiently. The authors formulate it as a bi-level optimization problem and propose TANDEM, a framework that reduces it to a single-level optimization with a penalty term. It solves that problem with a twin-network architecture: a proxy model trained on the primary data and a dynamically updated reference model trained with additional data. Domain weights are adjusted dynamically according to the discrepancy between the two models, prioritizing domains that gain more from the additional data. The approach comes with theoretical guarantees and also applies to data-constrained and supervised fine-tuning settings. Experiments across these settings show that TANDEM consistently and significantly improves model performance.
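One generic way to write a bi-level mixture problem and its single-level penalty form is given below. The symbols are illustrative assumptions, not the paper's notation: $\alpha$ are mixture weights on the simplex $\Delta^K$ over $K$ domains, $\mathcal{L}_k$ is the training loss on domain $k$, $F$ is an unspecified outer objective, and $\lambda > 0$ is a penalty weight. The penalty shown is the standard value-function penalty; how TANDEM defines $F$ and assigns its twin models is not specified in this summary.

```latex
% Bi-level form: choose mixture weights, then train on the weighted mixture
\min_{\alpha \in \Delta^{K}} \; F\big(\theta^{*}(\alpha)\big)
\quad \text{s.t.} \quad
\theta^{*}(\alpha) \in \arg\min_{\theta} G(\theta, \alpha),
\qquad
G(\theta, \alpha) = \sum_{k=1}^{K} \alpha_k \, \mathcal{L}_k(\theta)

% Single-level value-function penalty form
\min_{\alpha \in \Delta^{K},\; \theta} \;
F(\theta) + \lambda \Big( G(\theta, \alpha) - \min_{\theta'} G(\theta', \alpha) \Big)

% Sensitivity of the penalty to one domain weight
\frac{\partial}{\partial \alpha_k}
\Big( G(\theta, \alpha) - \min_{\theta'} G(\theta', \alpha) \Big)
= \mathcal{L}_k(\theta) - \mathcal{L}_k\big(\hat{\theta}(\alpha)\big),
\qquad
\hat{\theta}(\alpha) = \arg\min_{\theta'} G(\theta', \alpha)
```

The last line follows from Danskin's theorem, assuming a unique inner minimizer. In this generic form, the penalty's sensitivity to each domain weight is a per-domain loss difference between two parameter vectors. This is one reason a pair of networks arises naturally when solving the penalized problem.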
📝 Abstract
The capabilities of large language models (LLMs) significantly depend on training data drawn from various domains. Optimizing domain-specific mixture ratios can be modeled as a bi-level optimization problem, which we simplify into a single-level penalized form and solve with twin networks: a proxy model trained on primary data and a dynamically updated reference model trained with additional data. Our proposed method, Twin Networks for bi-level DatA mixturE optiMization (TANDEM), measures data efficacy through the difference between the twin models and up-weights domains that benefit more from the additional data. Compared to prior approaches, TANDEM provides theoretical guarantees and wider applicability. Furthermore, our bi-level perspective suggests new settings to study domain reweighting, such as data-restricted scenarios and supervised fine-tuning, where optimized mixture ratios significantly improve performance. Extensive experiments validate TANDEM's effectiveness in all scenarios.
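The reweighting idea in the abstract (up-weight domains that benefit more from the additional data, as measured by the gap between the twin models) can be sketched as follows. This is a hypothetical illustration, not TANDEM's actual update rule. It assumes the gap is the per-domain loss difference between the proxy and reference models, and it uses a DoReMi-style multiplicative-weights (exponentiated-gradient) step with uniform smoothing. The function `reweight_domains` and its `step_size` and `smoothing` parameters are invented for this example.

```python
import math
from typing import List, Sequence


def reweight_domains(
    weights: Sequence[float],
    proxy_losses: Sequence[float],
    reference_losses: Sequence[float],
    step_size: float = 1.0,
    smoothing: float = 1e-3,
) -> List[float]:
    """Hypothetical domain-reweighting step (NOT the TANDEM update rule).

    Assumes the "gain" of domain k is proxy_losses[k] - reference_losses[k],
    i.e. how much the reference model (trained with additional data)
    improves on the proxy model (trained on primary data) for that domain.
    """
    k = len(weights)
    if not (len(proxy_losses) == len(reference_losses) == k):
        raise ValueError("weights and loss lists must have the same length")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be strictly positive")

    # Per-domain gain from the additional data (assumed scoring rule).
    gains = [p - r for p, r in zip(proxy_losses, reference_losses)]

    # Exponentiated-gradient step, computed in log space for stability.
    logits = [math.log(w) + step_size * g for w, g in zip(weights, gains)]
    max_logit = max(logits)
    unnormalized = [math.exp(z - max_logit) for z in logits]
    total = sum(unnormalized)
    updated = [u / total for u in unnormalized]

    # Mix with the uniform distribution so no domain weight reaches zero.
    return [(1.0 - smoothing) * w + smoothing / k for w in updated]


if __name__ == "__main__":
    w = [1 / 3] * 3
    w = reweight_domains(w, proxy_losses=[2.0, 2.5, 3.0], reference_losses=[1.9, 2.0, 2.95])
    print([round(x, 3) for x in w])
```

With uniform starting weights, the second domain receives the largest new weight because its loss falls most when the additional data is used. Subtracting the maximum logit before exponentiating avoids overflow. The uniform mixing keeps every weight strictly positive, so `math.log` stays defined on the next call.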
Problem

Research questions and friction points this paper is trying to address.

data mixture optimization
large language models
domain reweighting
bi-level optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

bi-level optimization
data mixture optimization
twin networks
domain reweighting
large language models