TANDEM: Bi-Level Data Mixture Optimization with Twin Networks

📅 2026-06-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the problem of choosing multi-domain data mixing ratios for large language model training, which existing methods handle inefficiently. The authors formulate it as a bi-level optimization problem and propose TANDEM, a framework that reduces it to a single-level optimization with a penalty term by using a twin-network architecture: a proxy model and a dynamically updated reference model. Domains are reweighted during training according to the discrepancy between the two models, so that domains yielding larger performance gains receive more weight. The approach comes with theoretical guarantees and also applies to data-restricted and supervised fine-tuning scenarios. In the reported experiments, TANDEM improves model performance across all evaluated settings.

📝 Abstract

The capabilities of large language models (LLMs) depend significantly on training data drawn from various domains. Optimizing domain-specific mixture ratios can be modeled as a bi-level optimization problem, which we simplify into a single-level penalized form and solve with twin networks: a proxy model trained on primary data and a dynamically updated reference model trained with additional data. Our proposed method, Twin Networks for bi-level DatA mixturE optiMization (TANDEM), measures data efficacy through the difference between the twin models and up-weights domains that benefit more from the additional data. TANDEM offers theoretical guarantees and wider applicability than prior approaches. Furthermore, our bi-level perspective suggests new settings in which to study domain reweighting, such as data-restricted scenarios and supervised fine-tuning, where optimized mixture ratios significantly improve performance. Extensive experiments validate TANDEM's effectiveness in all scenarios.
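
To make the reweighting idea concrete, the following is a minimal sketch, not the paper's actual update rule. It assumes that "the difference between the twin models" is measured as a per-domain loss gap (proxy loss minus reference loss) and that domain weights are updated with a generic multiplicative-weights step. The function name `reweight_domains`, the `step_size` parameter, and the example loss values are all hypothetical and chosen only for illustration.

```python
import math


def reweight_domains(weights, proxy_losses, reference_losses, step_size=1.0):
    """Hypothetical multiplicative-weights update over domain mixture ratios.

    Assumption (not from the paper): a domain "benefits more from the
    additional data" when the reference model's loss on it is much lower
    than the proxy model's loss, i.e. when the gap below is large.
    """
    gaps = [p - r for p, r in zip(proxy_losses, reference_losses)]
    # Subtract the largest gap before exponentiating for numerical stability;
    # this shift cancels out in the normalization below.
    largest = max(gaps)
    unnormalized = [
        w * math.exp(step_size * (g - largest)) for w, g in zip(weights, gaps)
    ]
    total = sum(unnormalized)
    # Renormalize so the mixture ratios remain a probability distribution.
    return [u / total for u in unnormalized]


# Illustrative values only: three domains starting from a uniform mixture.
weights = [1 / 3, 1 / 3, 1 / 3]
proxy_losses = [2.0, 2.5, 3.0]
reference_losses = [1.9, 2.0, 2.9]
new_weights = reweight_domains(weights, proxy_losses, reference_losses)
```

In this toy run the second domain has the largest gap, so it receives the largest share of the updated mixture, while the other two domains, which have equal gaps, keep equal shares. In an actual training loop the losses would be re-estimated from both models at each step, and the specific penalty and update used by TANDEM should be taken from the paper itself.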

Problem

Research questions and friction points this paper is trying to address.

data mixture optimization

large language models

domain reweighting

bi-level optimization

Innovation

Methods, ideas, or system contributions that make the work stand out.

bi-level optimization

data mixture optimization

twin networks

domain reweighting