Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples

📅 2025-02-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
In LLM alignment, a mismatch between preference-data difficulty and model capacity severely degrades alignment performance: overly difficult samples exceed the model's capability and hurt rather than help. Method: The authors propose the principle that "data difficulty must match model capacity," quantify sample difficulty via learning-order consistency across alignment runs, and introduce Selective DPO—a difficulty-aware data-filtering method built atop the DPO framework that modifies neither the model architecture nor the loss function. Contribution/Results: On AlpacaEval 2, Selective DPO improves win rates by 9–16% over the DPO baseline and significantly outperforms multiple DPO variants, demonstrating the effectiveness and generality of difficulty-adaptive data selection for preference-based alignment.
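The core mechanic described above—rank preference examples by a difficulty proxy and keep only those within the model's capacity—can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function name, the toy data, and the use of a single scalar score per example (e.g. an average validation loss standing in for learning-order consistency) are assumptions for the sketch.

```python
def select_by_difficulty(examples, difficulty_scores, keep_fraction=0.5):
    """Keep only the easiest `keep_fraction` of preference examples.

    `difficulty_scores` is a hypothetical per-example difficulty proxy
    (higher = harder); the paper derives difficulty from learning-order
    consistency across alignment runs.
    """
    # Indices sorted so the easiest examples come first.
    order = sorted(range(len(examples)), key=lambda i: difficulty_scores[i])
    n_keep = int(len(examples) * keep_fraction)
    return [examples[i] for i in order[:n_keep]]

# Hypothetical toy data: four examples with scalar difficulty scores.
data = ["ex_a", "ex_b", "ex_c", "ex_d"]
scores = [0.2, 0.9, 0.4, 1.3]
print(select_by_difficulty(data, scores, keep_fraction=0.5))  # → ['ex_a', 'ex_c']
```

The filtered subset is then used for standard DPO training, which is why the method requires no change to the loss or architecture.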

📝 Abstract
The alignment of large language models (LLMs) often assumes that using more clean data yields better outcomes, overlooking the match between model capacity and example difficulty. Challenging this, we propose a new principle: preference data vary in difficulty, and overly difficult examples hinder alignment by exceeding the model's capacity. Through systematic experimentation, we validate this principle with three key findings: (1) preference examples vary in difficulty, as evidenced by consistent learning orders across alignment runs; (2) overly difficult examples significantly degrade performance across four LLMs and two datasets; and (3) the capacity of a model dictates its threshold for handling difficult examples, underscoring a critical relationship between data selection and model capacity. Building on this principle, we introduce Selective DPO, which filters out overly difficult examples. This simple adjustment improves alignment performance by 9-16% in win rates on the AlpacaEval 2 benchmark compared to the DPO baseline, surpassing a series of DPO variants with different algorithmic adjustments. Together, these results illuminate the importance of aligning data difficulty with model capacity, offering a transformative perspective for improving alignment strategies in LLMs. Code is available at https://github.com/glorgao/SelectiveDPO.
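For context on the baseline being improved, the standard DPO objective scores each preference pair by the margin between the policy's and a frozen reference model's log-probability gaps for the chosen vs. rejected response. A minimal per-example sketch in plain Python (variable names and the scalar-inputs formulation are assumptions; real implementations operate on batched sequence log-probabilities):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (policy margin - reference margin)).

    Inputs are log-probabilities of the chosen/rejected responses under the
    policy (pi_*) and the frozen reference model (ref_*).
    """
    margin = beta * ((pi_chosen - pi_rejected) - (ref_chosen - ref_rejected))
    # -log(sigmoid(margin)) written stably as log(1 + exp(-margin)).
    return math.log1p(math.exp(-margin))

# With no margin (policy agrees with the reference), the loss is log 2.
print(dpo_loss(0.0, 0.0, 0.0, 0.0))  # → 0.693...
```

Selective DPO keeps this loss unchanged and only alters which preference pairs it is computed on.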
Problem

Research questions and friction points this paper is trying to address.

Data difficulty affects model alignment
Overly difficult examples degrade performance
Model capacity determines data handling threshold
Innovation

Methods, ideas, or system contributions that make the work stand out.

Selective DPO filters difficult examples
Aligns data difficulty with model capacity
Improves alignment performance by 9-16%