🤖 AI Summary
In preference learning, inconsistent human annotations hinder accurate modeling of human preferences. To address this, the authors propose a heuristic-free self-curation framework that preprocesses annotated preference datasets using proxy models trained directly on them. The proxy models automatically detect and select consistent annotations before preference optimization. The framework is validated across multiple learning algorithms (including DPO and KTO) and proxy capabilities, yielding performance improvements of up to 33% on instruction-following benchmarks and improving both instruction adherence and robustness. This work offers a straightforward, reproducible approach to curating preference data, with all code publicly released.
📝 Abstract
Inconsistent annotations in training corpora, particularly within preference learning datasets, pose challenges to developing advanced language models. These inconsistencies often arise from variability among annotators and the inherently multi-dimensional nature of preferences. To address these issues, we introduce a self-curation method that preprocesses annotated datasets by leveraging proxy models trained directly on them. Our method enhances preference learning by automatically detecting and selecting consistent annotations. We validate the proposed approach through extensive instruction-following tasks, demonstrating performance improvements of up to 33% across various learning algorithms and proxy capabilities. This work offers a straightforward and reliable solution to addressing preference inconsistencies without relying on heuristics, serving as an initial step toward more advanced preference learning methodologies. Code is available at https://github.com/Self-Curation/.
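The core idea of the abstract — train a proxy model on the raw annotations, then keep only the pairs whose annotated preference the proxy reproduces — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `proxy_score` callable, the `margin` threshold, and the toy length-based proxy are all assumptions introduced here for clarity.

```python
# Minimal sketch of self-curation for preference data (hypothetical API).
# A proxy model trained on the raw annotations scores each response; pairs
# whose annotated preference the proxy reproduces with a clear margin are
# kept as "consistent", the rest are filtered out before preference learning.

def curate(pairs, proxy_score, margin=0.0):
    """Keep (prompt, chosen, rejected) triples the proxy agrees with.

    pairs: list of (prompt, chosen, rejected) annotated triples.
    proxy_score: callable (prompt, response) -> float; a reward proxy
        trained on the same annotated dataset (assumed: higher = better).
    margin: minimum score gap required to count the pair as consistent.
    """
    curated = []
    for prompt, chosen, rejected in pairs:
        gap = proxy_score(prompt, chosen) - proxy_score(prompt, rejected)
        if gap > margin:  # proxy reproduces the human annotation
            curated.append((prompt, chosen, rejected))
    return curated

# Toy usage with a stand-in proxy that simply prefers longer responses.
toy_pairs = [
    ("q1", "a detailed answer", "ok"),  # proxy agrees with the label
    ("q2", "no", "a thorough reply"),   # proxy disagrees -> dropped
]
proxy = lambda prompt, response: float(len(response))
print(curate(toy_pairs, proxy))  # keeps only the first pair
```

The curated subset would then be passed to any preference optimization algorithm (e.g. DPO) in place of the raw dataset; raising `margin` trades dataset size for consistency.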