🤖 AI Summary
In preference learning, inconsistent human annotations hinder accurate modeling of human preferences. To address this, the authors propose a heuristic-free self-curation framework that preprocesses annotated preference datasets using proxy models trained directly on them. The proxy models automatically detect and select consistent annotations before preference optimization. The framework is validated across multiple learning algorithms (including DPO and KTO) and proxy capabilities, yielding performance improvements of up to 33% on instruction-following benchmarks and improving both instruction adherence and robustness. This work offers a straightforward, reproducible approach to curating preference data, with all code publicly released.
📝 Abstract
Inconsistent annotations in training corpora, particularly within preference learning datasets, pose challenges to developing advanced language models. These inconsistencies often arise from variability among annotators and the inherently multi-dimensional nature of preferences. To address these issues, we introduce a self-curation method that preprocesses annotated datasets by leveraging proxy models trained directly on them. Our method enhances preference learning by automatically detecting and selecting consistent annotations. We validate the proposed approach through extensive instruction-following tasks, demonstrating performance improvements of up to 33% across various learning algorithms and proxy capabilities. This work offers a straightforward and reliable solution to addressing preference inconsistencies without relying on heuristics, serving as an initial step toward more advanced preference learning methodologies. Code is available at https://github.com/Self-Curation/.
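The core idea of the abstract — train a proxy model on the raw annotations, then keep only the pairs whose annotated preference the proxy reproduces — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `proxy_score` callable, the `margin` threshold, and the toy length-based proxy are all assumptions introduced here for clarity.

```python
# Minimal sketch of self-curation for preference data (hypothetical API).
# A proxy model trained on the raw annotations scores each response; pairs
# whose annotated preference the proxy reproduces with a clear margin are
# kept as "consistent", the rest are filtered out before preference learning.

def curate(pairs, proxy_score, margin=0.0):
    """Keep (prompt, chosen, rejected) triples the proxy agrees with.

    pairs: list of (prompt, chosen, rejected) annotated triples.
    proxy_score: callable (prompt, response) -> float; a reward proxy
        trained on the same annotated dataset (assumed: higher = better).
    margin: minimum score gap required to count the pair as consistent.
    """
    curated = []
    for prompt, chosen, rejected in pairs:
        gap = proxy_score(prompt, chosen) - proxy_score(prompt, rejected)
        if gap > margin:  # proxy reproduces the human annotation
            curated.append((prompt, chosen, rejected))
    return curated

# Toy usage with a stand-in proxy that simply prefers longer responses.
toy_pairs = [
    ("q1", "a detailed answer", "ok"),  # proxy agrees with the label
    ("q2", "no", "a thorough reply"),   # proxy disagrees -> dropped
]
proxy = lambda prompt, response: float(len(response))
print(curate(toy_pairs, proxy))  # keeps only the first pair
```

The curated subset would then be passed to any preference optimization algorithm (e.g. DPO) in place of the raw dataset; raising `margin` trades dataset size for consistency.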