Calibrated Preference Learning: The Case of Label Ranking

📅 2026-05-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a common problem in label ranking models: predicted probabilities often do not match the empirical frequencies of the rankings they describe, so the models are poorly calibrated. It formalizes calibration for probabilistic label ranking, defining notions for full rankings, sub-rankings, and top-k rankings, and arranges them in a hierarchy: full-ranking calibration implies the other two but not conversely, while sub-ranking and top-k calibration are incomparable. Empirical evaluation shows that popular label ranking models are often poorly calibrated, with substantial differences between sub-ranking and top-k metrics. Applied to reward models in reinforcement learning from human feedback (RLHF), calibration correlates strongly but not perfectly with benchmark accuracy, which suggests that it captures a quality dimension beyond top-1 accuracy.

📝 Abstract

Calibration, the alignment of predicted probabilities with true outcome frequencies, is essential for reliable decision-making. While extensively studied for classification and regression, calibration has not been formally addressed for probabilistic label ranking, where the goal is to predict a distribution over orderings of a label set. Naively treating rankings as classes ignores their structure and fails to capture important modalities such as pairwise and top-k predictions. We formalize calibration for label ranking and develop a hierarchy of notions covering full rankings, sub-rankings, and top-k rankings. We prove that full-rank calibration implies the others but not conversely, and sub-ranking and top-k calibration are incomparable. Empirically, we find popular label ranking models are often poorly calibrated, with substantial differences between sub-ranking and top-k metrics. Applying our framework to RLHF reward models, we find that calibration correlates strongly but not perfectly with benchmark accuracy, suggesting it captures a meaningful quality dimension beyond top-1 accuracy. These findings motivate future work on understanding the downstream effects of miscalibration and developing methods to correct it.
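
To make the three prediction modalities concrete, the following minimal sketch shows how a single predicted distribution over full rankings induces pairwise (sub-ranking) and top-k probabilities by marginalization, and how a binned gap between predicted probability and empirical frequency could be measured. It is an illustration under stated assumptions, not the paper's definitions or metrics: the function names, the dictionary representation of a ranking distribution, and the ECE-style binned error are all hypothetical choices made here.

```python
# Illustrative sketch only; names and the binned error are assumptions,
# not taken from the paper.

def pairwise_marginal(dist, a, b):
    """Probability that label a is ranked before label b.

    dist maps each full ranking (a tuple of labels, best first) to its
    predicted probability.
    """
    return sum(p for ranking, p in dist.items()
               if ranking.index(a) < ranking.index(b))


def top_k_marginal(dist, k):
    """Distribution over ordered top-k prefixes induced by dist."""
    out = {}
    for ranking, p in dist.items():
        prefix = ranking[:k]
        out[prefix] = out.get(prefix, 0.0) + p
    return out


def binned_calibration_error(confidences, outcomes, n_bins=10):
    """ECE-style error: weighted mean of |mean confidence - frequency| per bin.

    confidences are predicted probabilities of an event (for example,
    "a is ranked before b"); outcomes are 1 if the event occurred, else 0.
    """
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, outcomes):
        idx = min(int(c * n_bins), n_bins - 1)  # clamp c == 1.0 into last bin
        bins[idx].append((c, y))
    total = len(confidences)
    error = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        freq = sum(y for _, y in members) / len(members)
        error += len(members) / total * abs(mean_conf - freq)
    return error


# Example: a predicted distribution over rankings of three labels.
dist = {
    ("a", "b", "c"): 0.5,
    ("b", "a", "c"): 0.3,
    ("c", "a", "b"): 0.2,
}
p_a_before_b = pairwise_marginal(dist, "a", "b")  # 0.5 + 0.2
top1 = top_k_marginal(dist, 1)
```

Because the pairwise and top-k probabilities are both sums over the same full-ranking distribution, a model whose full-ranking probabilities match empirical frequencies would also match them on these marginals, which is consistent with the implication stated in the abstract. The converse fails because many different full-ranking distributions share the same marginals.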

Problem

Research questions and friction points this paper is trying to address.

calibration

label ranking

probabilistic prediction

top-k ranking

sub-ranking

Innovation

Methods, ideas, or system contributions that make the work stand out.

calibrated preference learning

label ranking

probability calibration