🤖 AI Summary
This work addresses a fundamental limitation of subjective rating datasets: their inherent noise constrains the correlation any computational model can achieve with human judgments. To date, there has been no practical method to quantify the upper bound of this correlation. The paper introduces ρ-Perfect, the first approach capable of estimating this theoretical ceiling in real-world settings. By modeling heteroscedastic noise in subjective ratings, the method derives an analytical upper bound on model-human correlation and validates it by showing that its square approximates test-retest reliability. Demonstrated on speech quality assessment benchmarks, ρ-Perfect disentangles whether performance plateaus stem from model deficiencies or from intrinsic limits of data quality, offering a new tool for evaluating and advancing models of subjective perception.
📝 Abstract
Subjective ratings contain inherent noise that limits model-human correlation, but this reliability issue is rarely quantified. In this paper, we present $\rho$-Perfect, a practical estimate of the highest correlation a model can achieve on subjectively rated datasets. We define $\rho$-Perfect as the correlation between a perfect predictor and human ratings, and derive an estimate of this value under heteroscedastic noise, a common occurrence in subjectively rated datasets. We show that $\rho$-Perfect squared estimates test-retest correlation and use this relation to validate the estimate. We demonstrate the use of $\rho$-Perfect on a speech quality dataset and show how the measure can distinguish between model limitations and data quality issues.
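The core relation claimed in the abstract can be checked with a small simulation. The sketch below is illustrative only and not the paper's method: it assumes ratings are latent "true" scores plus zero-mean Gaussian noise with a different variance per item (heteroscedastic noise), takes the perfect predictor to be the latent scores themselves, and verifies numerically that the squared correlation between the perfect predictor and one round of noisy ratings approximates the test-retest correlation between two independent rating rounds. All variable names and distribution parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Latent "true" quality scores (e.g., on a MOS-like scale).
s = rng.normal(3.0, 0.8, n)

# Heteroscedastic noise: each item gets its own rating-noise std.
sigma = rng.uniform(0.2, 1.2, n)

# Two independent rating rounds of the same items (rate and re-rate).
y1 = s + rng.normal(0.0, sigma)
y2 = s + rng.normal(0.0, sigma)

# Correlation of a perfect predictor (the true scores) with noisy ratings:
# this plays the role of the achievable ceiling.
rho_perfect = np.corrcoef(s, y1)[0, 1]

# Test-retest correlation between the two independent rating rounds.
test_retest = np.corrcoef(y1, y2)[0, 1]

print(f"rho_perfect          = {rho_perfect:.3f}")
print(f"rho_perfect squared  = {rho_perfect**2:.3f}")
print(f"test-retest          = {test_retest:.3f}")
```

Under this noise model, `corr(s, y1)^2 ≈ Var(s) / (Var(s) + E[sigma^2]) ≈ corr(y1, y2)`, so the squared ceiling and the test-retest correlation agree up to sampling error, mirroring the validation strategy described in the abstract.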