Hidden Consensus: Preference-Validity Compression in Human Feedback

📅 2026-06-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Standard RLHF compresses diverse human preferences into a single reward signal, overlooking the coexistence of multiple valid responses in structurally plural societies and thereby distorting how alignment is measured. This work introduces the concept of “Preference-Validity Compression,” using Malaysia as a diagnostic setting, and argues that alignment methods should satisfy “Validity-Preserving Consistency.” Analyzing 321 preference events from 20 participants over 107 trio-annotated prompts, the study finds that 79% of prompts contain more than one majority-supported response that single-winner aggregation would discard. When all majority-supported options are considered, the apparent dominance gap between top responses diminishes, which the authors treat as a measurement-validity problem in conventional aggregation applied to pluralistic settings.
📝 Abstract
Standard RLHF pipelines often reduce heterogeneous human judgments into a single scalar reward target. We argue that this reduction can mis-measure alignment in structurally plural societies, where disagreement may reflect culturally, historically, linguistically, regionally, or normatively grounded interpretations rather than annotation noise. We call this failure Preference-Validity Compression, the collapse of multiple plural-valid response options into a single optimization target. Using Malaysia as a diagnostic setting, we analyze RLHF-style feedback aggregation through preference events linking prompts, responses, and acceptability judgments across interpretive frames. Across 321 preference events from 20 participants and 107 trio-annotated prompts, 79% of prompts contain more than one majority-supported response that single-winner aggregation would discard, and apparent dominance gaps between top responses diminish when all majority-supported options are considered. Participants frequently select multiple acceptable responses, and discarded responses demonstrably reflect coherent local, practical, or cultural frames. These findings show that majority aggregation in this corpus measures argmax acceptability rather than plural alignment. We treat this as a measurement-validity issue and argue that future alignment methods should satisfy Validity-Preserving Consistency, remaining stable across plural-valid interpretive frames rather than collapsing them into a single reward target.
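The contrast the abstract draws between single-winner aggregation and the set of majority-supported responses can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data format (one set of accepted response ids per annotator), the strict-majority threshold, the tie-breaking rule, and the toy judgments are invented and are not taken from the paper.

```python
from collections import Counter


def acceptance_counts(judgments):
    """Count how many annotators accepted each response.

    judgments: list of sets, one per annotator, holding the response ids
    that annotator marked acceptable (hypothetical format).
    """
    return Counter(r for accepted in judgments for r in accepted)


def majority_supported(judgments):
    """Responses accepted by a strict majority of annotators (assumed threshold)."""
    counts = acceptance_counts(judgments)
    return {r for r, c in counts.items() if c > len(judgments) / 2}


def single_winner(judgments):
    """Argmax aggregation: keep only the most-accepted response.

    Ties are broken by response id, purely to keep the toy deterministic.
    """
    counts = acceptance_counts(judgments)
    return min(counts, key=lambda r: (-counts[r], r))


# Toy data, invented for illustration: three annotators per prompt.
prompts = {
    "p1": [{"A", "B"}, {"A", "B"}, {"A"}],
    "p2": [{"A"}, {"A"}, {"C"}],
}

for pid, judgments in prompts.items():
    kept = single_winner(judgments)
    discarded = majority_supported(judgments) - {kept}
    # A non-empty `discarded` set marks a prompt where single-winner
    # aggregation drops a response that a majority still found acceptable.
    print(pid, kept, sorted(discarded))
```

In this toy, prompt `p1` has two majority-supported responses but single-winner aggregation keeps only one, while `p2` has a single majority-supported response and nothing is discarded. The share of prompts with a non-empty discarded set is the kind of quantity the 79% figure reports, under whatever threshold the authors actually used.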
Problem

Research questions and friction points this paper is trying to address.

Preference-Validity Compression
human feedback
plural alignment
RLHF
measurement validity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Preference-Validity Compression
Plural Alignment
Validity-Preserving Consistency
Human Feedback Aggregation
Interpretive Frames
Dorcas Chia Ern Chua
YTL AI Labs
Karen Myn Hui Lee
YTL AI Labs
Jia Yue Tan
YTL AI Labs
Zhen Xue Gue
Monash University Malaysia
Norzalena Abdul Hamid
YTL AI Labs
Azima Binti Azmi
YTL AI Labs
Keat Mei Yeong
YTL AI Labs
Aizat Izyani binti Mujab
YTL AI Labs
Hafsah Noor Azam
YTL AI Labs
Chee Guo Khoo
Universiti Malaysia Sarawak
Han Ying Lim
YTL AI Labs
Chee Seng Chan
Universiti Malaya, Malaysia
Computer Vision, Machine Learning, Image Processing