From Noise to Signal to Selbstzweck: Reframing Human Label Variation in the Era of Post-training in NLP

📅 2025-10-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work identifies a critical limitation in current preference-learning datasets: forcing multiple human annotations into a single aggregated label obscures inherent human label variation (HLV), thereby undermining the fundamental objective of AI alignment. To address this, we propose, as the core novelty, treating HLV as an end in itself ("Selbstzweck"), advocating its active preservation rather than suppression during large language model post-training. Methodologically, within the reinforcement learning from human feedback (RLHF) framework, we systematically analyze strategies for integrating multi-annotator data and evaluate their impact on model robustness, fairness, and value alignment. Our contributions are threefold: (1) establishing HLV as a foundational theoretical objective for alignment; (2) introducing concrete, implementable techniques to preserve HLV during training; and (3) challenging the dominant paradigm that seeks a single "correct" preference, thus charting a new direction toward AI systems that authentically respect pluralistic human values.

📝 Abstract
Human Label Variation (HLV) refers to legitimate disagreement in annotation that reflects the genuine diversity of human perspectives rather than mere error. For decades, HLV in NLP was dismissed as noise to be discarded, and only slowly over the last decade has it been reframed as a signal for improving model robustness. With the rise of large language models (LLMs), where post-training on human feedback has become central to model alignment, the role of HLV has become increasingly consequential. Yet current preference-learning datasets routinely aggregate multiple annotations into a single label, thereby flattening diverse perspectives into a false universal agreement and erasing precisely the pluralism of human values that alignment aims to preserve. In this position paper, we argue that preserving HLV as an embodiment of human pluralism must be treated as a Selbstzweck, a goal in itself, when designing AI systems. We call for proactively incorporating HLV into preference datasets and outline actionable steps toward it.
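The aggregation problem the abstract describes can be made concrete with a minimal sketch. This is not code from the paper; the record format and function names are hypothetical, chosen only to contrast the status-quo pipeline (collapsing annotator votes into one label) with an HLV-preserving alternative (keeping the full vote distribution as a soft label):

```python
from collections import Counter

# Hypothetical multi-annotator preference records: each lists every
# annotator's choice between candidate responses "A" and "B".
records = [
    {"prompt": "p1", "votes": ["A", "A", "B", "A"]},
    {"prompt": "p2", "votes": ["A", "B", "B", "A"]},
]

def aggregate_majority(record):
    """Status-quo pipeline: collapse all votes into a single label,
    discarding the minority perspective entirely."""
    return Counter(record["votes"]).most_common(1)[0][0]

def preserve_distribution(record):
    """HLV-preserving alternative: keep the empirical vote distribution
    as a soft label instead of forcing a single 'correct' preference."""
    counts = Counter(record["votes"])
    total = sum(counts.values())
    return {choice: n / total for choice, n in counts.items()}

for r in records:
    print(aggregate_majority(r), preserve_distribution(r))
```

Note that for the second record the two functions tell very different stories: majority voting still emits one winner, while the distribution reveals an even 50/50 split among annotators, exactly the pluralism the paper argues should be preserved.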
Problem

Research questions and friction points this paper is trying to address.

Human label variation reflects legitimate diversity in annotation
Current datasets flatten diverse perspectives into false universal agreement
Preserving human pluralism must be treated as a primary goal
Innovation

Methods, ideas, or system contributions that make the work stand out.

Preserving human label variation as pluralism goal
Incorporating diverse annotations into preference datasets
Reframing label disagreement as signal for alignment