A Statistical Case Against Empirical Human-AI Alignment

📅 2025-02-20
🤖 AI Summary
This position paper identifies an inherent statistical bias in empirical human-AI alignment, i.e., inferring AI objectives solely from observed human behavior, and argues that this bias can systematically distort alignment outcomes. Grounded in statistical principles, the authors attribute the problem to non-random sampling of behavioral observations and the absence of counterfactuals, which together undermine causal inference. They illustrate the unreliability of purely empirical alignment with tangible examples such as human-centric decoding of language models. As alternatives, they propose two approaches: (1) *prescriptive alignment*, which encodes value-based priors as constraints on AI objective specification, and (2) *a posteriori empirical alignment*, which incorporates behavioral data only within that prescriptive framework. The stated aim is a more robust methodological foundation for AI alignment that remains practical to deploy.
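The sampling argument in the summary can be illustrated with a small simulation. This is a minimal sketch and not the authors' model: the latent "preference" scores, the noise term, and the rule that higher-scoring individuals are more likely to leave observable behavior are all assumptions made for illustration.

```python
import random
import statistics


def simulate(n=100_000, seed=0):
    """Compare a population mean with the mean of a non-randomly observed subset."""
    rng = random.Random(seed)
    # Hypothetical latent preference scores for the whole population.
    population = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Assumed selection rule: a person's behavior is observed only when their
    # score plus independent noise is positive, so high scorers are
    # over-represented among the observations.
    observed = [x for x in population if x + rng.gauss(0.0, 1.0) > 0.0]
    return statistics.fmean(population), statistics.fmean(observed)


true_mean, observed_mean = simulate()
print(f"population mean: {true_mean:.3f}")
print(f"observed mean:   {observed_mean:.3f}")
```

Under these assumptions the population mean stays near zero, while the mean of the observed subset is shifted clearly upward. A system fitted only to the observed behavior would inherit that shift.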

📝 Abstract
Empirical human-AI alignment aims to make AI systems act in line with observed human behavior. While noble in its goals, we argue that empirical alignment can inadvertently introduce statistical biases that warrant caution. This position paper thus advocates against naive empirical alignment, offering prescriptive alignment and a posteriori empirical alignment as alternatives. We substantiate our principled argument by tangible examples like human-centric decoding of language models.
Problem

Research questions and friction points this paper is trying to address.

Empirical human-AI alignment can inadvertently introduce statistical biases
Advocates against naive empirical alignment
Proposes prescriptive and a posteriori alternatives
Innovation

Methods, ideas, or system contributions that make the work stand out.

Advocates prescriptive AI alignment
Proposes a posteriori empirical alignment
Critiques naive empirical alignment