A Revealed Preference Framework for AI Alignment

📅 2026-03-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses a central question in AI alignment: whether AI agents genuinely implement the human preferences delegated to them or instead pursue their own intrinsic objectives. To tackle this question, the author introduces revealed preference theory into AI alignment research and proposes the Luce Alignment Model, which formalizes an AI's choice behavior as a mixture of two Luce rules, one reflecting the human's preferences and the other the AI's intrinsic biases. The paper further develops a mixed-preference identification approach that quantifies the degree of alignment from observed choice data; in the field setting, this uses the AI's choices alone. Identification is shown to hold generically in both a laboratory setting, where human and AI choices are observed, and a field setting, where only AI choices are observed, yielding an operational, data-driven framework for evaluating AI alignment.
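To make the model concrete, here is a sketch of the mixture choice rule in standard Luce notation; the symbols u_H, u_A, and λ are my labels for the human's utilities, the AI's utilities, and the mixture weight, and are not necessarily the paper's notation:

```latex
% Sketch of the Luce Alignment Model choice rule (notation assumed, not the paper's):
% the AI chooses x from menu A with probability given by a lambda-weighted
% mixture of the human's Luce rule and the AI's own Luce rule.
P(x \mid A) \;=\; \lambda \, \frac{u_H(x)}{\sum_{y \in A} u_H(y)}
\;+\; (1 - \lambda) \, \frac{u_A(x)}{\sum_{y \in A} u_A(y)},
\qquad \lambda \in [0, 1].
```

Under this reading, alignment concerns how similar u_H and u_A are, while λ governs how much weight the AI's choice behavior places on the human's preferences.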
📝 Abstract
Human decision makers increasingly delegate choices to AI agents, raising a natural question: does the AI implement the human principal's preferences or pursue its own? To study this question using revealed preference techniques, I introduce the Luce Alignment Model, where the AI's choices are a mixture of two Luce rules, one reflecting the human's preferences and the other the AI's. I show that the AI's alignment (similarity of human and AI preferences) can be generically identified in two settings: the laboratory setting, where both human and AI choices are observed, and the field setting, where only AI choices are observed.
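As one illustration of why the field setting is tractable, the following minimal sketch simulates choices from such a Luce mixture and recovers the mixture weight by maximum likelihood. It fixes the utilities to known values for simplicity, which the paper's identification results do not require; the names `u_h`, `u_a`, and `true_lam` are hypothetical, and this is not the paper's algorithm.

```python
# Hypothetical sketch: recovering the mixture weight of a Luce Alignment Model
# by maximum likelihood from observed AI choices. Utilities and data layout
# are illustrative assumptions, not the paper's identification algorithm.
import numpy as np
from scipy.optimize import minimize_scalar

def luce_probs(u, menu):
    """Luce rule: choice probabilities proportional to utilities on the menu."""
    w = np.array([u[x] for x in menu], dtype=float)
    return w / w.sum()

def mixture_probs(lam, u_h, u_a, menu):
    """Luce Alignment Model: lam-weighted mixture of the two Luce rules."""
    return lam * luce_probs(u_h, menu) + (1 - lam) * luce_probs(u_a, menu)

def neg_log_likelihood(lam, u_h, u_a, observations):
    """observations: list of (menu, chosen_index) pairs of observed AI choices."""
    ll = 0.0
    for menu, choice in observations:
        ll += np.log(mixture_probs(lam, u_h, u_a, menu)[choice])
    return -ll

# Toy example with utilities fixed to known values (the field setting would
# also need to pin these down from the choice data).
u_h = {"a": 3.0, "b": 1.0, "c": 1.0}   # human's Luce utilities (assumed)
u_a = {"a": 1.0, "b": 1.0, "c": 3.0}   # AI's intrinsic Luce utilities (assumed)

rng = np.random.default_rng(0)
true_lam = 0.7
menus = [["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
observations = []
for _ in range(2000):
    menu = menus[rng.integers(len(menus))]
    p = mixture_probs(true_lam, u_h, u_a, menu)
    observations.append((menu, rng.choice(len(menu), p=p)))

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1 - 1e-6),
                      args=(u_h, u_a, observations), method="bounded")
print(f"estimated mixture weight: {res.x:.3f} (true: {true_lam})")
```

Because the two Luce rules induce different choice probabilities across menus, varying the menu composition is what lets the mixture weight be pinned down from AI choices alone.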
Problem

Research questions and friction points this paper addresses.

AI alignment
revealed preference
human-AI preference
delegation
preference identification
Innovation

Methods, ideas, or system contributions that make the work stand out.

revealed preference
AI alignment
Luce model
preference identification
human-AI delegation