A Revealed Preference Framework for AI Alignment

📅 2026-03-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses a central question in AI alignment: whether AI agents genuinely implement the human preferences delegated to them or instead pursue their own intrinsic objectives. To tackle this question, the author introduces revealed preference theory into AI alignment research and proposes the Luce Alignment Model, which formalizes an AI's choice behavior as a mixture of two Luce rules, one reflecting the human's preferences and the other the AI's intrinsic biases. The paper further develops a mixed-preference identification approach that quantifies the degree of alignment from observed choice data; in the field setting, this uses the AI's choices alone. Identification is shown to hold generically in both a laboratory setting, where human and AI choices are observed, and a field setting, where only AI choices are observed, yielding an operational, data-driven framework for evaluating AI alignment.
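To make the model concrete, here is a sketch of the mixture choice rule in standard Luce notation; the symbols u_H, u_A, and λ are my labels for the human's utilities, the AI's utilities, and the mixture weight, and are not necessarily the paper's notation:

```latex
% Sketch of the Luce Alignment Model choice rule (notation assumed, not the paper's):
% the AI chooses x from menu A with probability given by a lambda-weighted
% mixture of the human's Luce rule and the AI's own Luce rule.
P(x \mid A) \;=\; \lambda \, \frac{u_H(x)}{\sum_{y \in A} u_H(y)}
\;+\; (1 - \lambda) \, \frac{u_A(x)}{\sum_{y \in A} u_A(y)},
\qquad \lambda \in [0, 1].
```

Under this reading, alignment concerns how similar u_H and u_A are, while λ governs how much weight the AI's choice behavior places on the human's preferences.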
📝 Abstract
Human decision makers increasingly delegate choices to AI agents, raising a natural question: does the AI implement the human principal's preferences or pursue its own? To study this question using revealed preference techniques, I introduce the Luce Alignment Model, where the AI's choices are a mixture of two Luce rules, one reflecting the human's preferences and the other the AI's. I show that the AI's alignment (similarity of human and AI preferences) can be generically identified in two settings: the laboratory setting, where both human and AI choices are observed, and the field setting, where only AI choices are observed.
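As one illustration of why the field setting is tractable, the following minimal sketch simulates choices from such a Luce mixture and recovers the mixture weight by maximum likelihood. It fixes the utilities to known values for simplicity, which the paper's identification results do not require; the names `u_h`, `u_a`, and `true_lam` are hypothetical, and this is not the paper's algorithm.

```python
# Hypothetical sketch: recovering the mixture weight of a Luce Alignment Model
# by maximum likelihood from observed AI choices. Utilities and data layout
# are illustrative assumptions, not the paper's identification algorithm.
import numpy as np
from scipy.optimize import minimize_scalar

def luce_probs(u, menu):
    """Luce rule: choice probabilities proportional to utilities on the menu."""
    w = np.array([u[x] for x in menu], dtype=float)
    return w / w.sum()

def mixture_probs(lam, u_h, u_a, menu):
    """Luce Alignment Model: lam-weighted mixture of the two Luce rules."""
    return lam * luce_probs(u_h, menu) + (1 - lam) * luce_probs(u_a, menu)

def neg_log_likelihood(lam, u_h, u_a, observations):
    """observations: list of (menu, chosen_index) pairs of observed AI choices."""
    ll = 0.0
    for menu, choice in observations:
        ll += np.log(mixture_probs(lam, u_h, u_a, menu)[choice])
    return -ll

# Toy example with utilities fixed to known values (the field setting would
# also need to pin these down from the choice data).
u_h = {"a": 3.0, "b": 1.0, "c": 1.0}   # human's Luce utilities (assumed)
u_a = {"a": 1.0, "b": 1.0, "c": 3.0}   # AI's intrinsic Luce utilities (assumed)

rng = np.random.default_rng(0)
true_lam = 0.7
menus = [["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
observations = []
for _ in range(2000):
    menu = menus[rng.integers(len(menus))]
    p = mixture_probs(true_lam, u_h, u_a, menu)
    observations.append((menu, rng.choice(len(menu), p=p)))

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1 - 1e-6),
                      args=(u_h, u_a, observations), method="bounded")
print(f"estimated mixture weight: {res.x:.3f} (true: {true_lam})")
```

Because the two Luce rules induce different choice probabilities across menus, varying the menu composition is what lets the mixture weight be pinned down from AI choices alone.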
Problem

Research questions and friction points this paper addresses.

AI alignment
revealed preference
human-AI preference
delegation
preference identification
Innovation

Methods, ideas, or system contributions that make the work stand out.

revealed preference
AI alignment
Luce model
preference identification
human-AI delegation