Aligning Large Language Model Agents with Rational and Moral Preferences: A Supervised Fine-Tuning Approach

📅 2025-07-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the misalignment of large language model (LLM) agents with human rationality and moral preferences in strategic interactions—e.g., GPT-4o’s excessive cooperation and incentive insensitivity in economic games. We propose a lightweight, economics-grounded preference alignment method. Our approach formalizes dual preferences: “homo economicus” (utility maximization) and “homo moralis” (normative constraints), enabling an interpretable, low-cost synthetic data generation framework; alignment is achieved via supervised fine-tuning. Experiments demonstrate that minimal fine-tuning data suffices to substantially improve rational consistency and moral compliance across canonical games, autonomous driving ethical decision-making, and algorithmic pricing. To our knowledge, this is the first work to systematically integrate formal economic modeling into LLM behavioral alignment, establishing a verifiable, reproducible paradigm for AI value alignment.
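The summary names the two preference structures but not their functional form. As a hedged sketch: "homo moralis" in the economics literature usually refers to the Alger–Weibull utility, a convex combination of one's own material payoff and the Kantian payoff from universalizing one's own action. The notation below (π for the material payoff, x and y for own and opponent actions, κ for the degree of morality) is assumed for illustration, not taken from the paper.

```latex
% Assumed formalization (Alger-Weibull style); notation is illustrative.
% Homo economicus: utility is the own material payoff alone.
u^{E}(x, y) = \pi(x, y)
% Homo moralis: own payoff blended with the Kantian "what if everyone
% played x" payoff, weighted by the degree of morality \kappa.
u^{M}_{\kappa}(x, y) = (1 - \kappa)\,\pi(x, y) + \kappa\,\pi(x, x),
  \qquad \kappa \in [0, 1]
% \kappa = 0 recovers homo economicus; \kappa = 1 is a fully Kantian agent.
```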

📝 Abstract
Understanding how large language model (LLM) agents behave in strategic interactions is essential as these systems increasingly participate autonomously in economically and morally consequential decisions. We evaluate LLM preferences using canonical economic games, finding substantial deviations from human behavior. Models like GPT-4o show excessive cooperation and limited incentive sensitivity, while reasoning models, such as o3-mini, align more consistently with payoff-maximizing strategies. We propose a supervised fine-tuning pipeline that uses synthetic datasets derived from economic reasoning to align LLM agents with economic preferences, focusing on two stylized preference structures: in the first, utility depends only on individual payoffs (homo economicus); in the second, utility also depends on a notion of Kantian universalizability (homo moralis). We find that fine-tuning on small datasets shifts LLM agent behavior toward that of the corresponding economic agent. We further assess the fine-tuned agents' behavior in two applications: moral dilemmas involving autonomous vehicles and algorithmic pricing in competitive markets. These examples illustrate how different normative objectives, embedded via stylized preference structures, can influence market and moral outcomes. This work contributes a replicable, cost-efficient, and economically grounded pipeline for aligning AI preferences with moral-economic principles.
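To see how these structures generate different target behavior in a canonical game, consider a worked prisoner's dilemma under the assumed utility above. The payoff numbers are illustrative, not taken from the paper.

```latex
% Illustrative payoffs: \pi(C,C)=3, \pi(C,D)=0, \pi(D,C)=5, \pi(D,D)=1.
% Against a cooperating opponent (y = C):
u^{M}_{\kappa}(C, C) = (1-\kappa)\cdot 3 + \kappa\cdot 3 = 3
u^{M}_{\kappa}(D, C) = (1-\kappa)\cdot 5 + \kappa\cdot 1 = 5 - 4\kappa
% Cooperating is optimal iff 3 \ge 5 - 4\kappa, i.e. \kappa \ge 1/2.
% Against a defecting opponent (y = D):
u^{M}_{\kappa}(C, D) = (1-\kappa)\cdot 0 + \kappa\cdot 3 = 3\kappa
u^{M}_{\kappa}(D, D) = (1-\kappa)\cdot 1 + \kappa\cdot 1 = 1
% Cooperating is optimal iff 3\kappa \ge 1, i.e. \kappa \ge 1/3.
```

Under these numbers, homo economicus (κ = 0) defects against either opponent, while a homo moralis agent with κ ≥ 1/2 cooperates unconditionally; thresholds like these are exactly the kind of incentive sensitivity the game evaluations probe.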
Problem

Research questions and friction points this paper is trying to address.

Aligning LLM agents with rational and moral preferences
Addressing deviations from human behavior in economic games
Fine-tuning LLMs for moral dilemmas and market applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Supervised fine-tuning for LLM alignment
Synthetic datasets from economic reasoning (see the sketch after this list)
Moral-economic principles in AI preferences
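A minimal sketch of how such a synthetic dataset might be generated, assuming the utility forms above. The game, payoff values, prompt wording, and prompt/completion JSONL format are hypothetical stand-ins, not the paper's actual pipeline.

```python
import json

# Illustrative prisoner's dilemma payoffs: PAYOFF[(my_action, their_action)].
# Values and prompt wording are hypothetical, not from the paper.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
ACTIONS = ("C", "D")

def utility(x: str, y: str, kappa: float) -> float:
    """Homo moralis utility: (1 - kappa) * pi(x, y) + kappa * pi(x, x).

    kappa = 0 recovers homo economicus (pure payoff maximization).
    """
    return (1 - kappa) * PAYOFF[(x, y)] + kappa * PAYOFF[(x, x)]

def best_action(opponent: str, kappa: float) -> str:
    """Action maximizing utility against a fixed opponent action."""
    return max(ACTIONS, key=lambda x: utility(x, opponent, kappa))

def make_example(opponent: str, kappa: float) -> dict:
    """One supervised fine-tuning pair: game description -> target action."""
    prompt = (
        "You are playing a one-shot prisoner's dilemma. "
        f"Payoffs: CC=3, CD=0, DC=5, DD=1. The other player chose {opponent}. "
        "Reply with C (cooperate) or D (defect)."
    )
    return {"prompt": prompt, "completion": best_action(opponent, kappa)}

if __name__ == "__main__":
    # kappa = 0.0 yields homo economicus targets (always D here);
    # kappa = 0.6 yields homo moralis targets (always C here).
    for kappa in (0.0, 0.6):
        for opponent in ACTIONS:
            print(json.dumps(make_example(opponent, kappa)))
```

Pairs like these, labeled with either κ = 0 or a moral κ, would then serve as the fine-tuning targets for the two agent types; varying the game and payoffs is what would keep the dataset small yet incentive-diverse.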
Authors

Wei Lu (Zicklin School of Business, CUNY Baruch College, New York, NY)
Daniel L. Chen (Radcliffe Institute for Advanced Study at Harvard; Law and Economics, Behavioral Economics, Political Economy, Labor Economics, Development Economics)
Christian B. Hansen (Booth School of Business, University of Chicago, Chicago, IL)