GAIN: A Benchmark for Goal-Aligned Decision-Making of Large Language Models under Imperfect Norms

📅 2026-03-19
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the limitation of existing benchmarks, which predominantly focus on abstract scenarios and fail to adequately assess large language models’ ability to balance regulatory compliance with business objectives in real-world commercial settings. To bridge this gap, the authors introduce the GAIN benchmark, which explicitly models five contextual pressure factors—such as goal alignment, risk aversion, and emotional or ethical considerations—and comprises 1,200 realistic scenarios spanning four major business domains. Through a scenario simulation framework, multidimensional pressure variables, and comparative analysis against human decision-making, the study reveals that state-of-the-art large language models generally align with human judgments across most pressure conditions, yet demonstrate stronger adherence to norms when faced with personal incentives, indicating heightened compliance tendencies.

Technology Category

Application Category

📝 Abstract
We introduce GAIN (Goal-Aligned Decision-Making under Imperfect Norms), a benchmark designed to evaluate how large language models (LLMs) balance adherence to norms against business goals. Existing benchmarks typically focus on abstract scenarios rather than real-world business applications. Furthermore, they provide limited insights into the factors influencing LLM decision-making. This restricts their ability to measure models' adaptability to complex, real-world norm-goal conflicts. In GAIN, models receive a goal, a specific situation, a norm, and additional contextual pressures. These pressures, explicitly designed to encourage potential norm deviations, are a unique feature that differentiates GAIN from other benchmarks, enabling a systematic evaluation of the factors influencing decision-making. We define five types of pressures: Goal Alignment, Risk Aversion, Emotional/Ethical Appeal, Social/Authoritative Influence, and Personal Incentive. The benchmark comprises 1,200 scenarios across four domains: hiring, customer support, advertising and finance. Our experiments show that advanced LLMs frequently mirror human decision-making patterns. However, when Personal Incentive pressure is present, they diverge significantly, showing a strong tendency to adhere to norms rather than deviate from them.
Problem

Research questions and friction points this paper is trying to address.

goal-aligned decision-making
imperfect norms
large language models
norm-goal conflict
benchmark
Innovation

Methods, ideas, or system contributions that make the work stand out.

goal-aligned decision-making
imperfect norms
contextual pressures
large language models
benchmark
🔎 Similar Papers
No similar papers found.