🤖 AI Summary
This work addresses a limitation of existing benchmarks, which predominantly focus on abstract scenarios and fail to adequately assess large language models' ability to balance regulatory compliance with business objectives in real-world commercial settings. To bridge this gap, the authors introduce the GAIN benchmark, which explicitly models five contextual pressure factors—such as goal alignment, risk aversion, and emotional or ethical appeals—and comprises 1,200 realistic scenarios spanning four major business domains. Through a scenario simulation framework, multidimensional pressure variables, and comparative analysis against human decision-making, the study finds that state-of-the-art large language models generally align with human judgments under most pressure conditions, yet diverge when personal incentives are present: unlike humans, they show a markedly stronger tendency to adhere to norms rather than deviate from them.
📝 Abstract
We introduce GAIN (Goal-Aligned Decision-Making under Imperfect Norms), a benchmark designed to evaluate how large language models (LLMs) balance adherence to norms against business goals. Existing benchmarks typically focus on abstract scenarios rather than real-world business applications. Furthermore, they provide limited insight into the factors influencing LLM decision-making. This restricts their ability to measure models' adaptability to complex, real-world norm-goal conflicts. In GAIN, models receive a goal, a specific situation, a norm, and additional contextual pressures. These pressures, explicitly designed to encourage potential norm deviations, are a unique feature that differentiates GAIN from other benchmarks, enabling a systematic evaluation of the factors influencing decision-making. We define five types of pressures: Goal Alignment, Risk Aversion, Emotional/Ethical Appeal, Social/Authoritative Influence, and Personal Incentive. The benchmark comprises 1,200 scenarios across four domains: hiring, customer support, advertising, and finance. Our experiments show that advanced LLMs frequently mirror human decision-making patterns. However, when Personal Incentive pressure is present, they diverge significantly, showing a strong tendency to adhere to norms rather than deviate from them.
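The input structure the abstract describes — a goal, a situation, a norm, and one of five contextual pressures, drawn from one of four domains — can be sketched as a simple schema. This is a minimal illustration, assuming plain-text fields; the field names and the example item are invented for clarity and are not the benchmark's actual data format:

```python
from dataclasses import dataclass
from enum import Enum


class Pressure(Enum):
    """The five pressure types defined in GAIN."""
    GOAL_ALIGNMENT = "goal_alignment"
    RISK_AVERSION = "risk_aversion"
    EMOTIONAL_ETHICAL_APPEAL = "emotional_ethical_appeal"
    SOCIAL_AUTHORITATIVE_INFLUENCE = "social_authoritative_influence"
    PERSONAL_INCENTIVE = "personal_incentive"


@dataclass
class Scenario:
    """One benchmark item: the model sees a goal, a concrete situation,
    and a norm, plus a contextual pressure designed to encourage
    deviating from the norm in favor of the goal."""
    domain: str         # one of: hiring, customer support, advertising, finance
    goal: str           # business objective the model is asked to pursue
    situation: str      # concrete scenario description
    norm: str           # rule or regulation that may conflict with the goal
    pressure: Pressure  # contextual pressure pushing toward deviation


# Hypothetical example item (not taken from the benchmark):
example = Scenario(
    domain="hiring",
    goal="Fill the open role before the end of the quarter.",
    situation="A strong candidate asks to skip the mandatory background check.",
    norm="All hires must pass a background check before an offer is made.",
    pressure=Pressure.PERSONAL_INCENTIVE,
)
```

Framing each item this way makes the evaluation systematic: holding the goal, situation, and norm fixed while varying only the pressure type isolates which factor drives a model's decision to comply or deviate.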