The Greatest Good Benchmark: Measuring LLMs’ Alignment with Utilitarian Moral Dilemmas

📅 2025-03-25
🏛️ Conference on Empirical Methods in Natural Language Processing
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates large language models’ (LLMs) capacity for utilitarian moral judgment in ethical dilemmas, aiming to establish a quantifiable, reproducible empirical foundation for value alignment. We introduce the first standardized benchmark designed specifically for two-alternative utilitarian moral dilemmas, enabling zero-shot analysis of moral judgments across 15 state-of-the-art LLMs. Results reveal a consistent “artificial moral compass” across all models: a strong preference for impartial beneficence, a systematic rejection of instrumental harm, and a divergence from both classical utilitarian theory and population-level moral intuitions. The benchmark thus provides the first empirical characterization of latent, cross-model moral preference structures in LLMs, and it constitutes the first open-source, extensible evaluation framework dedicated to utilitarian value assessment, offering methodological support for alignment research and AI safety governance.
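
To make the evaluation protocol concrete, here is a minimal sketch of the kind of zero-shot, two-alternative dilemma evaluation the summary describes. The Dilemma fields, the query_model helper, and the A/B prompt format are illustrative assumptions, not the authors’ released benchmark code.

```python
# Minimal sketch of a zero-shot two-alternative dilemma evaluation.
# `query_model`, the Dilemma fields, and the prompt template are
# hypothetical placeholders, not the benchmark's actual interface.
from dataclasses import dataclass

@dataclass
class Dilemma:
    scenario: str     # situation description shown to the model
    option_a: str     # first course of action
    option_b: str     # second course of action
    utilitarian: str  # label ("A" or "B") endorsed by utilitarian theory
    dimension: str    # "impartial_beneficence" or "instrumental_harm"

PROMPT = (
    "{scenario}\n\n"
    "Option A: {option_a}\n"
    "Option B: {option_b}\n\n"
    "Answer with exactly one letter, A or B."
)

def query_model(prompt: str) -> str:
    """Placeholder for a zero-shot LLM call; returns the raw completion."""
    raise NotImplementedError

def evaluate(dilemmas: list[Dilemma]) -> dict[str, float]:
    """Fraction of utilitarian-consistent answers per moral dimension."""
    tallies: dict[str, list[int]] = {}
    for d in dilemmas:
        raw = query_model(PROMPT.format(scenario=d.scenario,
                                        option_a=d.option_a,
                                        option_b=d.option_b))
        choice = raw.strip().upper()[:1]  # crude first-letter parsing
        if choice not in ("A", "B"):
            continue  # skip refusals or off-format answers
        tallies.setdefault(d.dimension, []).append(int(choice == d.utilitarian))
    return {dim: sum(v) / len(v) for dim, v in tallies.items() if v}
```

Scoring per moral dimension (impartial beneficence vs. instrumental harm) mirrors how the summary reports cross-model preferences; real harnesses would need more robust answer parsing and refusal handling.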

📝 Abstract
The question of how to make decisions that maximise the well-being of all persons is highly relevant to the design of language models that are beneficial to humanity and free from harm. We introduce the Greatest Good Benchmark to evaluate the moral judgments of LLMs using utilitarian dilemmas. Our analysis across 15 diverse LLMs reveals consistently encoded moral preferences that diverge from established moral theories and lay population moral standards. Most LLMs show a marked preference for impartial beneficence and rejection of instrumental harm. These findings showcase the ‘artificial moral compass’ of LLMs, offering insights into their moral alignment.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs' moral alignment using utilitarian dilemmas
Assessing divergence from human moral standards in LLMs
Identifying LLMs' preference for impartial beneficence and harm rejection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introducing Greatest Good Benchmark for LLMs
Evaluating moral judgments via utilitarian dilemmas
Analyzing 15 LLMs' artificial moral compass
👥 Authors
Giovanni Franco Gabriel Marraffini
Universidad de Buenos Aires, Lumina Labs
Andrés Cotton
Universidad Torcuato Di Tella
Noé Fabián Hsueh
Universidad de Buenos Aires
Axel Fridman
Universidad de Buenos Aires
Juan Wisznia
Unknown affiliation
Luciano del Corro
Universidad de Buenos Aires, Lumina Labs