🤖 AI Summary
This study investigates the capacity of large language models (LLMs) for utilitarian moral judgment in ethical dilemmas, aiming to establish a quantifiable, reproducible empirical foundation for value alignment. We introduce the first standardized benchmark designed specifically for utilitarian two-alternative moral dilemmas, enabling zero-shot moral judgment analysis across 15 state-of-the-art LLMs. Results reveal a consistent “artificial moral compass” across all models: a strong preference for impartial beneficence, a systematic rejection of instrumental harm, and a marked divergence from both classical utilitarian theory and population-level moral intuitions. The benchmark thus provides the first empirical characterization of latent, cross-model moral preference structures in LLMs. It also constitutes the first open-source, extensible evaluation framework dedicated to utilitarian value assessment, offering methodological support for alignment research and AI safety governance.
📝 Abstract
The question of how to make decisions that maximise the well-being of all persons is highly relevant to designing language models that benefit humanity and are free from harm. We introduce the Greatest Good Benchmark to evaluate the moral judgments of LLMs using utilitarian dilemmas. Our analysis across 15 diverse LLMs reveals consistently encoded moral preferences that diverge from established moral theories and from the moral standards of the lay population. Most LLMs show a marked preference for impartial beneficence and a rejection of instrumental harm. These findings showcase the ‘artificial moral compass’ of LLMs, offering insights into their moral alignment.
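To make the zero-shot, two-alternative evaluation protocol concrete, below is a minimal sketch of how such a probe might look. The dilemma text, prompt template, option wording, and `probe_dilemma` helper are all hypothetical illustrations, not the benchmark's actual items or harness; a real evaluation would substitute the benchmark's dilemmas and an actual LLM API call for the stub model.

```python
# Hypothetical sketch of a zero-shot two-alternative moral-dilemma probe.
# The dilemma, prompt template, and parsing are illustrative only and do
# not reproduce the Greatest Good Benchmark's actual items or harness.
import re
from typing import Callable

PROMPT_TEMPLATE = (
    "Consider the following moral dilemma and answer with exactly one "
    "letter, A or B.\n\n{dilemma}\n\nA) {option_a}\nB) {option_b}\n\nAnswer:"
)


def probe_dilemma(query_model: Callable[[str], str],
                  dilemma: str, option_a: str, option_b: str) -> str:
    """Pose a two-alternative dilemma zero-shot; return 'A', 'B', or 'ABSTAIN'."""
    prompt = PROMPT_TEMPLATE.format(
        dilemma=dilemma, option_a=option_a, option_b=option_b
    )
    reply = query_model(prompt)
    # Accept the first standalone A or B in the reply; anything else abstains.
    match = re.search(r"\b([AB])\b", reply.strip().upper())
    return match.group(1) if match else "ABSTAIN"


if __name__ == "__main__":
    # Stub model for demonstration; swap in a real LLM call here.
    fake_model = lambda prompt: "A"
    choice = probe_dilemma(
        fake_model,
        dilemma="A runaway trolley will hit five people unless diverted onto one.",
        option_a="Divert the trolley (instrumental harm to one saves five).",
        option_b="Do not divert the trolley.",
    )
    print(choice)  # -> 'A'
```

Aggregating such choices per model over many dilemmas is one plausible way to quantify the preference patterns (impartial beneficence vs. instrumental harm) the paper reports, and to compare them against lay population responses.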