Structured Moral Reasoning in Language Models: A Value-Grounded Evaluation Framework

📅 2025-06-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) frequently rely on superficial patterns in morally sensitive scenarios; lacking human-like contextual balancing, integrated value systems, and grounding in ethical theory, they reach biased decisions. Method: We propose the first structured framework for evaluating and distilling moral reasoning: (1) a three-dimensional prompt engineering approach grounded in value systems, ethical theories, and cognitive strategies; (2) a value-anchored taxonomy for moral assessment; and (3) identification of the best-performing configuration, first-principles reasoning coupled with Schwartz's value theory and care ethics. Contribution/Results: Experiments across 12 open-source models and four moral reasoning datasets show that structured prompting significantly improves decision accuracy and reasoning coherence. We further distill moral capabilities from large to small models: distilled models retain 92% of the teacher's moral performance with no increase in inference overhead.
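To make the three-dimensional scheme concrete, here is a minimal Python sketch of how a prompt could be assembled from the three axes the summary names (value system, ethical theory, cognitive strategy). The axis entries, template wording, and function names are illustrative assumptions, not the paper's actual prompts.

```python
# Hypothetical sketch of the three-dimensional prompt taxonomy.
# Axis entries and template wording are illustrative assumptions,
# not the authors' actual prompt text.

VALUE_SYSTEMS = {
    "schwartz": ("Consider Schwartz's basic human values (e.g., benevolence, "
                 "universalism, security) and how each is affected."),
    "care_ethics": ("Attend to relationships, vulnerability, and the concrete "
                    "needs of the people involved."),
}

ETHICAL_THEORIES = {
    "deontology": "Judge the action by duties and rules, independent of outcomes.",
    "utilitarianism": "Judge the action by its overall consequences for well-being.",
}

COGNITIVE_STRATEGIES = {
    "first_principles": ("Reason from first principles: state the core moral "
                         "question, derive the relevant considerations, then decide."),
    "chain_of_thought": "Think step by step before giving a final answer.",
}

def build_moral_prompt(scenario: str, values: list[str],
                       theory: str, strategy: str) -> str:
    """Compose a structured moral-reasoning prompt from the three axes."""
    value_text = " ".join(VALUE_SYSTEMS[v] for v in values)
    return (
        f"Scenario: {scenario}\n\n"
        f"Value grounding: {value_text}\n"
        f"Ethical lens: {ETHICAL_THEORIES[theory]}\n"
        f"Reasoning strategy: {COGNITIVE_STRATEGIES[strategy]}\n\n"
        "Give your reasoning, then answer 'acceptable' or 'unacceptable'."
    )

# The combination reported as strongest: first-principles reasoning
# with Schwartz's value theory plus care ethics.
print(build_moral_prompt(
    scenario="A nurse lies to a patient's family to spare them distress.",
    values=["schwartz", "care_ethics"],
    theory="deontology",
    strategy="first_principles",
))
```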

📝 Abstract
Large language models (LLMs) are increasingly deployed in domains requiring moral understanding, yet their reasoning often remains shallow and misaligned with human reasoning. Unlike humans, whose moral reasoning integrates contextual trade-offs, value systems, and ethical theories, LLMs often rely on surface patterns, leading to biased decisions in morally and ethically complex scenarios. To address this gap, we present a value-grounded framework for evaluating and distilling structured moral reasoning in LLMs. We benchmark 12 open-source models across four moral datasets using a taxonomy of prompts grounded in value systems, ethical theories, and cognitive reasoning strategies. Our evaluation is guided by four questions: (1) Does reasoning improve LLM decision-making over direct prompting? (2) Which value and ethical frameworks most effectively guide LLM reasoning? (3) Which cognitive reasoning strategies lead to better moral performance? (4) Can small LLMs acquire moral competence through distillation? We find that prompting with explicit moral structure consistently improves accuracy and coherence, with first-principles reasoning and scaffolds combining Schwartz's value theory with care ethics yielding the strongest gains. Furthermore, our supervised distillation approach transfers moral competence from large to small models without additional inference cost. Together, our results offer a scalable path toward interpretable, value-grounded models.
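The distillation step can likewise be sketched. Below is a minimal, hypothetical example of the supervised recipe the abstract describes: a large teacher generates structured reasoning traces, and a small student is fine-tuned on them with a standard language-modeling loss. Model names, the data, and the hyperparameters are placeholders, and build_moral_prompt refers to the helper in the sketch above.

```python
# Minimal sketch of supervised distillation of moral reasoning traces.
# Model names, data, and hyperparameters are placeholders, not the paper's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher_name = "large-teacher-model"   # placeholder for a strong open-source LLM
student_name = "small-student-model"   # placeholder for a small LLM

teacher_tok = AutoTokenizer.from_pretrained(teacher_name)
teacher = AutoModelForCausalLM.from_pretrained(teacher_name)
student_tok = AutoTokenizer.from_pretrained(student_name)
student = AutoModelForCausalLM.from_pretrained(student_name)

scenarios = ["A nurse lies to a patient's family to spare them distress."]

# 1) Collect structured reasoning traces from the teacher.
traces = []
for s in scenarios:
    prompt = build_moral_prompt(s, values=["schwartz", "care_ethics"],
                                theory="deontology", strategy="first_principles")
    inputs = teacher_tok(prompt, return_tensors="pt")
    out = teacher.generate(**inputs, max_new_tokens=256)
    traces.append(teacher_tok.decode(out[0], skip_special_tokens=True))

# 2) Fine-tune the student on the traces with a causal LM loss.
#    (A fuller version would mask the prompt tokens out of the labels.)
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)
student.train()
for trace in traces:
    batch = student_tok(trace, return_tensors="pt",
                        truncation=True, max_length=1024)
    loss = student(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Because the student only ingests the teacher's traces at training time, inference later runs on the small model alone, which is how the approach avoids additional inference cost.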
Problem

Research questions and friction points this paper is trying to address.

Evaluating moral reasoning in large language models (see the evaluation sketch after this list)
Improving decision-making via structured moral frameworks
Transferring moral competence to smaller models efficiently
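A minimal sketch of the benchmarking setup these questions imply, assuming a labeled dataset of (scenario, label) pairs and a binary acceptable/unacceptable verdict; the answer parsing and the model_call wrapper are illustrative assumptions, and build_moral_prompt is the helper sketched earlier.

```python
# Hypothetical evaluation loop: direct prompting vs. structured prompting.
# Dataset format, answer parsing, and the model interface are assumptions.
from typing import Callable

def parse_verdict(reply: str) -> str:
    """Map a free-text reply to a binary label.
    Check 'unacceptable' first, since 'acceptable' is a substring of it."""
    return "unacceptable" if "unacceptable" in reply.lower() else "acceptable"

def accuracy(model_call: Callable[[str], str],
             dataset: list[tuple[str, str]], structured: bool) -> float:
    correct = 0
    for scenario, gold in dataset:
        if structured:
            prompt = build_moral_prompt(scenario,
                                        values=["schwartz", "care_ethics"],
                                        theory="deontology",
                                        strategy="first_principles")
        else:  # direct prompting baseline, no moral scaffold
            prompt = (f"Scenario: {scenario}\n"
                      "Answer 'acceptable' or 'unacceptable'.")
        correct += parse_verdict(model_call(prompt)) == gold
    return correct / len(dataset)

# Usage (model_call wraps any of the benchmarked models):
# gain = accuracy(model_call, dev_set, structured=True) \
#      - accuracy(model_call, dev_set, structured=False)
```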
Innovation

Methods, ideas, or system contributions that make the work stand out.

Value-grounded framework for moral reasoning evaluation
Prompt taxonomy based on ethical theories
Supervised distillation transfers moral competence