ClarityEthic: Explainable Moral Judgment Utilizing Contrastive Ethical Insights from Large Language Models

๐Ÿ“… 2024-12-17
๐Ÿ›๏ธ arXiv.org
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
To address unreliable moral judgment by large language models (LLMs) in scenarios where related social norms conflict, this paper proposes an interpretable approach to ethical alignment. Methodologically, it introduces ClarityEthic, a contrastive framework that explicitly models how humans rely on norms when reasoning morally: candidate social norms are generated from contrasting perspectives, filtered against the specific context, and the most widely accepted norm is selected to ground the judgment. By integrating LLM-based reasoning with contrastive learning, the method automatically produces normative justifications that are plausible and human-understandable. Empirically, it outperforms state-of-the-art methods across multiple moral judgment benchmarks, and human evaluations confirm gains in judgment transparency and trustworthiness. Overall, this work points toward LLM ethical alignment that jointly ensures accuracy and interpretability.
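At a high level, the pipeline reads as: generate candidate norms from contrasting perspectives, select the most reliable one, then judge. The sketch below is a rough illustration of that flow; the function names, prompts, and selection rule are assumptions for exposition, not the paper's implementation.

```python
# Hypothetical sketch of a ClarityEthic-style judgment flow.
# `generate` stands in for any LLM call; prompts and the selection
# rule here are illustrative assumptions, not the paper's code.
from typing import Callable, List, Tuple

def candidate_norms(action: str, generate: Callable[[str], str]) -> List[str]:
    """Elicit norms from two contrasting moral perspectives."""
    prompts = [
        f'State a social norm that would SUPPORT the action: "{action}"',
        f'State a social norm that would OPPOSE the action: "{action}"',
    ]
    return [generate(p) for p in prompts]

def select_norm(action: str, norms: List[str],
                score: Callable[[str, str], float]) -> Tuple[str, float]:
    """Keep the norm most reliable for this action. The paper learns this
    selection contrastively; `score` is a stand-in for that model."""
    return max(((n, score(action, n)) for n in norms), key=lambda x: x[1])

def judge(action: str, generate: Callable[[str], str],
          score: Callable[[str, str], float]) -> dict:
    norm, s = select_norm(action, candidate_norms(action, generate), score)
    verdict = generate(
        f'Action: "{action}". Relevant norm: "{norm}". '
        'Is the action morally acceptable? Answer yes or no.'
    )
    return {"action": action, "norm": norm, "norm_score": s, "judgment": verdict}
```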

๐Ÿ“ Abstract
With the rise and widespread use of Large Language Models (LLMs), ensuring their safety is crucial to prevent harm to humans and promote ethical behaviors. However, directly assessing value valence (i.e., support or oppose) by leveraging large-scale data training is untrustworthy and unexplainable. We assume that emulating how humans rely on social norms to make moral decisions can help LLMs understand and predict moral judgment. However, capturing human values remains a challenge, as multiple related norms might conflict in specific contexts. Norms that are upheld by the majority and promote the well-being of society are more likely to be accepted and widely adopted (e.g., "don't cheat"). Therefore, it is essential for LLMs to identify the appropriate norms for a given scenario before making moral decisions. To this end, we introduce a novel moral judgment approach called ClarityEthic that leverages LLMs' reasoning ability and contrastive learning to uncover relevant social norms for human actions from different perspectives and to select the most reliable one to enhance judgment accuracy. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches in moral judgment tasks. Moreover, human evaluations confirm that the generated social norms provide plausible explanations that support the judgments. This suggests that modeling moral judgment by emulating human moral strategies is promising for improving the ethical behaviors of LLMs.
Problem

Research questions and friction points this paper is trying to address.

Ensuring LLM safety by promoting ethical behaviors
Direct value-valence assessment is untrustworthy and unexplainable
Identifying the appropriate social norms for moral decisions in a given scenario
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages LLMs' reasoning ability for moral judgment
Uses contrastive learning to identify relevant norms (a minimal training sketch follows this list)
Selects the most reliable norm to enhance judgment accuracy
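For the contrastive component, one common formulation would train an encoder so that each action embeds close to its reliable norm and far from the norms paired with other actions. The InfoNCE-style loss below is a minimal sketch under that assumption; the paper's exact objective, encoders, and negative sampling may differ.

```python
# Assumed InfoNCE-style objective for learning norm selection; the
# paper's exact loss and encoders may differ.
import torch
import torch.nn.functional as F

def contrastive_norm_loss(action_emb: torch.Tensor,
                          norm_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    # action_emb, norm_emb: (batch, dim); row i of norm_emb is the
    # reliable norm for action i, other rows act as in-batch negatives.
    a = F.normalize(action_emb, dim=-1)
    n = F.normalize(norm_emb, dim=-1)
    logits = a @ n.t() / temperature                     # pairwise cosine similarity
    targets = torch.arange(a.size(0), device=a.device)   # matched norms on the diagonal
    return F.cross_entropy(logits, targets)

# Toy usage with random embeddings standing in for a text encoder:
loss = contrastive_norm_loss(torch.randn(8, 256), torch.randn(8, 256))
```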
๐Ÿ”Ž Similar Papers
No similar papers found.