🤖 AI Summary
Existing moral reasoning research is fragmented across disparate datasets and inconsistent task definitions, hindering systematic, multilingual, and cross-cultural analysis. To address this, we introduce UniMoral, a unified, cross-lingual, multidimensional moral reasoning dataset covering Arabic, Chinese, English, Hindi, Russian, and Spanish. UniMoral combines psychologically grounded dilemmas with dilemmas drawn from social media, and provides fine-grained annotations along four dimensions (action choices, ethical principles, contributing factors, and consequences) augmented with annotators' moral and cultural profiles. We demonstrate UniMoral's utility by benchmarking large language models (LLMs) cross-lingually on four tasks: action prediction, moral typology classification, factor attribution analysis, and consequence generation. Results show that implicitly embedded moral context substantially enhances model reasoning, yet current models remain weak at cross-cultural attribution and consequence generation, underscoring the need for more specialized approaches to moral alignment.
📝 Abstract
Moral reasoning is a complex cognitive process shaped by individual experiences and cultural contexts, and it presents unique challenges for computational analysis. While natural language processing (NLP) offers promising tools for studying this phenomenon, current research lacks cohesion, employing discordant datasets and tasks that examine isolated aspects of moral reasoning. We bridge this gap with UniMoral, a unified dataset integrating psychologically grounded and social-media-derived moral dilemmas annotated with labels for action choices, ethical principles, contributing factors, and consequences, alongside annotators' moral and cultural profiles. Recognizing the cultural relativity of moral reasoning, UniMoral spans six languages (Arabic, Chinese, English, Hindi, Russian, and Spanish), capturing diverse socio-cultural contexts. We demonstrate UniMoral's utility through benchmark evaluations of three large language models (LLMs) across four tasks: action prediction, moral typology classification, factor attribution analysis, and consequence generation. Key findings reveal that while implicitly embedded moral contexts enhance the moral reasoning capabilities of LLMs, there remains a critical need for increasingly specialized approaches to further advance moral reasoning in these models.