Morality in AI. A plea to embed morality in LLM architectures and frameworks

📅 2025-11-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Large language models (LLMs) lack an intrinsic capacity for moral reasoning and struggle to meet the requirements of human ethical decision-making. Method: This paper proposes an architecture-level moral embedding framework that draws on Iris Murdoch’s philosophical concept of “loving attention” and outlines a dynamic moral attention mechanism. It aims to integrate moral sensitivity into the Transformer’s underlying architecture through three pathways: (1) a morally augmented training objective, (2) runtime adaptive weight adjustment, and (3) reconfiguration of the attention structure. Unlike external alignment or post-hoc correction methods, the approach is intended to give LLMs native moral perception at the architectural level, and the authors present it as a complement to those methods. Contribution/Results: The paper formulates and evaluates candidate technical operationalizations and acknowledges the limitations of this exploration. It connects ethical philosophy with neural architecture design, moving AI ethics from a purely constraint-based paradigm toward a generative one and offering an interdisciplinary pathway for developing morally sensitive LLMs.

📝 Abstract

Large language models (LLMs) increasingly mediate human decision-making and behaviour. Ensuring that LLMs process moral meaning has therefore become a critical challenge. Current approaches rely predominantly on bottom-up methods such as fine-tuning and reinforcement learning from human feedback. We propose a fundamentally different approach: embedding moral meaning processing directly into the architectural mechanisms and frameworks of transformer-based models through top-down design principles. We first sketch a framework that conceptualizes attention as a dynamic interface mediating between structure and processing, contrasting with existing linear attention frameworks in psychology. We start from established biological-artificial attention analogies in neural architecture design to improve cognitive processing. We extend this analysis to moral processing, using Iris Murdoch's theory of loving attention (sustained, just observation that enables moral transformation by reseeing others with clarity and compassion) to philosophically discuss functional analogies between human and LLM moral processing. We formulate and evaluate potentially promising technical operationalizations to embed morality in LLM architectures and frameworks. We acknowledge the limitations of our exploration and make three key contributions. (1) We conceptualize attention as a dynamic system mechanism mediating between structure and processing. (2) Drawing on Murdoch's notion of loving attention, we outline technical pathways for embedding morality in LLMs through modified training objectives, runtime weight adjustments, and architectural refinements to attention. (3) We argue that integrating morality into architectures and frameworks complements external, constraint-based methods. We conclude with a call for collaboration between transformer designers and philosophers engaged in AI ethics.

Problem

Research questions and friction points this paper is trying to address.

Embedding moral meaning processing into LLM architectures through top-down design principles

Proposing technical pathways to integrate morality via modified training and architectural refinements

Addressing the critical challenge of ensuring moral processing in AI decision-making

Innovation

Methods, ideas, or system contributions that make the work stand out.

Embedding morality directly into transformer architectural mechanisms

Using Murdoch's theory of loving attention to draw functional analogies between human and LLM moral processing

Outlining modified training objectives, runtime weight adjustments, and architectural refinements to attention
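
To make the idea of an architectural refinement to attention more concrete, the sketch below shows one generic technique: adding a per-token bias to scaled dot-product attention scores before the softmax. This is an illustration only and is not taken from the paper. The function name `biased_attention`, the `salience` scores, and the `strength` parameter are all hypothetical, and how such salience scores would be obtained is left open here.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention(query, keys, values, salience, strength=1.0):
    """Single-query scaled dot-product attention with an additive per-key bias.

    `salience` is a hypothetical per-token score (an assumption of this sketch,
    not a quantity defined in the paper); `strength` scales its influence.
    With strength=0.0 this reduces to ordinary scaled dot-product attention.
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d) + strength * s
        for key, s in zip(keys, salience)
    ]
    weights = softmax(scores)
    # Weighted sum of value vectors, one output component at a time.
    output = [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return weights, output


# Toy example: tokens 0 and 1 match the query equally; token 1 gets extra salience.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
salience = [0.0, 2.0, 0.0]

plain_weights, _ = biased_attention(query, keys, values, salience, strength=0.0)
biased_weights, biased_output = biased_attention(query, keys, values, salience, strength=1.0)
```

With the bias switched off, tokens 0 and 1 receive equal weight; with it switched on, attention shifts toward token 1. An additive bias leaves the rest of the attention computation unchanged, which is why it is a common way to inject an external signal into a Transformer layer.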
