🤖 AI Summary
Problem: Large language models (LLMs) lack an intrinsic capacity for moral reasoning and struggle to meet the requirements of human ethical decision-making.
Method: This paper proposes an architecture-level moral embedding framework that brings Iris Murdoch's philosophical concept of "loving attention" into AI, and it proposes a dynamic moral attention mechanism. The framework integrates moral sensitivity into the Transformer's underlying architecture through three pathways: (1) a morally augmented training objective, (2) runtime adaptive weight adjustment, and (3) reconfiguration of the attention structure. Rather than relying solely on external alignment or post-hoc correction, the approach aims to give LLMs native moral perception and transformation capabilities at the architectural level.
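The summary does not specify how runtime adaptive weight adjustment would work. As a purely illustrative sketch, assuming it can be approximated by adding a bias to the attention logits, the Python below implements single-query scaled dot-product attention in which a hypothetical per-token score (`salience`), scaled by a hypothetical gain (`beta`), shifts the softmax weights at inference time. The function and parameter names are assumptions, not the paper's method, and how such scores would be obtained is left open.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def salience_biased_attention(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
    salience: Sequence[float],
    beta: float = 0.0,
) -> Tuple[List[float], List[float]]:
    """Single-query scaled dot-product attention with an additive logit bias.

    Hypothetical parameters (not taken from the paper):
      salience[j]: an externally supplied relevance score for token j
      beta: a runtime gain; beta = 0 gives standard attention
    Returns (output vector, attention weights).
    """
    d = len(query)
    # Standard scaled dot-product logits plus the hypothetical bias term.
    logits = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d) + beta * s
        for key, s in zip(keys, salience)
    ]
    weights = softmax(logits)
    # Output is the attention-weighted sum of the value vectors.
    out = [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return out, weights


if __name__ == "__main__":
    q = [1.0, 0.0]
    keys = [[1.0, 0.0], [0.0, 1.0]]
    values = [[1.0, 0.0], [0.0, 1.0]]
    s = [0.0, 1.0]  # token 1 flagged by the hypothetical scorer
    _, w_plain = salience_biased_attention(q, keys, values, s, beta=0.0)
    _, w_biased = salience_biased_attention(q, keys, values, s, beta=2.0)
    print(w_plain, w_biased)  # token 1 receives more weight when beta > 0
```

Because the bias is added before the softmax, the weights still sum to one, and `beta = 0` recovers ordinary attention, so in this sketch the adjustment can be switched on or off at runtime without retraining.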
Contribution/Results: The paper formulates and evaluates candidate technical operationalizations and offers a framework that connects ethical philosophy with neural architecture design. It argues for moving AI ethics from a purely constraint-based paradigm toward a generative one that complements external methods, and it offers an interdisciplinary pathway for developing morally sensitive LLMs.
📝 Abstract
Large language models (LLMs) increasingly mediate human decision-making and behaviour. Ensuring that LLMs process moral meaning has therefore become a critical challenge. Current approaches rely predominantly on bottom-up methods such as fine-tuning and reinforcement learning from human feedback. We propose a fundamentally different approach: embedding moral meaning processing directly into the architectural mechanisms and frameworks of transformer-based models through top-down design principles. We first sketch a framework that conceptualizes attention as a dynamic interface mediating between structure and processing, in contrast to existing linear attention frameworks in psychology. We start from the biological-artificial attention analogies already established in neural architecture design for improving cognitive processing. We extend this analysis to moral processing, using Iris Murdoch's theory of loving attention (sustained, just observation that enables moral transformation by reseeing others with clarity and compassion) to discuss philosophically the functional analogies between human and LLM moral processing. We formulate and evaluate potentially promising technical operationalizations for embedding morality in LLM architectures and frameworks. We acknowledge the limitations of our exploration and make three key contributions. (1) We conceptualize attention as a dynamic system mechanism mediating between structure and processing. (2) Drawing on Murdoch's notion of loving attention, we outline technical pathways for embedding morality in LLMs through modified training objectives, runtime weight adjustments, and architectural refinements to attention. (3) We argue that integrating morality into architectures and frameworks complements external, constraint-based methods. We conclude with a call for collaboration between transformer designers and philosophers engaged in AI ethics.
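The abstract names modified training objectives as one pathway without giving their form. One common way to modify an objective is a weighted sum of the standard language-modelling loss and an auxiliary term, L = L_LM + λ·L_aux. The Python sketch below assumes that form; the auxiliary binary cross-entropy on hypothetical moral-relevance labels, the helper names, and the weight `lam` are illustrative assumptions rather than the objective the authors propose.

```python
import math
from typing import Sequence


def lm_loss(target_probs: Sequence[float]) -> float:
    """Mean negative log-likelihood of the observed next tokens."""
    return -sum(math.log(p) for p in target_probs) / len(target_probs)


def auxiliary_loss(preds: Sequence[float], labels: Sequence[int]) -> float:
    """Mean binary cross-entropy against hypothetical moral-relevance labels (0 or 1)."""
    eps = 1e-12  # clamp predictions away from 0 and 1 to avoid log(0)
    total = 0.0
    for p, y in zip(preds, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(preds)


def augmented_objective(
    target_probs: Sequence[float],
    aux_preds: Sequence[float],
    aux_labels: Sequence[int],
    lam: float = 0.1,
) -> float:
    """L = L_LM + lam * L_aux; lam = 0 recovers the plain language-modelling loss."""
    return lm_loss(target_probs) + lam * auxiliary_loss(aux_preds, aux_labels)


if __name__ == "__main__":
    probs = [0.5, 0.25]  # model probabilities assigned to the true next tokens
    preds, labels = [0.9, 0.2], [1, 0]  # hypothetical auxiliary predictions and labels
    print(augmented_objective(probs, preds, labels, lam=0.0))
    print(augmented_objective(probs, preds, labels, lam=0.5))
```

Setting `lam` to 0 recovers ordinary language-model training, which keeps the auxiliary term easy to ablate when comparing against the unmodified objective.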