When Prompting Fails to Sway: Inertia in Moral and Value Judgments of Large Language Models

📅 2024-08-16

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

This work identifies a “value inertia” phenomenon in large language models (LLMs) under persona-based prompting: despite diverse role assignments, models exhibit highly stable (>87%) unidirectional biases across core moral dimensions—particularly harm avoidance and fairness—contradicting the expected plasticity of moral stance. To investigate, we propose a novel paradigm integrating macro-trend analysis with large-scale persona enactment, grounded in Moral Foundations Theory (MFT) to quantify multi-dimensional value orientations. Statistical significance testing across mainstream LLMs confirms robust, cross-model value preference inertia. This study is the first systematic empirical challenge to the efficacy of persona prompting for value alignment. It establishes a methodological foundation and empirical evidence for evaluating LLM fairness and enabling controllable, value-guided behavior.

Technology Category

Application Category

📝 Abstract

Large Language Models (LLMs) exhibit non-deterministic behavior, and prompting has emerged as a primary method for steering their outputs toward desired directions. One popular strategy involves assigning a specific"persona"to the model to induce more varied and context-sensitive responses, akin to the diversity found in human perspectives. However, contrary to the expectation that persona-based prompting would yield a wide range of opinions, our experiments demonstrate that LLMs maintain consistent value orientations. In particular, we observe a persistent inertia in their responses, where certain moral and value dimensions, especially harm avoidance and fairness, remain distinctly skewed in one direction despite varied persona settings. To investigate this phenomenon systematically, use role-play at scale, which combines randomized, diverse persona prompts with a macroscopic trend analysis of model outputs. Our findings highlight the strong internal biases and value preferences in LLMs, underscoring the need for careful scrutiny and potential adjustment of these models to ensure balanced and equitable applications.

Problem

Research questions and friction points this paper is trying to address.

LLMs show consistent moral values despite varied persona prompts

Inertia in LLM responses skews harm avoidance and fairness

Internal biases in LLMs need scrutiny for equitable applications

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses role-play at scale for analysis

Combines randomized diverse persona prompts

Applies macroscopic trend analysis technique

🔎 Similar Papers

A Survey on Moral Foundation Theory and Pre-Trained Language Models: Current Advances and Challenges

2024-09-20AI & SOCIETYCitations: 0

Authors to Follow