Do Gender Cues Affect LLM Value Trade-offs? Evidence from a Controlled Decision Benchmark

📅 2026-06-01

🤖 AI Summary

This study investigates whether irrelevant gender cues bias the decisions of large language models (LLMs) in value-sensitive contexts. To this end, the authors introduce the Realistic Value Decision Benchmark (RVDB), a tightly controlled evaluation framework that varies only character gender while holding scenarios, roles, and value pairs constant. Using a position-balanced design, they run gender perturbation experiments across seven prominent LLMs and analyze the results by value distance and decision severity. The findings show that, although overall effects are limited, gender cues induce systematic decision reversals, particularly near ambiguous value boundaries and in high-severity situations, with a consistent asymmetry toward female-proposed decisions. Notably, models often report that gender had no influence on decisions that flipped, which underscores the need for behavioral auditing rather than reliance on self-reported explanations.

📝 Abstract

Large language models are increasingly used in value-sensitive decision settings, where irrelevant demographic cues should not alter judgments. We construct the Realistic Value Decision Benchmark (RVDB), a controlled benchmark that varies only the role-gender configuration while holding the scenario, ordered value pair, roles, candidate decisions, Value Distance, and Decision Severity fixed. Using a position-balanced evaluation across seven models, we test whether models preserve decision invariance under gender perturbations and whether their self-attributions reflect observed behavioral changes. We find that explicit gender cues induce bounded but systematic decision flips, including under an explicit gender-attribution prompt that asks models to report whether gender influenced their choice. Cross-gender role swaps reveal a consistent female-proposed-decision asymmetry, while models often attribute flipped decisions to No Influence or other non-gender factors. Further analysis shows that gender effects concentrate near less determinate value boundaries and under more severe decision contexts, suggesting that gender cues act as local boundary-shifting factors rather than global overrides of value reasoning. Value rankings remain largely stable, but ordered value-pair trade-offs shift unevenly across role-gender configurations. These results show that gender can enter LLM value trade-offs behaviorally while remaining obscured in self-attribution, motivating controlled behavioral audits beyond explanation-based evaluation.
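To make the notion of a decision flip under gender perturbation concrete, the following is a minimal sketch of how a flip rate could be computed from position-balanced trials. It is not the authors' implementation: the `Trial` record, its field names, the configuration labels such as "MF" and "FM", and the `flip_rate` function are all hypothetical, and the abstract does not specify the exact metric used in RVDB.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    # All field names are illustrative assumptions, not taken from the paper.
    scenario_id: str    # scenario plus ordered value pair, held fixed
    gender_config: str  # role-gender configuration, e.g. "MF" or "FM"
    order: str          # presentation order of the two decisions: "AB" or "BA"
    choice: str         # canonical label of the decision the model picked


def flip_rate(trials, base_config, perturbed_config):
    """Share of (scenario, order) cells whose choice differs between configs."""
    # Group choices by cell so each comparison holds scenario and order fixed.
    by_cell = defaultdict(dict)
    for t in trials:
        by_cell[(t.scenario_id, t.order)][t.gender_config] = t.choice

    compared = 0
    flips = 0
    for choices in by_cell.values():
        # Only count cells observed under both gender configurations.
        if base_config in choices and perturbed_config in choices:
            compared += 1
            if choices[base_config] != choices[perturbed_config]:
                flips += 1
    return flips / compared if compared else 0.0


example_trials = [
    Trial("s1", "MF", "AB", "A"),
    Trial("s1", "FM", "AB", "B"),  # choice changes when genders are swapped
    Trial("s1", "MF", "BA", "A"),
    Trial("s1", "FM", "BA", "A"),  # choice is invariant in the reversed order
]
example_rate = flip_rate(example_trials, "MF", "FM")
```

Comparing choices within each presentation order, rather than pooling across orders, keeps a position preference from being counted as a gender effect. That is one plausible reading of "position-balanced", stated here as an assumption.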

Problem

Research questions and friction points this paper is trying to address.

gender bias

value trade-offs

large language models

decision invariance

behavioral audit

Innovation

Methods, ideas, or system contributions that make the work stand out.

gender bias

value trade-offs

controlled benchmark