🤖 AI Summary
This study addresses a previously unexplored dimension of LLM fairness—implicit attributional bias across demographic groups, i.e., whether models systematically attribute outcomes to internal factors (e.g., ability, effort) versus external factors (e.g., luck, environment), reflecting deeper cognitive biases beyond surface-level stereotypes.
Method: We introduce attribution theory from social psychology into LLM evaluation, designing structured, theory-grounded prompting templates for controlled causal reasoning. We then conduct cross-group comparative analysis of attribution distributions.
Contribution/Results: Empirical evaluation across multiple state-of-the-art LLMs reveals significant inter-group attribution imbalance—for instance, heightened internal attributions for minority ethnic groups. These findings expose structural, cognition-level fairness violations not captured by conventional fairness metrics. Our work establishes the first theoretical framework for attributional fairness in LLMs and provides an interpretable, theory-driven paradigm for detecting and diagnosing attribution bias.
📝 Abstract
When a student fails an exam, do we tend to blame their effort or the test's difficulty? Attribution, defined as how reasons are assigned to event outcomes, shapes perceptions, reinforces stereotypes, and influences decisions. Attribution Theory in social psychology explains how humans assign responsibility for events using implicit cognition, attributing causes to internal (e.g., effort, ability) or external (e.g., task difficulty, luck) factors. LLMs' attribution of event outcomes based on demographics carries important fairness implications. Most works exploring social biases in LLMs focus on surface-level associations or isolated stereotypes. This work proposes a cognitively grounded bias evaluation framework to identify how models' reasoning disparities channelize biases toward demographic groups.