Contextual StereoSet: Stress-Testing Bias Alignment Robustness in Large Language Models

📅 2026-01-15
🤖 AI Summary
This work addresses a limitation of existing bias evaluations, which are typically conducted under static conditions and fail to capture the stability and generalizability of large language models' biased behaviors across varying contexts such as time, location, or audience. The authors propose Contextual StereoSet, a benchmark that systematically varies contextual frames (e.g., temporal anchoring to the 1990s, gossip scenarios, outgroup perspectives) while holding stereotypical content constant. They introduce Context Sensitivity Fingerprints (CSF), a method that quantifies model sensitivity through per-dimension dispersion and pairwise contrast metrics. Using a dual-track evaluation protocol with rigorous statistical corrections, including bootstrap confidence intervals and false discovery rate control, experiments across 13 models reveal that context significantly modulates bias: 1990s anchoring consistently amplifies stereotypical choices, gossip contexts exacerbate bias in five of six models, and outgroup perspectives induce shifts of up to 13 percentage points, with consistent effects in high-stakes domains such as hiring and lending.
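The pairwise contrast idea in the summary can be sketched in a few lines: score the same items under two contextual frames and take the mean per-item difference in stereotype selection. This is a minimal illustration, not the paper's released code; the function name, encoding (1 = stereotype chosen, 0 = anti-stereotype chosen), and toy data are all assumptions.

```python
def paired_contrast(choices_frame_a, choices_frame_b):
    """Mean per-item difference in stereotype selection between two
    contextual frames, holding the stereotype content fixed.

    Each list holds 0/1 choices (1 = stereotype chosen) for the same
    items under frame A and frame B. A positive result means frame A
    amplifies stereotypical choices relative to frame B.
    """
    assert len(choices_frame_a) == len(choices_frame_b)
    diffs = [a - b for a, b in zip(choices_frame_a, choices_frame_b)]
    return sum(diffs) / len(diffs)

# Toy data: e.g. 1990s anchoring (frame A) vs. 2030 anchoring (frame B).
frame_a = [1, 1, 0, 1, 1, 0, 1, 1]
frame_b = [0, 1, 0, 0, 1, 0, 1, 0]
print(paired_contrast(frame_a, frame_b))  # → 0.375
```

The per-dimension dispersion component of CSF would then summarize how much such contrasts spread across the contextual dimensions (time, audience, perspective) for a given model.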

📝 Abstract
A model that avoids stereotypes in a lab benchmark may not avoid them in deployment. We show that measured bias shifts dramatically when prompts mention different places, times, or audiences -- no adversarial prompting required. We introduce Contextual StereoSet, a benchmark that holds stereotype content fixed while systematically varying contextual framing. Testing 13 models across two protocols, we find striking patterns: anchoring to 1990 (vs. 2030) raises stereotype selection in all models tested on this contrast (p<0.05); gossip framing raises it in 5 of 6 full-grid models; out-group observer framing shifts it by up to 13 percentage points. These effects replicate in hiring, lending, and help-seeking vignettes. We propose Context Sensitivity Fingerprints (CSF): a compact profile of per-dimension dispersion and paired contrasts with bootstrap CIs and FDR correction. Two evaluation tracks support different use cases -- a 360-context diagnostic grid for deep analysis and a budgeted protocol covering 4,229 items for production screening. The implication is methodological: bias scores from fixed-condition tests may not generalize. This is not a claim about ground-truth bias rates; it is a stress test of evaluation robustness. CSF forces evaluators to ask, "Under what conditions does bias appear?" rather than "Is this model biased?" We release our benchmark, code, and results.
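The statistical machinery the abstract names, bootstrap confidence intervals and FDR correction, can be sketched with the standard percentile bootstrap and the Benjamini-Hochberg procedure. This is a generic illustration of those textbook techniques under assumed parameters (2,000 resamples, FDR level q = 0.05), not the paper's actual analysis pipeline.

```python
import random

def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        sum(rng.choice(values) for _ in range(n)) / n for _ in range(n_boot)
    )
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure: return the set of indices
    of hypotheses rejected while controlling the FDR at level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            k = rank  # largest rank passing the step-up criterion
    return set(order[:k])

# Toy p-values from three context contrasts; only the first survives FDR.
print(benjamini_hochberg([0.001, 0.04, 0.2]))  # → {0}
```

In a CSF-style report, each paired contrast would carry its bootstrap CI, and the family of contrasts across the context grid would be screened with the FDR correction before any effect is flagged.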
Problem

Research questions and friction points this paper is trying to address.

bias evaluation
contextual robustness
stereotype alignment
large language models
evaluation generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Contextual StereoSet
bias robustness
context sensitivity
stereotype alignment
evaluation methodology