Linear socio-demographic representations emerge in Large Language Models from indirect cues

📅 2025-12-10
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Large language models (LLMs) implicitly encode sociodemographic attributes such as gender, race, and occupation from indirect cues (e.g., names, job titles), producing downstream behavioral biases that persist even in models that pass standard fairness benchmarks. Method: residual stream probing, cross-model analysis (Magistral/Qwen3/GPT-OSS/OLMo2), and alignment of latent representations with census statistics, used to characterize how these implicit representations emerge and how they are structured. Contribution/Results: The authors demonstrate for the first time that LLMs spontaneously construct linear, interpretable, and robust sociodemographic representations, without explicit supervision or prompting, that closely track real-world census distributions. These latent representations directly drive biased downstream behaviors (e.g., occupational recommendations). Critically, such implicit bias is structurally grounded, empirically detectable, and behaviorally consequential, challenging fairness evaluation paradigms that rely on surface-level metrics and fail to capture latent representational skew.

📝 Abstract
We investigate how LLMs encode sociodemographic attributes of human conversational partners inferred from indirect cues such as names and occupations. We show that LLMs develop linear representations of user demographics within activation space, wherein stereotypically associated attributes are encoded along interpretable geometric directions. We first probe residual streams across layers of four open transformer-based LLMs (Magistral 24B, Qwen3 14B, GPT-OSS 20B, OLMo2-1B) prompted with explicit demographic disclosure. We show that the same probes predict demographics from implicit cues: names activate census-aligned gender and race representations, while occupations trigger representations correlated with real-world workforce statistics. These linear representations allow us to explain demographic inferences implicitly formed by LLMs during conversation. We demonstrate that these implicit demographic representations actively shape downstream behavior, such as career recommendations. Our study further highlights that models that pass bias benchmark tests may still harbor and leverage implicit biases, with implications for fairness when applied at scale.
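The probing setup the abstract describes (fitting linear classifiers on residual-stream activations to recover an attribute and its encoding direction) can be sketched as follows. This is a minimal illustration on synthetic data, not the paper's implementation: in practice the activations would be collected from a transformer layer via forward hooks over prompts with known demographic labels, and the dimensionality, sample counts, and planted "attribute direction" here are all hypothetical.

```python
# Sketch of a linear probe for a demographic attribute in activation space.
# Synthetic stand-in: we plant a single linear direction in fake activations,
# then check that a logistic-regression probe (a) predicts the label and
# (b) recovers a weight vector aligned with the planted direction.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d_model = 64   # hidden size (hypothetical)
n = 400        # number of labeled prompts (hypothetical)

# A "linear representation" means the attribute shifts activations along
# one fixed direction; simulate that with unit-norm `direction`.
direction = rng.normal(size=d_model)
direction /= np.linalg.norm(direction)
labels = rng.integers(0, 2, size=n)  # binary attribute labels
acts = rng.normal(size=(n, d_model)) + np.outer(2.0 * labels - 1.0, direction)

# Fit the probe; its weight vector estimates the encoding direction.
probe = LogisticRegression(max_iter=1000).fit(acts, labels)
accuracy = probe.score(acts, labels)

w = probe.coef_[0] / np.linalg.norm(probe.coef_[0])
cosine = abs(float(w @ direction))  # alignment with the planted direction
print(f"probe accuracy: {accuracy:.2f}, cosine with planted direction: {cosine:.2f}")
```

In the paper's setting, the same trained probe is then applied to activations from prompts with only indirect cues (names, occupations), so high probe accuracy there indicates the model formed the demographic representation without explicit disclosure.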
Problem

Research questions and friction points this paper is trying to address.

LLMs encode sociodemographic attributes from indirect cues like names and occupations
Linear representations of demographics in activation space shape downstream behavior
Models passing bias tests may still harbor and leverage implicit biases
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs develop linear demographic representations from indirect cues
Probes predict demographics from names and occupations in activation space
Implicit demographic representations actively shape downstream model behavior