🤖 AI Summary
This work exposes a covert fairness and privacy risk: large language models (LLMs) can implicitly infer users’ sensitive demographic attributes—such as gender and race—from question phrasing alone, even without explicit demographic input. To systematically study this phenomenon, the authors introduce the Demographic Attribute Inference from Questions (DAIQ) task and a comprehensive evaluation framework, conducting the first cross-model audit across major open- and closed-source LLMs. Methodologically, they construct a neutral benchmark query set and employ both quantitative metrics and qualitative analysis, empirically demonstrating pervasive demographic inference biases across diverse LLMs. Furthermore, they propose a prompt-engineering–based mitigation strategy that effectively suppresses such inference without modifying model parameters. This work advances understanding of implicit social biases in LLMs and delivers a practical, deployable approach to enhance model fairness.
📝 Abstract
Large Language Models (LLMs) are known to reflect social biases when demographic attributes, such as gender or race, are explicitly present in the input. Yet even in their absence, these models can still infer user identities from question phrasing alone. This subtle behavior has received far less attention, yet it poses serious risks: it violates expectations of neutrality, surfaces unintended demographic information, and encodes stereotypes that undermine fairness in domains including healthcare, finance, and education.
We introduce Demographic Attribute Inference from Questions (DAIQ), a task and framework for auditing an overlooked failure mode in language models: inferring user demographic attributes from questions that lack explicit demographic cues. Our approach combines curated neutral queries, systematic prompting, and both quantitative and qualitative analysis to uncover how models infer demographic information. We show that both open- and closed-source LLMs assign demographic labels based solely on question phrasing.
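The audit described above can be sketched as a simple loop over neutral queries. This is a minimal illustrative sketch, not the paper's actual pipeline: the query set, the attribute lexicon, and the `model_call` stub are all hypothetical placeholders (a real audit would call an LLM API and use the paper's curated benchmark and metrics).

```python
import re

# Hypothetical stand-ins for the DAIQ artifacts: a tiny neutral query set
# and a toy lexicon of demographic-label terms to look for in model output.
NEUTRAL_QUERIES = [
    "What should I wear to a job interview?",
    "How do I ask my manager for a raise?",
]

ATTRIBUTE_TERMS = {
    "gender": ["she", "he", "woman", "man", "female", "male"],
    "race": ["white", "black", "asian", "hispanic"],
}

def model_call(query: str) -> str:
    """Stub for an LLM API call; replace with a real client.
    Hard-coded here so the sketch is self-contained and deterministic."""
    return "As a woman, you might consider a blazer."

def audit(queries):
    """Count how often the model's answer assigns a demographic label,
    even though the query contains no explicit demographic cue."""
    counts = {attr: 0 for attr in ATTRIBUTE_TERMS}
    for q in queries:
        answer = model_call(q).lower()
        for attr, terms in ATTRIBUTE_TERMS.items():
            if any(re.search(rf"\b{t}\b", answer) for t in terms):
                counts[attr] += 1
    return counts

print(audit(NEUTRAL_QUERIES))  # → {'gender': 2, 'race': 0}
```

Keyword matching is only a crude proxy for the paper's quantitative and qualitative analysis, but it conveys the shape of the audit: neutral input in, demographic labeling detected in the output.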
The prevalence and consistency of these demographic inferences across diverse models reveal a systemic and under-acknowledged risk: LLMs can fabricate demographic identities, reinforce societal stereotypes, and propagate harms that erode privacy, fairness, and trust, posing a broader threat to social equity and responsible AI deployment. To mitigate this, we develop a prompt-based guardrail that substantially reduces identity inference and helps align model behavior with fairness and privacy objectives.
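A prompt-based guardrail of this kind can be sketched as a wrapper that prepends a refusal instruction to every query, leaving model parameters untouched. The instruction text below is an illustrative paraphrase, not the paper's actual guardrail wording.

```python
# Hypothetical guardrail instruction; the real wording in the paper differs.
GUARDRAIL = (
    "Do not infer, assume, or mention the user's gender, race, or any other "
    "demographic attribute. If an answer would depend on such an attribute, "
    "respond neutrally or ask the user directly."
)

def guarded_prompt(user_query: str) -> str:
    """Prepend the guardrail as a system-style instruction.
    Only the prompt changes; no fine-tuning or parameter edits are needed."""
    return f"{GUARDRAIL}\n\nUser question: {user_query}"

print(guarded_prompt("What should I wear to a job interview?"))
```

Because the mitigation lives entirely in the prompt, it can be deployed in front of both open- and closed-source models without access to their weights.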