Can we Debias Social Stereotypes in AI-Generated Images? Examining Text-to-Image Outputs and User Perceptions

📅 2025-05-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Text-to-image (T2I) models are prone to reproducing and amplifying gender, racial, and cultural stereotypes, posing significant ethical risks. To address this, we propose the first theory-driven framework for social bias assessment in T2I generation, introducing the Social Stereotype Index (SSI) and a multidimensional bias detection scale. Our method innovatively employs LLM-assisted prompt optimization to systematically mitigate biases while preserving semantic fidelity. Through cross-model auditing across DALL-E-3, Midjourney v6.1, and Stability AI Core—complemented by quantitative analysis and user perception studies—we demonstrate that our approach reduces SSI scores by 61%, 69%, and 51% along geographic-cultural, occupational, and adjectival dimensions, respectively. Crucially, our user study uncovers an implicit preference for stereotypical outputs, revealing a fundamental tension between ethical alignment and authentic representation in generative AI.

📝 Abstract
Recent advances in generative AI have enabled visual content creation through text-to-image (T2I) generation. However, despite their creative potential, T2I models often replicate and amplify societal stereotypes -- particularly those related to gender, race, and culture -- raising important ethical concerns. This paper proposes a theory-driven bias detection rubric and a Social Stereotype Index (SSI) to systematically evaluate social biases in T2I outputs. We audited the outputs of three major T2I models -- DALL-E-3, Midjourney-6.1, and Stability AI Core -- using 100 queries across three categories -- geocultural, occupational, and adjectival. Our analysis reveals that initial outputs are prone to include stereotypical visual cues, including gendered professions, cultural markers, and Western beauty norms. To address this, we applied our rubric to conduct targeted prompt refinement using LLMs, which significantly reduced bias -- SSI dropped by 61% for geocultural, 69% for occupational, and 51% for adjectival queries. We complemented our quantitative analysis with a user study examining perceptions, awareness, and preferences around AI-generated biased imagery. Our findings reveal a key tension -- although prompt refinement can mitigate stereotypes, it can limit contextual alignment. Interestingly, users often perceived stereotypical images to be more aligned with their expectations. We discuss the need to balance ethical debiasing with contextual relevance and call for T2I systems that support global diversity and inclusivity without compromising the reflection of real-world social complexity.
Problem

Research questions and friction points this paper is trying to address.

Detect and measure social stereotypes in AI-generated images
Reduce bias in text-to-image model outputs using prompt refinement
Balance ethical debiasing with user expectations and contextual relevance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed theory-driven bias detection rubric
Created Social Stereotype Index (SSI)
Used LLMs for targeted prompt refinement
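The pipeline above (score a prompt's outputs with the SSI rubric, refine the prompt via an LLM, re-score) can be sketched as a small loop. This is a minimal illustration, not the paper's implementation: `score_ssi` and `refine_prompt` are hypothetical stand-ins for the rubric-based scorer and the LLM refiner, and the reported reductions (61%, 69%, 51%) are simple relative-change percentages as computed by `percent_reduction`.

```python
def percent_reduction(ssi_before: float, ssi_after: float) -> float:
    """Relative drop in the Social Stereotype Index, as a percentage."""
    return (ssi_before - ssi_after) / ssi_before * 100.0


def refine_until_fair(prompt, score_ssi, refine_prompt, target=0.2, max_iters=5):
    """Iteratively refine a T2I prompt until its SSI falls below a target.

    score_ssi and refine_prompt are injected callables (hypothetical here):
    in the paper, scoring comes from the bias-detection rubric and
    refinement from an LLM that rewrites the prompt while preserving
    semantic fidelity.
    """
    ssi = score_ssi(prompt)
    for _ in range(max_iters):
        if ssi <= target:
            break
        prompt = refine_prompt(prompt, ssi)
        ssi = score_ssi(prompt)
    return prompt, ssi
```

For example, with a toy scorer that rewards explicit diversity cues, `refine_until_fair("a CEO at a desk", ...)` would append a diversity instruction once and stop as soon as the toy SSI drops below the target.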