Clustering Discourses: Racial Biases in Short Stories about Women Generated by Large Language Models

📅 2025-09-02
🤖 AI Summary
This study investigates the implicit racialized narrative mechanisms underlying LLaMA 3.2-3B’s generation of short stories featuring Black and white Portuguese women, addressing how ostensibly neutral large language models (LLMs) reproduce colonial gender–race frameworks. Method: We propose a mixed-methods approach integrating computational semantic clustering (applied to 2,100 generated texts) with critical discourse analysis, bridging the limitations of purely quantitative or qualitative paradigms. Contribution/Results: We identify three dominant discursive patterns—“social transcendence,” “ancestral mythologization,” and “subjective self-actualization”—which systematically differentiate the model’s construction of agency and historical positioning for Black versus white women. Empirical findings confirm that, even in the absence of explicitly biased prompts, the model reproduces structural inequities. The study delivers a transferable methodological framework and an empirically grounded benchmark for AI fairness evaluation.
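The paper's clustering pipeline is not published, so the sketch below is only illustrative of the semantic-grouping step it describes: texts are embedded and then partitioned so that representatives of each cluster can be selected for manual discourse analysis. The bag-of-words embedding, the tiny k-means, and all function names here are assumptions, stand-ins for whatever semantic embeddings and clustering settings the authors actually used.

```python
import math
from collections import Counter

def embed(texts):
    """Unit-normalised bag-of-words vectors (a stand-in for the paper's
    unspecified semantic embeddings)."""
    vocab = sorted({w for t in texts for w in t.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for t in texts:
        v = [0.0] * len(vocab)
        for word, count in Counter(t.lower().split()).items():
            v[index[word]] = float(count)
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors

def _dist2(a, b):
    # Squared Euclidean distance between two vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iters=20):
    """Tiny k-means with deterministic farthest-point initialisation."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from all current centroids.
        far = max(vectors, key=lambda v: min(_dist2(v, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to its nearest centroid.
        labels = [min(range(k), key=lambda j: _dist2(v, centroids[j]))
                  for v in vectors]
        # Recompute each centroid as the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels
```

From each resulting cluster, a few representative stories could then be sampled for qualitative critical discourse analysis, mirroring the mixed-methods design described above.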

📝 Abstract
This study investigates how large language models, in particular LLaMA 3.2-3B, construct narratives about Black and white women in short stories generated in Portuguese. From 2,100 texts, we applied computational methods to group semantically similar stories, allowing a selection for qualitative analysis. Three main discursive representations emerge: social overcoming, ancestral mythification and subjective self-realization. The analysis uncovers how grammatically coherent, seemingly neutral texts materialize a crystallized, colonially structured framing of the female body, reinforcing historical inequalities. The study proposes an integrated approach that combines machine learning techniques with qualitative, manual discourse analysis.
Problem

Research questions and friction points this paper is trying to address.

Investigates racial biases in LLM-generated stories
Analyzes narrative construction about Black and white women
Examines how seemingly neutral texts reinforce historical inequalities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Applied semantic clustering to group 2,100 generated stories
Combined machine learning with critical discourse analysis
Identified three recurring discursive patterns in the narratives
Gustavo Bonil
Universidade Estadual de Campinas
João Gondim
Instituto de Computação, Universidade Estadual de Campinas (UNICAMP), Campinas – SP – Brasil
Marina dos Santos
Instituto de Estudos da Linguagem, Universidade Estadual de Campinas (UNICAMP), Campinas – SP – Brasil
Simone Hashiguti
Instituto de Estudos da Linguagem, Universidade Estadual de Campinas (UNICAMP), Campinas – SP – Brasil
Helena Maia
University of Campinas
computer vision, machine learning, image processing
Nadia Silva
Instituto de Informática, Universidade Federal de Goiás (UFG), Goiânia – GO – Brasil
Helio Pedrini
Instituto de Computação, Universidade Estadual de Campinas (UNICAMP), Campinas – SP – Brasil
Sandra Avila
Professor of Computer Science, University of Campinas (Unicamp)
Machine Learning, Deep Learning, Computer Vision, Natural Language Processing