Fair Play in the Newsroom: Actor-Based Filtering Gender Discrimination in Text Corpora

📅 2025-08-07
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses implicit gender bias in news corpora by proposing an actor-level fairness analysis framework that systematically detects and mitigates structural asymmetries in sentiment polarity, syntactic agency, and quotation practices among the persons an article mentions. Methodologically, it introduces discourse-aware, fine-grained fairness metrics that integrate sentiment analysis, active-voice identification, quotation-pattern modeling, and exclusion-based corpus resampling. Evaluated on the *taz2024full* corpus, a longitudinal German newspaper dataset spanning 1980–2024, the approach substantially improves gender representational balance across multiple linguistic dimensions. Crucially, it uncovers persistent affective and discursive framing biases even where surface-level statistical parity holds. The project releases open-source analytical tools and comprehensive evaluation reports, establishing a reproducible and scalable paradigm for fair corpus construction.

📝 Abstract
Large language models are increasingly shaping digital communication, yet their outputs often reflect structural gender imbalances that originate from their training data. This paper presents an extended actor-level pipeline for detecting and mitigating gender discrimination in large-scale text corpora. Building on prior work in discourse-aware fairness analysis, we introduce new actor-level metrics that capture asymmetries in sentiment, syntactic agency, and quotation styles. The pipeline supports both diagnostic corpus analysis and exclusion-based balancing, enabling the construction of fairer corpora. We apply our approach to the taz2024full corpus of German newspaper articles from 1980 to 2024, demonstrating substantial improvements in gender balance across multiple linguistic dimensions. Our results show that while surface-level asymmetries can be mitigated through filtering and rebalancing, subtler forms of bias persist, particularly in sentiment and framing. We release the tools and reports to support further research in discourse-based fairness auditing and equitable corpus construction.
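The actor-level metrics the abstract names (sentiment, syntactic agency, quotation style) can be sketched as per-gender aggregates over annotated actor mentions. The toy records below are invented for illustration only; the paper's actual pipeline derives such annotations from linguistic processing of the taz2024full articles.

```python
from statistics import mean

# Hypothetical actor-mention annotations: (gender, sentiment in [-1, 1],
# is_syntactic_agent, is_quoted). Invented toy data, not from the paper.
mentions = [
    ("f", -0.2, False, False),
    ("f",  0.1, True,  True),
    ("m",  0.4, True,  True),
    ("m",  0.3, True,  False),
    ("f", -0.1, False, False),
    ("m",  0.2, False, True),
]

def actor_metrics(mentions, gender):
    """Mean sentiment, agency rate, and quotation rate for one gender."""
    rows = [r for r in mentions if r[0] == gender]
    return {
        "sentiment": mean(r[1] for r in rows),
        "agency":    mean(r[2] for r in rows),   # bools average to a rate
        "quotation": mean(r[3] for r in rows),
    }

def asymmetry(mentions):
    """Per-dimension male-minus-female gap in actor representation."""
    f = actor_metrics(mentions, "f")
    m = actor_metrics(mentions, "m")
    return {k: m[k] - f[k] for k in f}

print(asymmetry(mentions))
```

A gap of zero on every dimension would indicate parity; in this toy sample male actors receive more positive sentiment and appear more often as syntactic agents and quoted speakers.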
Problem

Research questions and friction points this paper is trying to address.

Detecting gender discrimination in actor representation within text corpora
Mitigating structural gender inequalities in large-scale newspaper datasets
Reducing gender asymmetries through discourse-aware filtering and balancing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Actor-level pipeline detects and mitigates gender discrimination
Combines discourse-aware analysis with sentiment and agency metrics
Enables fine-grained auditing and exclusion-based balancing
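The exclusion-based balancing idea above can be illustrated with a minimal greedy sketch, assuming per-article actor counts are already available. The selection criterion and counts here are my own toy construction, not the paper's implementation.

```python
# Toy corpus: (article_id, male_actor_mentions, female_actor_mentions).
# Counts are invented for illustration.
corpus = [
    ("a1", 8, 2),
    ("a2", 3, 3),
    ("a3", 5, 1),
    ("a4", 2, 4),
]

def imbalance(articles):
    """Normalized gap between male and female mentions (0 = balanced)."""
    m = sum(a[1] for a in articles)
    f = sum(a[2] for a in articles)
    return abs(m - f) / (m + f)

def balance_by_exclusion(articles, target=0.2):
    """Greedily drop the article whose removal most reduces imbalance."""
    kept = list(articles)
    while imbalance(kept) > target and len(kept) > 1:
        worst = min(
            kept,
            key=lambda a: imbalance([x for x in kept if x is not a]),
        )
        kept.remove(worst)
    return kept

balanced = balance_by_exclusion(corpus)
print([a[0] for a in balanced], imbalance(balanced))
```

On this toy data a single exclusion (the most male-skewed article) already brings the corpus under the threshold; the paper's pipeline applies the same exclusion principle at scale with its richer actor-level metrics.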
Stefanie Urchs
Faculty for Computer Science and Mathematics, Hochschule München University of Applied Sciences
Veronika Thurner
Faculty for Computer Science and Mathematics, Hochschule München University of Applied Sciences
Matthias Aßenmacher
Ludwig-Maximilians-Universität München
Natural Language Processing · Statistics · Machine Learning
Christian Heumann
Professor of Statistics, Ludwig-Maximilians-Universität München
Statistics
Stephanie Thiemichen
Faculty for Computer Science and Mathematics, Hochschule München University of Applied Sciences