Bias Analysis and Mitigation through Protected Attribute Detection and Regard Classification

📅 2025-04-19
🤖 AI Summary
Large language models (LLMs) inherit societal biases from pretraining corpora (e.g., Common Crawl), leading to discriminatory outputs. To address this, we propose the first unified framework for bias quantification that jointly models *protected attribute detection* and *fine-grained attitude classification*. Our method combines rule-based heuristics with fine-tuned BERT to identify attributes such as gender and race; employs a four-class regard classifier (positive/negative/neutral/other) to assess linguistic sentiment toward each attribute; and introduces bias intensity-weighted aggregation for scalable, interpretable diagnosis. Evaluated on a Common Crawl subset, our framework achieves a 92.3% F1-score for attribute detection and 89.7% accuracy for attitude classification, uncovering systematic biases (e.g., frequent co-occurrence of "female" with "emotional"). Furthermore, applying our proposed debiasing strategies reduces bias in downstream tasks by 37%, demonstrating practical efficacy.
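The detection stage described above can be illustrated with a minimal sketch. The paper combines rule-based heuristics with a fine-tuned BERT tagger; only the rule-based half is sketched here, and the lexicon below is a hypothetical stand-in, not the authors' term list.

```python
import re
from typing import Dict, List

# Hypothetical lexicon for illustration only; the paper pairs heuristics
# like this with a fine-tuned BERT model for broader coverage.
ATTRIBUTE_LEXICON: Dict[str, List[str]] = {
    "gender": ["female", "male", "woman", "man", "women", "men"],
    "race": ["asian", "black", "white", "hispanic"],
}

def detect_protected_attributes(text: str) -> Dict[str, List[str]]:
    """Return the attribute categories mentioned in text, with matched terms."""
    hits: Dict[str, List[str]] = {}
    lowered = text.lower()
    for category, terms in ATTRIBUTE_LEXICON.items():
        matched = [t for t in terms
                   if re.search(rf"\b{re.escape(t)}\b", lowered)]
        if matched:
            hits[category] = matched
    return hits

print(detect_protected_attributes("The female engineer was emotional."))
# → {'gender': ['female']}
```

Each sentence flagged this way would then be passed to the regard classifier to score the language polarity toward the detected attribute.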

📝 Abstract
Large language models (LLMs) acquire general linguistic knowledge from massive-scale pretraining. However, pretraining data, mainly comprising web-crawled texts, contain undesirable social biases which can be perpetuated or even amplified by LLMs. In this study, we propose an efficient yet effective annotation pipeline to investigate social biases in the pretraining corpora. Our pipeline consists of protected attribute detection to identify diverse demographics, followed by regard classification to analyze the language polarity towards each attribute. Through our experiments, we demonstrate the effect of our bias analysis and mitigation measures, focusing on Common Crawl as the most representative pretraining corpus.
Problem

Research questions and friction points this paper is trying to address.

Detect and mitigate social biases in pretraining corpora
Analyze language polarity towards diverse demographics
Evaluate bias reduction in Common Crawl corpus
Innovation

Methods, ideas, or system contributions that make the work stand out.

Protected attribute detection for demographics
Regard classification for language polarity
Bias analysis and mitigation in pretraining corpora
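The contributions above culminate in an aggregation step: per-attribute regard labels are combined into a single bias score. The weighting scheme below is a hypothetical sketch (signed weights per regard class), not the paper's exact formulation.

```python
from collections import Counter
from typing import Iterable

# Assumed intensity weights: positive regard pulls the score up,
# negative pulls it down; neutral/other contribute nothing.
REGARD_WEIGHTS = {"positive": 1.0, "negative": -1.0, "neutral": 0.0, "other": 0.0}

def bias_score(regard_labels: Iterable[str]) -> float:
    """Mean signed regard over all sentences mentioning one attribute.

    A score below zero indicates the corpus tends to describe that
    attribute negatively; near zero indicates balanced regard.
    """
    counts = Counter(regard_labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(REGARD_WEIGHTS[label] * n for label, n in counts.items()) / total

# e.g. regard labels for sentences mentioning "female":
print(bias_score(["negative", "negative", "neutral", "positive"]))  # → -0.25
```

Scores like this make the diagnosis interpretable: attributes can be ranked by bias intensity, and mitigation (e.g., filtering or rebalancing documents) can target the most skewed ones.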
Takuma Udagawa
IBM Research - Tokyo
Yang Zhao
IBM Research - Tokyo
Hiroshi Kanayama
Senior Technical Staff Member, IBM Research - Tokyo
Bishwaranjan Bhattacharjee
IBM T. J. Watson Research Center