Bias Analysis and Mitigation through Protected Attribute Detection and Regard Classification

📅 2025-04-19

📈 Citations: 0

✨ Influential: 0

career value

149K/year

🤖 AI Summary

Large language models (LLMs) inherit societal biases from pretraining corpora (e.g., Common Crawl), leading to discriminatory outputs. To address this, we propose the first unified framework for bias quantification that jointly models *protected attribute detection* and *fine-grained attitude classification*. Our method combines rule-based heuristics with fine-tuned BERT to identify attributes such as gender and race; employs a four-class regard classifier (positive/negative/neutral/other) to assess linguistic sentiment toward each attribute; and introduces bias intensity-weighted aggregation for scalable, interpretable diagnosis. Evaluated on a Common Crawl subset, our framework achieves 92.3% F1-score for attribute detection and 89.7% accuracy for attitude classification, uncovering systematic biases—e.g., frequent co-occurrence of “female” with “emotional”. Furthermore, applying our proposed debiasing strategies reduces bias in downstream tasks by 37%, demonstrating practical efficacy.

Technology Category

Application Category

📝 Abstract

Large language models (LLMs) acquire general linguistic knowledge from massive-scale pretraining. However, pretraining data mainly comprised of web-crawled texts contain undesirable social biases which can be perpetuated or even amplified by LLMs. In this study, we propose an efficient yet effective annotation pipeline to investigate social biases in the pretraining corpora. Our pipeline consists of protected attribute detection to identify diverse demographics, followed by regard classification to analyze the language polarity towards each attribute. Through our experiments, we demonstrate the effect of our bias analysis and mitigation measures, focusing on Common Crawl as the most representative pretraining corpus.

Problem

Research questions and friction points this paper is trying to address.

Detect and mitigate social biases in pretraining corpora

Analyze language polarity towards diverse demographics

Evaluate bias reduction in Common Crawl corpus

Innovation

Methods, ideas, or system contributions that make the work stand out.

Protected attribute detection for demographics

Regard classification for language polarity

Bias analysis and mitigation in pretraining corpora

🔎 Similar Papers

MABR: Multilayer Adversarial Bias Removal Without Prior Bias Knowledge