Identifying Bias in Machine-generated Text Detection

📅 2025-12-09

🤖 AI Summary
This study evaluates the fairness of 16 English machine-generated text detectors across four sociodemographic attributes: gender, race/ethnicity, English Language Learner (ELL) status, and economic status. Method: Using a curated dataset of student essays, the authors apply regression-based modeling and subgroup analysis, and compare detector behavior against human annotation. Contribution/Results: Biases are generally inconsistent across systems, but several patterns stand out: several detectors tend to classify essays from disadvantaged groups as machine-generated, ELL essays are more likely to be classified as machine-generated, essays by economically disadvantaged students are less likely to be classified as machine-generated, and non-White ELL essays are disproportionately classified as machine-generated relative to their White counterparts. Human annotators perform poorly at the detection task overall but show no significant bias on the studied attributes.

📝 Abstract
The meteoric rise in text generation capability has been accompanied by parallel growth in interest in machine-generated text detection: the capability to identify whether a given text was generated using a model or written by a person. While detection models show strong performance, they have the capacity to cause significant negative impacts. We explore potential biases in English machine-generated text detection systems. We curate a dataset of student essays and assess 16 different detection systems for bias across four attributes: gender, race/ethnicity, English-language learner (ELL) status, and economic status. We evaluate these attributes using regression-based models to determine the significance and power of the effects, as well as performing subgroup analysis. We find that while biases are generally inconsistent across systems, there are several key issues: several models tend to classify disadvantaged groups as machine-generated, ELL essays are more likely to be classified as machine-generated, economically disadvantaged students' essays are less likely to be classified as machine-generated, and non-White ELL essays are disproportionately classified as machine-generated relative to their White counterparts. Finally, we perform human annotation and find that while humans perform generally poorly at the detection task, they show no significant biases on the studied attributes.
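The subgroup analysis mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: the record layout (`ell`, `flagged`), the helper names `flag_rates` and `two_proportion_z`, and the toy data are all assumptions made for illustration. It compares how often human-written essays from two subgroups are flagged as machine-generated and applies a standard two-proportion z-test to the gap.

```python
import math
from collections import defaultdict


def flag_rates(records, attribute):
    """Per-subgroup (rate, flagged, total) of essays flagged as machine-generated.

    `records` is a list of dicts for human-written essays (hypothetical layout):
    each has the subgroup value under `attribute` and a boolean "flagged".
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [flagged, total]
    for rec in records:
        group = rec[attribute]
        counts[group][0] += int(rec["flagged"])
        counts[group][1] += 1
    return {g: (f / n, f, n) for g, (f, n) in counts.items()}


def two_proportion_z(f1, n1, f2, n2):
    """Two-sided two-proportion z-test; returns (z, p_value)."""
    pooled = (f1 + f2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0, 1.0  # no variation at all, so no evidence of a gap
    z = (f1 / n1 - f2 / n2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Toy data, invented for illustration only (not results from the paper):
# 10 ELL essays with 4 flagged, 10 non-ELL essays with 1 flagged.
records = (
    [{"ell": True, "flagged": True}] * 4
    + [{"ell": True, "flagged": False}] * 6
    + [{"ell": False, "flagged": True}] * 1
    + [{"ell": False, "flagged": False}] * 9
)

rates = flag_rates(records, "ell")
_, f_ell, n_ell = rates[True]
_, f_non, n_non = rates[False]
z, p_value = two_proportion_z(f_ell, n_ell, f_non, n_non)
print(f"ELL flag rate: {rates[True][0]:.2f}, non-ELL flag rate: {rates[False][0]:.2f}")
print(f"z = {z:.3f}, p = {p_value:.3f}")
```

A raw rate gap like this does not control for the other attributes. The regression-based models described in the abstract address that by estimating each attribute's effect jointly, which is also what would allow an intersectional comparison such as non-White ELL versus White ELL essays.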
Problem

Research questions and friction points this paper is trying to address.

Investigates bias in machine-generated text detection systems
Evaluates detection models across gender, race/ethnicity, ELL status, and economic status
Finds that several detectors tend to flag essays from disadvantaged groups as machine-generated, though biases are inconsistent across systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluated 16 detection systems for bias using regression models
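Analyzed bias across gender, race/ethnicity, ELL status, and economic status
Found that several detectors tend to classify disadvantaged groups as machine-generated, while economically disadvantaged students' essays are less likely to be classified as machine-generated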