Quantifying and Attributing Polarization to Annotator Groups

📅 2026-01-16

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two limitations: existing annotation consistency metrics fail to capture systematic viewpoint disparities between minority and majority groups, and conventional polarization measures cannot reliably attribute polarization to specific demographic groups. We propose a novel method that, for the first time, disentangles and quantifies irreducible intrinsic polarization from attributable group-based polarization. By leveraging statistical hypothesis testing, our approach enables interpretable and verifiable attribution analysis while avoiding cancellation effects between opposing viewpoints. We release an efficient open-source Python library and validate our method on four subjective NLP datasets. Results demonstrate that gender and race consistently and significantly explain observed polarization patterns, with greater inter-group distance correlating with stronger divergence. Reliable polarization estimates can be obtained with as few as approximately 20 annotators.

📝 Abstract

Current annotation agreement metrics are not well suited to inter-group analysis, are sensitive to group-size imbalances, and are restricted to single-annotation settings. These restrictions render them insufficient for many subjective tasks such as toxicity and hate-speech detection. For this reason, we introduce a quantifiable metric, paired with a statistical significance test, that attributes polarization to various annotator groups. Our metric enables direct comparisons between heavily imbalanced sociodemographic and ideological subgroups across different datasets and tasks, while also enabling analysis in multi-label settings. We apply this metric to three datasets on hate speech and one on toxicity detection, discovering that: (1) Polarization is strongly and persistently attributed to annotator race, especially on the hate speech task. (2) Religious annotators do not fundamentally disagree with each other, but do disagree with other annotators, a trend that gradually diminishes and then reverses for irreligious annotators. (3) Less educated annotators are more subjective, while educated ones tend to agree more broadly among themselves. Overall, our results reflect current findings around annotation patterns for various subgroups. Finally, we estimate the minimum number of annotators needed to obtain robust results, and provide an open-source Python library that implements our metric.
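The abstract describes a polarization metric paired with a statistical significance test, but this page does not give the formula. As a rough illustration of the general idea only, the sketch below uses a generic permutation test on a simple disagreement gap (cross-group minus within-group disagreement) for a single item. This is an assumption made for illustration: it is not the paper's metric, it is not the API of the released library, and the function names `between_group_gap` and `permutation_p_value` are hypothetical.

```python
import random
from itertools import combinations


def between_group_gap(labels, groups):
    """Cross-group disagreement rate minus within-group disagreement rate.

    Illustrative statistic only (an assumption, not the paper's metric).
    `labels[i]` is annotator i's label for one item; `groups[i]` is that
    annotator's group. Positive values mean annotators disagree more
    across groups than inside their own group.
    """
    cross, within = [], []
    for i, j in combinations(range(len(labels)), 2):
        disagree = float(labels[i] != labels[j])
        (within if groups[i] == groups[j] else cross).append(disagree)
    if not cross or not within:
        return 0.0  # undefined without both kinds of pairs; treat as no gap
    return sum(cross) / len(cross) - sum(within) / len(within)


def permutation_p_value(labels, groups, n_perm=2000, seed=0):
    """One-sided permutation test: is the observed gap larger than chance?

    Group membership is shuffled across annotators, which keeps group
    sizes fixed while breaking any link between group and label.
    """
    rng = random.Random(seed)
    observed = between_group_gap(labels, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if between_group_gap(labels, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: group A labels the item 1, group B labels it 0.
labels = [1] * 6 + [0] * 6
groups = ["A"] * 6 + ["B"] * 6
gap, p = permutation_p_value(labels, groups)
print(f"gap={gap:.2f}, p={p:.4f}")
```

A small p-value here only says that the toy gap is unlikely under random group assignment. This simple statistic does not address the group-size imbalance or multi-label settings that the abstract says the actual metric handles.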

Problem

Research questions and friction points this paper is trying to address.

polarization

annotator groups

subjective NLP

hate speech detection

annotation disagreement

Innovation

Methods, ideas, or system contributions that make the work stand out.

polarization attribution

unattributable polarization

annotator groups