From Self to Other: Evaluating Demographic Perspective-Taking in LLM Hate Speech Annotation

📅 2026-06-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the subjectivity of hate speech detection, which arises because demographic groups judge the same content differently, and the high cost of collecting human annotations from multiple groups. The authors evaluate persona-conditioned large language models, which are prompted to adopt a specific demographic identity, on three aspects of human social judgement: inter-group disagreement, in-group sensitivity, and vicarious prediction of another group's reaction. No model consistently captures all three, and performance is highly model-dependent and does not emerge reliably from minimal identity prompts alone. Vicarious prompting with Llama 3.1 nonetheless yields the highest cross-group agreement on most demographic axes and the closest overall approximation to human disagreement patterns, which suggests that this configuration may be a more reliable setting for automatic annotation aligned with human judgements.

📝 Abstract

Hate speech detection is inherently subjective: people from different demographic groups perceive the same content very differently. Collecting enough annotations from multiple demographic groups is costly and difficult to scale. Persona-conditioned Large Language Models (models prompted to adopt a specific demographic identity) have been proposed as a way to simulate diverse perspectives at scale. But do they actually reflect how different groups disagree? We evaluate three aspects of human social judgement: (i) whether personas from different groups disagree in human-like ways (inter-group disagreement), (ii) whether they become more sensitive when content targets their own identity (in-group sensitivity), and (iii) whether they can accurately predict how another group would react (vicarious prediction). Our results show that no model consistently captures all three dimensions, and performance is highly model-dependent and does not emerge reliably from minimal identity prompts alone. However, vicarious prompting with Llama 3.1 yields the highest cross-group agreement in most demographic axes and provides the closest overall approximation to human disagreement patterns, indicating that this configuration may provide a more reliable setting for automatic annotation aligned with human judgements.
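To make the three evaluated aspects concrete, the following is a minimal Python sketch of how they could be scored on binary labels (1 = hateful, 0 = not hateful). It is not the paper's implementation: the function names, the prompt wording, and the specific measures (simple disagreement rate, mean-label gap, and accuracy) are all assumptions chosen for illustration, and the abstract does not state which metrics the authors use.

```python
# Illustrative sketch only; names, prompt wording, and metrics are hypothetical.
# Labels are dicts mapping item_id -> 1 (hateful) or 0 (not hateful).

def build_prompt(text, persona, target_group=None):
    """Build a persona prompt, or a vicarious prompt if target_group is given.

    The wording is an assumption, not the paper's actual prompt.
    """
    if target_group is None:
        question = "Do you find the following text hateful?"
    else:
        question = (f"Would a person who identifies as {target_group} "
                    "find the following text hateful?")
    return (f"You are a person who identifies as {persona}. {question} "
            f"Answer yes or no.\nText: {text}")

def disagreement_rate(labels_a, labels_b):
    """Inter-group disagreement: share of common items labelled differently."""
    shared = sorted(set(labels_a) & set(labels_b))
    if not shared:
        raise ValueError("the two label sets share no items")
    return sum(labels_a[i] != labels_b[i] for i in shared) / len(shared)

def in_group_gap(labels, targets, group):
    """In-group sensitivity: mean label on items targeting `group`
    minus mean label on items targeting anyone else."""
    own = [labels[i] for i in labels if targets[i] == group]
    other = [labels[i] for i in labels if targets[i] != group]
    if not own or not other:
        raise ValueError("need items both targeting and not targeting the group")
    return sum(own) / len(own) - sum(other) / len(other)

def vicarious_accuracy(predicted, actual):
    """Vicarious prediction: share of items where the predicted reaction of
    another group matches that group's actual label."""
    return 1.0 - disagreement_rate(predicted, actual)
```

Under these assumptions, the same three functions can be applied once to human annotations and once to model outputs, so that the model's disagreement rates and in-group gaps can be compared against the human ones group by group.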

Problem

Research questions and friction points this paper is trying to address.

hate speech detection

demographic perspective-taking

inter-group disagreement

in-group sensitivity

vicarious prediction

Innovation

Methods, ideas, or system contributions that make the work stand out.

persona-conditioned LLMs

demographic perspective-taking

hate speech annotation