🤖 AI Summary
Existing counter-speech generation methods predominantly rely on pure text generation—via large language models (LLMs) or expert authoring—suffering from weak factual grounding, poor logical coherence, and limited scalability. To address these limitations, this paper introduces the first trustworthy counter-speech generation framework tailored to eight marginalized groups (e.g., women, racial/ethnic minorities, persons with disabilities). Our key innovation lies in integrating retrieval-augmented generation (RAG) into counter-speech synthesis: we construct a domain-specific knowledge base comprising 32,792 authoritative documents sourced from the United Nations Digital Library, EUR-Lex, and the European Union Agency for Fundamental Rights, enabling fact-grounded and controllable generation. Evaluated on the MultiTarget-CONAN benchmark, our approach substantially outperforms mainstream LLM baselines. Both automated metrics and human evaluation confirm significant improvements in factual accuracy, logical coherence, and target-group specificity.
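The retrieval-augmented pipeline described above can be pictured as: retrieve the most relevant knowledge-base documents for an incoming hate-speech message, then condition generation on that evidence. Below is a minimal, hypothetical sketch of that idea using a toy three-document knowledge base and simple bag-of-words cosine retrieval; the paper's actual pipeline, retriever, and 32,792-document corpus are not reproduced here, and all names are illustrative.

```python
import math
from collections import Counter

# Toy stand-in for the paper's knowledge base (UN Digital Library,
# EUR-Lex, EU Agency for Fundamental Rights); contents are invented.
KNOWLEDGE_BASE = [
    "UN resolution affirming the rights of persons with disabilities",
    "EUR-Lex directive on equal treatment regardless of racial or ethnic origin",
    "FRA report on discrimination faced by Muslim communities in the EU",
]

def _bag(text: str) -> Counter:
    # Lowercased bag-of-words term frequencies.
    return Counter(text.lower().split())

def cosine(a: str, b: str) -> float:
    ta, tb = _bag(a), _bag(b)
    num = sum(ta[w] * tb[w] for w in ta)
    den = math.sqrt(sum(v * v for v in ta.values())) * \
          math.sqrt(sum(v * v for v in tb.values()))
    return num / den if den else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    # Rank documents by similarity to the hate-speech message.
    ranked = sorted(KNOWLEDGE_BASE, key=lambda d: cosine(query, d), reverse=True)
    return ranked[:k]

def build_prompt(hate_speech: str, k: int = 1) -> str:
    # Assemble an evidence-grounded generation prompt for an LLM.
    evidence = "\n".join(f"- {doc}" for doc in retrieve(hate_speech, k))
    return (
        "Using ONLY the evidence below, write a factual, respectful "
        "counter-speech reply.\n"
        f"Evidence:\n{evidence}\n"
        f"Hate speech: {hate_speech}\n"
        "Counter-speech:"
    )
```

In a real deployment, the bag-of-words retriever would be replaced by a dense or hybrid retriever over the full corpus, and `build_prompt`'s output would be passed to the generator LLM.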
📝 Abstract
Counter-speech generation is at the core of many expert activities, such as fact-checking and countering hate speech, that aim to curb the spread of harmful content. Yet, existing work treats counter-speech generation as a pure text generation task, relying mainly on Large Language Models or NGO experts. These approaches suffer from severe drawbacks: limited reliability and coherence of the generated countering text in the former case, and poor scalability in the latter. To close this gap, we introduce a novel framework that models counter-speech generation as a knowledge-wise text generation process. Our framework integrates advanced Retrieval-Augmented Generation (RAG) pipelines to ensure the generation of trustworthy counter-speech for 8 main target groups identified in the hate speech literature: women, people of colour, persons with disabilities, migrants, Muslims, Jews, LGBT persons, and a residual "other" category. We build a knowledge base over the United Nations Digital Library, EUR-Lex and the EU Agency for Fundamental Rights, comprising a total of 32,792 texts. We use the MultiTarget-CONAN dataset to empirically assess the quality of the generated counter-speech, through both automatic evaluation (i.e., JudgeLM) and human evaluation. Results show that our framework outperforms standard LLM baselines and a competitive approach on both assessments. The resulting framework and knowledge base pave the way for studying trustworthy and sound counter-speech generation, in hate speech and beyond.