FC-CONAN: An Exhaustively Paired Dataset for Robust Evaluation of Retrieval Systems

📅 2026-01-04

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Existing datasets pairing hate speech with counter-narratives suffer from sparse annotations, which makes reliable evaluation of retrieval systems difficult. To address this limitation, this work introduces FC-CONAN, the first exhaustively annotated dataset covering all possible combinations of 45 hate speech instances and 129 counter-narratives, for a total of 5,805 pairs (45 × 129). Through a two-stage annotation process involving nine annotators and four validators, the authors construct four reliability-tiered subsets: Diamond, Gold, Silver, and Bronze. This exhaustive labeling uncovers hundreds of valid positive pairs that sparse annotation had missed, making system evaluation more faithful. FC-CONAN's labeled pairs also do not overlap with those in CONAN, so it provides a complementary annotation resource that enables fine-grained error analysis and supports future research in counter-narrative retrieval.
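The exhaustive pairing amounts to a Cartesian product of the two message sets. The sketch below only illustrates the pair count; the `hs_*` and `cn_*` identifiers are hypothetical placeholders, not the dataset's actual IDs or file format.

```python
from itertools import product

# Hypothetical placeholder IDs; the real FC-CONAN identifiers may differ.
hs_ids = [f"hs_{i}" for i in range(45)]   # 45 hate speech messages
cn_ids = [f"cn_{j}" for j in range(129)]  # 129 counter-narratives

# Exhaustive pairing: every hate speech message is paired with every counter-narrative.
all_pairs = list(product(hs_ids, cn_ids))
print(len(all_pairs))  # 5805
```

Every one of these pairs receives a label, whereas a sparse dataset annotates only a small subset and leaves the rest unknown.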


📝 Abstract

Hate speech (HS) is a critical issue in online discourse, and one promising strategy for countering it is the use of counter-narratives (CNs). Datasets linking HS with CNs are essential for advancing counterspeech research. However, even flagship resources like CONAN (Chung et al., 2019) annotate only a sparse subset of all possible HS-CN pairs, which limits evaluation. We introduce FC-CONAN (Fully Connected CONAN), the first dataset created by exhaustively considering all combinations of 45 English HS messages and 129 CNs. A two-stage annotation process involving nine annotators and four validators produces four partitions (Diamond, Gold, Silver, and Bronze) that balance reliability and scale. None of the labeled pairs overlap with CONAN's annotated pairs, and the exhaustive labeling uncovers hundreds of previously unlabeled positives. FC-CONAN enables more faithful evaluation of counterspeech retrieval systems and facilitates detailed error analysis. The dataset is publicly available.
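To see why sparse labels limit evaluation, consider precision@k for one HS message. Under sparse annotation, any unlabeled CN that the system retrieves counts as a miss, even if it is a valid response. The ranking and label sets below are invented toy values for illustration only, not results from FC-CONAN.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved counter-narratives labeled as relevant."""
    top_k = ranked[:k]
    return sum(1 for cn in top_k if cn in relevant) / k

# Toy ranking produced by a retrieval system for one HS message (hypothetical IDs).
ranked = ["cn_3", "cn_7", "cn_1", "cn_9", "cn_4"]

# Sparse labels: only one positive was ever annotated for this message.
sparse_positives = {"cn_3"}
# Exhaustive labels: annotating every pair reveals additional valid positives.
full_positives = {"cn_3", "cn_7", "cn_9"}

print(precision_at_k(ranked, sparse_positives, 5))  # 0.2
print(precision_at_k(ranked, full_positives, 5))    # 0.6
```

The same ranking scores three times higher once the missing positives are labeled, so sparse annotation can systematically underestimate a system's quality.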

Problem

Research questions and friction points this paper is trying to address.

hate speech

counter-narratives

retrieval evaluation

paired dataset

counterspeech

Innovation

Methods, ideas, or system contributions that make the work stand out.

exhaustively paired dataset

counter-narratives

hate speech