🤖 AI Summary
Existing datasets pairing hate speech with counter-narratives are sparsely annotated, which makes reliable evaluation of retrieval systems difficult. To address this limitation, this work introduces FC-CONAN, the first dataset built by exhaustively considering all combinations of 45 English hate speech instances and 129 counter-narratives (45 × 129 = 5,805 pairs). A two-stage annotation process involving nine annotators and four validators yields four reliability-tiered subsets: Diamond, Gold, Silver, and Bronze. None of the labeled pairs overlap with CONAN, so the exhaustive labeling surfaces hundreds of previously unlabelled positive pairs. This makes system evaluation more faithful and supports fine-grained error analysis in counter-narrative retrieval research.
📝 Abstract
Hate speech (HS) is a critical issue in online discourse, and one promising strategy to counter it is the use of counter-narratives (CNs). Datasets linking HS with CNs are essential for advancing counterspeech research. However, even flagship resources like CONAN (Chung et al., 2019) annotate only a sparse subset of all possible HS-CN pairs, which limits evaluation. We introduce FC-CONAN (Fully Connected CONAN), the first dataset created by exhaustively considering all combinations of 45 English HS messages and 129 CNs. A two-stage annotation process involving nine annotators and four validators produces four partitions (Diamond, Gold, Silver, and Bronze) that balance reliability and scale. None of the labeled pairs overlap with CONAN, so FC-CONAN uncovers hundreds of previously unlabelled positives. FC-CONAN enables more faithful evaluation of counterspeech retrieval systems and facilitates detailed error analysis. The dataset is publicly available.
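As a rough illustration of what "fully connected" means here, the candidate set is the Cartesian product of the HS messages and the CNs. The sketch below uses placeholder identifiers (`HS01`, `CN001`, and so on) that are assumptions made for illustration; they are not the dataset's actual IDs or file format.

```python
from itertools import product

# Hypothetical placeholder IDs; the released dataset may use different identifiers.
hs_ids = [f"HS{i:02d}" for i in range(1, 46)]    # 45 hate speech messages
cn_ids = [f"CN{j:03d}" for j in range(1, 130)]   # 129 counter-narratives

# Fully connected candidate set: every HS message paired with every CN.
all_pairs = list(product(hs_ids, cn_ids))

print(len(all_pairs))  # 45 * 129 = 5805 candidate pairs
```

A sparsely annotated resource labels only a small part of this grid, so a retrieval system that returns a valid but unlabelled CN would be scored as wrong.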