FC-CONAN: An Exhaustively Paired Dataset for Robust Evaluation of Retrieval Systems

📅 2026-01-04
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing datasets pairing hate speech with counter-narratives suffer from sparse annotations, making reliable evaluation of retrieval systems challenging. To address this limitation, this work introduces FC-CONAN, the first dataset built by exhaustively considering all combinations of 45 hate speech instances and 129 counter-narratives, which yields 45 × 129 = 5,805 candidate pairs. Through a two-stage annotation process involving nine annotators and four validators, the authors construct four reliability-tiered subsets: Diamond, Gold, Silver, and Bronze. This exhaustive labeling uncovers hundreds of valid positive pairs that were previously unlabeled, which makes system evaluation more faithful. None of the labeled pairs overlap with CONAN, and the resource supports fine-grained error analysis and future research in counter-narrative retrieval.

📝 Abstract
Hate speech (HS) is a critical issue in online discourse, and one promising strategy to counter it is through the use of counter-narratives (CNs). Datasets linking HS with CNs are essential for advancing counterspeech research. However, even flagship resources like CONAN (Chung et al., 2019) annotate only a sparse subset of all possible HS-CN pairs, limiting evaluation. We introduce FC-CONAN (Fully Connected CONAN), the first dataset created by exhaustively considering all combinations of 45 English HS messages and 129 CNs. A two-stage annotation process involving nine annotators and four validators produces four partitions (Diamond, Gold, Silver, and Bronze) that balance reliability and scale. None of the labeled pairs overlap with CONAN, uncovering hundreds of previously unlabelled positives. FC-CONAN enables more faithful evaluation of counterspeech retrieval systems and facilitates detailed error analysis. The dataset is publicly available.
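To make the pairing arithmetic and its effect on evaluation concrete, here is a minimal Python sketch. The identifiers (`hs0`, `cn0`, ...), the toy relevance labels, and the `precision_at_k` helper are illustrative assumptions; they are not the dataset's actual format or the authors' evaluation code.

```python
from itertools import product

# Hypothetical IDs: the real dataset's naming scheme is not described here.
hs_ids = [f"hs{i}" for i in range(45)]    # 45 hate speech messages
cn_ids = [f"cn{j}" for j in range(129)]   # 129 counter-narratives

# "Fully connected" means every HS is paired with every CN.
all_pairs = list(product(hs_ids, cn_ids))
print(len(all_pairs))  # 45 * 129 = 5805


def precision_at_k(ranked_cns, relevant_cns, k):
    """Fraction of the top-k retrieved CNs that are labeled relevant."""
    top_k = ranked_cns[:k]
    return sum(cn in relevant_cns for cn in top_k) / k


# Toy ranking returned by some retrieval system for a single HS message.
ranked = ["cn3", "cn7", "cn1"]

# Sparse labels (assumed): only one valid CN was ever annotated for this HS.
sparse_labels = {"cn3"}
# Exhaustive labels (assumed): a second valid CN is found once all pairs are judged.
exhaustive_labels = {"cn3", "cn7"}

p_sparse = precision_at_k(ranked, sparse_labels, k=2)
p_full = precision_at_k(ranked, exhaustive_labels, k=2)
print(p_sparse, p_full)  # 0.5 1.0
```

In this toy case the same ranking scores 0.5 under sparse labels and 1.0 under exhaustive labels, which illustrates how unlabeled positives can make a retrieval system look worse than it is.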
Problem

Research questions and friction points this paper is trying to address.

hate speech
counter-narratives
retrieval evaluation
paired dataset
counterspeech
Innovation

Methods, ideas, or system contributions that make the work stand out.

exhaustively paired dataset
counter-narratives
hate speech
retrieval evaluation
two-stage annotation
Juan Junqueras
Universidad de Buenos Aires, FCEyN, Departamento de Computación, Buenos Aires, Argentina
Florian Boudin
Associate Professor, LS2N - Nantes Université and JFLI - National Institute of Informatics / Tokyo
Natural Language Processing, Information Retrieval, Computational Linguistics
M. Zin
Center for Juris-Informatics, ROIS-DS, Tokyo, Japan
Ha-Thanh Nguyen
Center for Juris-Informatics, ROIS-DS, Tokyo, Japan
Wachara Fungwacharakorn
Center for Juris-Informatics, ROIS-DS, Tokyo, Japan
D. Furman
Universidad de Buenos Aires, FCEyN, Departamento de Computación, Buenos Aires, Argentina
Akiko Aizawa
National Institute of Informatics (NII), Tokyo, Japan
Ken Satoh
National Institute of Informatics
Artificial Intelligence