ConspirED: A Dataset for Cognitive Traits of Conspiracy Theories and Large Language Model Safety

📅 2025-08-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing research lacks fine-grained cognitive modeling of large language models' (LLMs) reasoning robustness and alignment risks when exposed to conspiracy-theoretic content. Method: We introduce ConspirED, the first multi-sentence dataset annotated for the cognitive traits of conspiratorial thinking, building on the CONSPIR cognitive framework (Lewandowsky and Cook, 2020), which characterizes the hallmark rhetorical devices and inferential biases of conspiracy reasoning. Using this framework, we conduct human annotation and computational modeling to enable automated identification of conspiratorial cognitive traits, and we evaluate how safely mainstream LLMs and language reasoning models (LRMs) respond to conspiracy-laden inputs. Contribution/Results: Our trained detection models achieve strong discriminative performance; however, experiments reveal pervasive reasoning drift in current LLMs, which reproduce conspiracy logic and reinforce spurious causal attributions even when they successfully deflect comparable fact-checked misinformation. This work establishes a scalable new dimension for AI safety evaluation grounded in cognitive modeling of ideologically motivated reasoning.

📝 Abstract
Conspiracy theories erode public trust in science and institutions while resisting debunking by evolving and absorbing counter-evidence. As AI-generated misinformation becomes increasingly sophisticated, understanding rhetorical patterns in conspiratorial content is important for developing interventions such as targeted prebunking and assessing AI vulnerabilities. We introduce ConspirED (CONSPIR Evaluation Dataset), which captures the cognitive traits of conspiratorial ideation in multi-sentence excerpts (80--120 words) from online conspiracy articles, annotated using the CONSPIR cognitive framework (Lewandowsky and Cook, 2020). ConspirED is the first dataset of conspiratorial content annotated for general cognitive traits. Using ConspirED, we (i) develop computational models that identify conspiratorial traits and determine dominant traits in text excerpts, and (ii) evaluate large language/reasoning model (LLM/LRM) robustness to conspiratorial inputs. We find that both are misaligned by conspiratorial content, producing output that mirrors input reasoning patterns, even when successfully deflecting comparable fact-checked misinformation.
Problem

Research questions and friction points this paper is trying to address.

Identifying cognitive traits in conspiracy theory content
Developing computational models to detect conspiratorial reasoning patterns
Evaluating large language model vulnerabilities to conspiratorial inputs
Innovation

Methods, ideas, or system contributions that make the work stand out.

First dataset of conspiratorial content annotated with a cognitive-traits framework (CONSPIR)
Computational models that identify conspiratorial traits and the dominant trait in an excerpt
Robustness evaluation of LLMs/LRMs against conspiratorial content inputs
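The trait-identification task described above can be framed as multi-label classification over the seven CONSPIR traits (Contradictory; Overriding suspicion; Nefarious intent; Something must be wrong; Persecuted victim; Immune to evidence; Re-interpreting randomness). The sketch below is a minimal illustrative baseline using TF-IDF features and one-vs-rest logistic regression; the excerpts and labels are hypothetical placeholders, not the paper's actual models or the ConspirED data.

```python
# Minimal multi-label baseline for CONSPIR trait detection.
# All example excerpts/labels are hypothetical; this is not the
# paper's model or the ConspirED dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# The seven traits of conspiratorial thinking (Lewandowsky & Cook, 2020).
CONSPIR_TRAITS = [
    "contradictory", "overriding_suspicion", "nefarious_intent",
    "something_must_be_wrong", "persecuted_victim",
    "immune_to_evidence", "reinterpreting_randomness",
]

# Hypothetical excerpts paired with the traits they exhibit.
excerpts = [
    "They deny it, which only proves how much they are hiding.",
    "Official sources always lie; nothing they say can be trusted.",
    "The timing of these events cannot possibly be a coincidence.",
    "Whatever the evidence shows, something about the story is off.",
    "First they said it never happened, then that it was justified.",
    "Anyone who questions the narrative gets silenced and smeared.",
]
labels = [
    {"immune_to_evidence", "overriding_suspicion"},
    {"overriding_suspicion", "nefarious_intent"},
    {"reinterpreting_randomness"},
    {"something_must_be_wrong", "immune_to_evidence"},
    {"contradictory"},
    {"persecuted_victim"},
]

# Encode trait sets as a binary indicator matrix, one column per trait.
mlb = MultiLabelBinarizer(classes=CONSPIR_TRAITS)
Y = mlb.fit_transform(labels)  # shape: (n_excerpts, 7)

# One independent binary classifier per trait over TF-IDF features.
clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
clf.fit(excerpts, Y)

# Predict traits for a new (hypothetical) excerpt.
pred = clf.predict(["You can never trust what the authorities report."])
print([t for t, on in zip(CONSPIR_TRAITS, pred[0]) if on])
```

A realistic system would replace the TF-IDF baseline with a fine-tuned transformer encoder and train on the annotated 80--120 word excerpts, but the multi-label framing stays the same.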