Elevating Cyber Threat Intelligence against Disinformation Campaigns with LLM-based Concept Extraction and the FakeCTI Dataset

📅 2025-05-06

🤖 AI Summary

Traditional CTI approaches rely on volatile low-level indicators such as domains and IPs. Because disinformation infrastructure changes rapidly, these indicators are not robust enough for attributing campaigns and adapt poorly across platforms. The paper proposes a CTI framework designed specifically for disinformation analysis, with three contributions. First, it replaces infrastructure-centric indicators with concept-level semantic structures, such as narrative patterns and entity relationships, which serve as stable and interpretable CTI primitives. Second, it introduces FakeCTI, the first publicly available dataset linking fake news instances, disinformation campaigns, and threat actors. Third, it combines fine-tuned large language models (LLMs) with classical NLP techniques for semantic concept extraction, contextual modeling, and multi-source attribution. According to the summary, experiments show substantial improvements in indicator persistence, cross-platform transferability, and spatiotemporal attribution capability, moving CTI from ephemeral infrastructure traces toward durable, semantics-grounded intelligence.

📝 Abstract

The swift spread of fake news and disinformation campaigns poses a significant threat to public trust, political stability, and cybersecurity. Traditional Cyber Threat Intelligence (CTI) approaches, which rely on low-level indicators such as domain names and social media handles, are easily evaded by adversaries who frequently modify their online infrastructure. To address these limitations, we introduce a novel CTI framework that focuses on high-level, semantic indicators derived from recurrent narratives and relationships of disinformation campaigns. Our approach extracts structured CTI indicators from unstructured disinformation content, capturing key entities and their contextual dependencies within fake news using Large Language Models (LLMs). We further introduce FakeCTI, the first dataset that systematically links fake news to disinformation campaigns and threat actors. To evaluate the effectiveness of our CTI framework, we analyze multiple fake news attribution techniques, spanning from traditional Natural Language Processing (NLP) to fine-tuned LLMs. This work shifts the focus from low-level artifacts to persistent conceptual structures, establishing a scalable and adaptive approach to tracking and countering disinformation campaigns.
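To make the idea of a high-level semantic indicator concrete, here is a minimal sketch of one possible representation. It is not the paper's implementation. It assumes the LLM extraction step has already run and produced (subject, relation, object) triples; the `Concept` class, the triple format, the Jaccard scoring, and the toy campaign profiles are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """Hypothetical concept-level indicator: an entity relationship triple."""
    subject: str
    relation: str
    obj: str


def jaccard(a: frozenset, b: frozenset) -> float:
    """Set overlap in [0, 1]; returns 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def attribute_by_concepts(article: frozenset, campaigns: dict) -> tuple:
    """Return (campaign_name, score) for the best-overlapping campaign profile."""
    scored = [(name, jaccard(article, profile)) for name, profile in campaigns.items()]
    return max(scored, key=lambda pair: pair[1])


# Toy profiles invented for this example; they are not taken from FakeCTI.
campaigns = {
    "campaign_a": frozenset({
        Concept("lab", "created", "outbreak"),
        Concept("agency", "hides", "data"),
    }),
    "campaign_b": frozenset({
        Concept("ballots", "were", "destroyed"),
        Concept("officials", "altered", "count"),
    }),
}

# Concepts assumed to have been extracted from one article.
article = frozenset({
    Concept("lab", "created", "outbreak"),
    Concept("experts", "deny", "evidence"),
})

best_match = attribute_by_concepts(article, campaigns)
print(best_match)
```

The point of the sketch is that the match depends only on recurring concepts, so it is unaffected when a campaign changes domains or account handles.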

Problem

Research questions and friction points this paper is trying to address.

Detecting disinformation campaigns using high-level semantic indicators

Extracting structured CTI from unstructured fake news with LLMs

Linking fake news to campaigns via the FakeCTI dataset

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses LLMs to extract semantic CTI indicators

Introduces FakeCTI dataset for disinformation tracking

Evaluates NLP and fine-tuned LLMs for attribution
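For orientation, a traditional NLP attribution baseline of the kind the evaluation spans could look like the sketch below. This is an assumption about what such a baseline might be, not a method confirmed by the paper: it scores an article against one reference text per campaign using bag-of-words cosine similarity. The function names and the reference texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def attribute_by_text(text: str, references: dict) -> tuple:
    """Return (best_campaign, all_scores) using term-frequency cosine similarity."""
    vector = Counter(tokenize(text))
    scores = {
        name: cosine(vector, Counter(tokenize(reference)))
        for name, reference in references.items()
    }
    return max(scores, key=scores.get), scores


# Toy reference texts invented for this example; they are not taken from FakeCTI.
references = {
    "campaign_a": "secret lab created the outbreak",
    "campaign_b": "ballots were destroyed before the count",
}

label, scores = attribute_by_text("reports claim a secret lab created it", references)
print(label, scores)
```

A surface-level baseline like this matches on shared words, so a reworded narrative can evade it, which is the gap that concept-level extraction with LLMs is meant to address.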
