Bandit on the Hunt: Dynamic Crawling for Cyber Threat Intelligence

📅 2025-04-25
🤖 AI Summary
To address the challenge of identifying high-value cyber threat intelligence (CTI) sources that publish unstructured content, such as news articles and blogs, this paper proposes ThreatCrawl, an active CTI crawler built on the multi-armed bandit (MAB) paradigm. The crawler combines SBERT-based semantic matching, adaptive crawling strategies, and online reward feedback to expand its seed set automatically and to discover previously unknown, highly relevant pages and domains, instead of extracting information only from a fixed list of known sources. Experiments report a harvest rate of 25.3% and seed-set growth of over 300% while maintaining topical focus, and the crawler identifies previously unknown but highly relevant CTI sources.
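The bandit component can be illustrated with a minimal sketch. The page does not say which bandit algorithm ThreatCrawl uses, so the example below assumes UCB1, treats each crawling strategy as an arm, and rewards an arm with 1 when the page it fetched is judged CTI-relevant. The strategy names and relevance probabilities are hypothetical and stand in for a real fetch-and-classify step.

```python
import math
import random


class UCB1:
    """UCB1 bandit; each arm is a crawling strategy (names are hypothetical)."""

    def __init__(self, arms):
        self.arms = list(arms)
        self.counts = {arm: 0 for arm in self.arms}    # times each arm was pulled
        self.values = {arm: 0.0 for arm in self.arms}  # running mean reward per arm
        self.total = 0                                 # total pulls across all arms

    def select(self):
        # Try every arm once before relying on the confidence bound.
        for arm in self.arms:
            if self.counts[arm] == 0:
                return arm
        # Mean reward plus an exploration bonus that shrinks with more pulls.
        return max(
            self.arms,
            key=lambda a: self.values[a]
            + math.sqrt(2 * math.log(self.total) / self.counts[a]),
        )

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.total += 1
        # Incremental mean: v_n = v_{n-1} + (r - v_{n-1}) / n
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def simulate(steps=3000, seed=0):
    # Hypothetical probability that a page fetched by each strategy is CTI-relevant.
    relevance_prob = {
        "breadth_first": 0.10,
        "same_domain": 0.20,
        "outlink_similarity": 0.35,
    }
    rng = random.Random(seed)
    bandit = UCB1(relevance_prob)
    for _ in range(steps):
        arm = bandit.select()
        # Reward 1.0 if the (simulated) fetched page is relevant, else 0.0.
        reward = 1.0 if rng.random() < relevance_prob[arm] else 0.0
        bandit.update(arm, reward)
    return bandit


if __name__ == "__main__":
    b = simulate()
    for arm in b.arms:
        print(f"{arm}: pulls={b.counts[arm]}, mean reward={b.values[arm]:.3f}")
```

UCB1 keeps choosing strategies that keep yielding relevant pages while still revisiting weaker ones occasionally, which is the explore/exploit trade-off a dynamic crawler needs. An epsilon-greedy or Thompson sampling policy would be a drop-in alternative under the same interface.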

📝 Abstract
Public information contains valuable Cyber Threat Intelligence (CTI) that is used to prevent future attacks. While standards exist for sharing this information, much of it appears in non-standardized news articles or blogs. Monitoring online sources for threats is time-consuming, and source selection is uncertain. Current research focuses on extracting Indicators of Compromise from known sources and rarely addresses the identification of new sources. This paper proposes a CTI-focused crawler that uses a multi-armed bandit (MAB) and various crawling strategies. It employs SBERT to identify relevant documents while dynamically adapting its crawling path. Our system ThreatCrawl achieves a harvest rate exceeding 25% and expands its seed set by over 300% while maintaining topical focus. Additionally, the crawler identifies previously unknown but highly relevant overview pages, datasets, and domains.
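The two headline metrics in the abstract reduce to simple ratios. The sketch below assumes the usual focused-crawler definition of harvest rate (relevant pages divided by all pages crawled) and reads "expands its seed by over 300%" as percentage growth of the seed set, so that growing from 10 to 40 seeds counts as +300%. The page and seed counts in the usage lines are hypothetical and are not the paper's data.

```python
def harvest_rate(relevance_flags):
    """Share of crawled pages judged relevant: relevant / total crawled."""
    if not relevance_flags:
        return 0.0
    # True counts as 1 and False as 0 when summed.
    return sum(relevance_flags) / len(relevance_flags)


def seed_growth_percent(initial_size, final_size):
    """Percentage growth of the seed set relative to its initial size."""
    if initial_size <= 0:
        raise ValueError("initial seed set must be non-empty")
    return (final_size - initial_size) / initial_size * 100


if __name__ == "__main__":
    # Hypothetical crawl: 1 of every 4 fetched pages is classified as CTI.
    flags = [True, False, False, False] * 25
    print(harvest_rate(flags))          # 0.25
    print(seed_growth_percent(10, 40))  # 300.0
```

Under these definitions, a harvest rate above 25% means that, on average, at least one in four fetched pages was classified as relevant.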
Problem

Research questions and friction points this paper is trying to address.

Identifying new CTI sources beyond standardized formats
Automating dynamic crawling for efficient threat monitoring
Enhancing CTI harvest rate and source discovery
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses a multi-armed bandit (MAB) for dynamic crawling
Employs SBERT to identify relevant documents
Dynamically adapts its crawling path for efficiency
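The SBERT relevance check and the adaptive crawling path can be sketched together as a best-first frontier. Here the short vectors are toy stand-ins for SBERT sentence embeddings (with the sentence-transformers library they would come from `SentenceTransformer(...).encode(...)`). The 0.5 similarity threshold, the "close to any reference embedding" rule, and the hypothetical `Frontier` class are assumptions for illustration, not ThreatCrawl's actual design.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def is_relevant(page_vec, cti_reference_vecs, threshold=0.5):
    # Hypothetical rule: a page counts as CTI if it is close to any reference embedding.
    return max(cosine(page_vec, ref) for ref in cti_reference_vecs) >= threshold


class Frontier:
    """Best-first crawl frontier: URLs scored higher are visited first."""

    def __init__(self):
        self._heap = []
        self._seen = set()  # each URL is enqueued at most once

    def push(self, url, score):
        if url not in self._seen:
            self._seen.add(url)
            # heapq is a min-heap, so store the negated score.
            heapq.heappush(self._heap, (-score, url))

    def pop(self):
        neg_score, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self):
        return len(self._heap)
```

Scoring outlinks by the relevance of the page they were found on lets the crawl drift toward productive regions of the web; a real crawler would also need politeness delays, robots.txt handling, and URL normalization.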
Philip D. Kuehn
Science and Technology for Peace and Security (PEASEC), Technical University of Darmstadt, Germany
Dilara Nadermahmoodi
Science and Technology for Peace and Security (PEASEC), Technical University of Darmstadt, Germany
Markus Bayer
PEASEC, TU Darmstadt
Machine Learning
Christian Reuter
Science and Technology for Peace and Security (PEASEC), TU Darmstadt
HCI, Peace and Conflict Studies, Usable Security and Privacy, Crisis Informatics, Information Warfare