Bandit on the Hunt: Dynamic Crawling for Cyber Threat Intelligence

📅 2025-04-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the challenge of dynamically identifying high-value, unstructured cyber threat intelligence (CTI) sources such as news articles and blogs, this paper proposes an active CTI crawler framework based on the Multi-Armed Bandit (MAB) paradigm. The method combines SBERT-based semantic matching, adaptive crawling strategies, and online reward feedback to expand its seed sources automatically and to discover relevant, previously unknown pages and domains, rather than extracting only from a fixed set of known sources. Experiments report a 25.3% harvest rate, seed growth of over 300% with topical focus maintained, and the identification of previously unknown but highly relevant CTI sources.

📝 Abstract

Public information contains valuable Cyber Threat Intelligence (CTI) that is used to prevent future attacks. While standards exist for sharing this information, much of it appears in non-standardized news articles or blogs. Monitoring online sources for threats is time-consuming, and source selection is uncertain. Current research focuses on extracting Indicators of Compromise from known sources and rarely addresses the identification of new sources. This paper proposes a CTI-focused crawler using a multi-armed bandit (MAB) and various crawling strategies. It employs SBERT to identify relevant documents while dynamically adapting its crawling path. Our system, ThreatCrawl, achieves a harvest rate exceeding 25% and expands its seed by over 300% while maintaining topical focus. Additionally, the crawler identifies previously unknown but highly relevant overview pages, datasets, and domains.
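The abstract does not say which bandit algorithm ThreatCrawl uses, so the following is only a minimal sketch of the general idea, assuming an epsilon-greedy policy. Each arm is a crawling strategy, and the reward is 1 when the fetched page is judged relevant. The strategy names and their relevance probabilities are hypothetical and exist only to drive the simulation.

```python
import random


class EpsilonGreedyBandit:
    """Epsilon-greedy bandit; an assumed policy, not necessarily the paper's."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.counts = {arm: 0 for arm in self.arms}
        self.values = {arm: 0.0 for arm in self.arms}  # running mean reward
        self.rng = random.Random(seed)

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda arm: self.values[arm])

    def update(self, arm, reward):
        # Incremental mean: new_mean = old_mean + (reward - old_mean) / n
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def simulate(steps=2000, seed=0):
    # Hypothetical strategies with made-up chances of yielding a relevant page.
    true_rates = {"follow_outlinks": 0.10, "same_domain": 0.25, "backlinks": 0.40}
    bandit = EpsilonGreedyBandit(true_rates, epsilon=0.1, seed=seed)
    env = random.Random(seed + 1)
    relevant = 0.0
    for _ in range(steps):
        arm = bandit.select()
        reward = 1.0 if env.random() < true_rates[arm] else 0.0
        bandit.update(arm, reward)
        relevant += reward
    # Fraction of simulated fetches that were relevant.
    return bandit, relevant / steps


if __name__ == "__main__":
    bandit, rate = simulate()
    print(bandit.counts, round(rate, 3))
```

In a real crawler the simulated reward would be replaced by the relevance decision for each fetched page, so that strategies which keep yielding relevant pages are chosen more often.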

Problem

Research questions and friction points this paper is trying to address.

Identifying new CTI sources beyond standardized formats

Automating dynamic crawling for efficient threat monitoring

Enhancing CTI harvest rate and source discovery
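For reference, the two headline metrics can be computed as below. This is a sketch that assumes the usual focused-crawling definition of harvest rate (relevant pages divided by pages crawled) and reads "expands its seed by over 300%" as relative growth over the initial seed size. The numbers in the usage lines are invented for illustration and are not taken from the paper.

```python
def harvest_rate(relevant_pages, crawled_pages):
    """Share of crawled pages judged relevant (0.0 if nothing was crawled)."""
    if crawled_pages == 0:
        return 0.0
    return relevant_pages / crawled_pages


def seed_growth(initial_seed_size, final_seed_size):
    """Relative growth of the seed set; 3.0 corresponds to +300%."""
    if initial_seed_size <= 0:
        raise ValueError("initial_seed_size must be positive")
    return (final_seed_size - initial_seed_size) / initial_seed_size


if __name__ == "__main__":
    # Illustrative numbers only.
    print(f"harvest rate: {harvest_rate(253, 1000):.1%}")
    print(f"seed growth: {seed_growth(50, 210):.0%}")
```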

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses multi-armed bandit for dynamic crawling

Employs SBERT to identify relevant documents

Dynamically adapts crawling path for efficiency
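The relevance step can be illustrated with a similarity threshold against known CTI documents. The sketch below is not the paper's pipeline: to stay dependency-free it swaps SBERT for a toy bag-of-words embedding, and the threshold of 0.5, the function names, and the example texts are all assumptions. With a real sentence-embedding model, only the `embed` function and the vector arithmetic in `cosine` would change.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in for an SBERT encoder: a sparse word-count vector."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def relevance(document, seed_documents, embed=toy_embed):
    """Highest similarity between the document and any known CTI document."""
    doc_vec = embed(document)
    return max((cosine(doc_vec, embed(s)) for s in seed_documents), default=0.0)


def is_relevant(document, seed_documents, threshold=0.5, embed=toy_embed):
    # The threshold is an assumed value, chosen only for this example.
    return relevance(document, seed_documents, embed) >= threshold


if __name__ == "__main__":
    seeds = ["ransomware group exploits vpn vulnerability"]
    print(is_relevant("new ransomware group exploits vpn vulnerability", seeds))
    print(is_relevant("recipe for chocolate cake", seeds))
```

The boolean returned by `is_relevant` is the kind of signal that could serve as the bandit reward and as the numerator of the harvest rate.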
