ThreatCluster: Threat Clustering for Information Overload Reduction in Computer Emergency Response Teams

📅 2022-10-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the information overload that CERTs face from heterogeneous threat intelligence sources (e.g., vulnerability reports, news articles, threat advisories), this paper proposes the first evaluable clustering framework specifically designed for threat intelligence. Methodologically, it integrates hybrid text embeddings, including TF-IDF and Sentence-BERT, within a unified pipeline combining hierarchical clustering and DBSCAN, and introduces a homogeneity-driven cluster compression mechanism. The framework is evaluated on a novel ground-truth corpus of threat reports, a security bug report (SBR) corpus, and a news-article corpus. Key contributions include: (1) the first formal definition of evaluation criteria for threat intelligence clustering; (2) a reduction of information overload by up to 84.8% with homogeneous clusters; and (3) lightweight deployability, significantly decreasing analyst workload while improving incident response timeliness.
📝 Abstract
The ever-increasing number of threats and the diversity of information sources pose challenges for Computer Emergency Response Teams (CERTs). To respond to emerging threats, CERTs must gather information in a timely and comprehensive manner, but the volume of sources and information leads to information overload. This paper contributes to the question of how to reduce information overload for CERTs. We propose clustering incoming information, since scanning this information is one of the most tiresome, but necessary, manual steps. Based on current studies, we establish conditions for such a framework. Different types of evaluation metrics are considered and selected in relation to the framework conditions. Furthermore, different document embeddings and distance measures are evaluated and interpreted in combination with clustering methods. We use three different corpora for the evaluation: a novel ground truth corpus based on threat reports, one security bug report (SBR) corpus, and one with news articles. Our work shows that it is possible to reduce information overload by up to 84.8% with homogeneous clusters. A runtime analysis supports the choice of the selected clustering methods.
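To make the idea in the abstract concrete, the following is a minimal sketch of embedding documents, clustering them by a distance measure, and scoring the result for homogeneity. It is not the paper's pipeline: the whitespace tokenizer, the smoothed TF-IDF weighting, the greedy leader-style clustering, the 0.3 similarity threshold, the toy documents, and the definition of overload reduction as 1 - clusters/documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]  # assumed: whitespace tokens
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)  # smoothed IDF
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def leader_clustering(vectors, threshold):
    """Greedy clustering: join the first cluster whose leader is similar enough.

    Hypothetical stand-in for the clustering methods compared in the paper.
    """
    leaders = []      # index of the first document of each cluster
    assignments = []  # cluster id per document
    for index, vector in enumerate(vectors):
        for cluster_id, leader in enumerate(leaders):
            if cosine_similarity(vector, vectors[leader]) >= threshold:
                assignments.append(cluster_id)
                break
        else:
            leaders.append(index)
            assignments.append(len(leaders) - 1)
    return assignments


def homogeneity(true_labels, cluster_ids):
    """Homogeneity = 1 - H(class | cluster) / H(class); 1.0 means pure clusters."""
    n = len(true_labels)
    class_entropy = -sum(
        (c / n) * math.log(c / n) for c in Counter(true_labels).values()
    )
    if class_entropy == 0.0:
        return 1.0
    cluster_sizes = Counter(cluster_ids)
    joint = Counter(zip(cluster_ids, true_labels))
    conditional_entropy = -sum(
        (n_ck / n) * math.log(n_ck / cluster_sizes[k])
        for (k, _), n_ck in joint.items()
    )
    return 1.0 - conditional_entropy / class_entropy


# Toy corpus (invented for this example) with ground-truth topic labels.
docs = [
    "critical remote code execution vulnerability in apache log4j",
    "apache log4j remote code execution vulnerability exploited in the wild",
    "patch released for apache log4j remote code execution vulnerability",
    "ransomware group targets hospital networks with phishing emails",
    "hospital networks hit by ransomware group using phishing emails",
    "new phishing emails from ransomware group target hospital networks",
]
labels = ["log4j"] * 3 + ["ransomware"] * 3

cluster_ids = leader_clustering(tfidf_vectors(docs), threshold=0.3)
score = homogeneity(labels, cluster_ids)
# Assumed definition: share of items an analyst no longer has to scan one by one.
reduction = 1 - len(set(cluster_ids)) / len(docs)

print(cluster_ids, round(score, 3), round(reduction, 3))
```

On this toy input the two topics share no vocabulary, so the six documents collapse into two pure clusters. On real feeds the threshold trades off the two quantities: a looser threshold merges more items and raises the reduction, but risks mixing unrelated threats in one cluster and lowering homogeneity.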
Problem

Research questions and friction points this paper is trying to address.

CERTs
Threat Information Overload
Efficient Processing Methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

ThreatCluster
Similarity Measurement
Information Processing Efficiency
Philip D. Kuehn
Science and Technology for Peace and Security (PEASEC), Technical University of Darmstadt, Darmstadt, Germany
Dilara Nadermahmoodi
Science and Technology for Peace and Security (PEASEC), Technical University of Darmstadt, Darmstadt, Germany
Moritz Kerk
Science and Technology for Peace and Security (PEASEC), Technical University of Darmstadt, Darmstadt, Germany
Christian Reuter
Science and Technology for Peace and Security (PEASEC), TU Darmstadt
HCI, Peace and Conflict Studies, Usable Security and Privacy, Crisis Informatics, Information Warfare