Hesperus is Phosphorus: Mapping Threat Actor Naming Taxonomies at Scale

📅 2025-11-30
🤖 AI Summary
This paper addresses the inconsistent naming of threat actors (TAs) across cyber threat intelligence (CTI) vendors, which impedes report integration and cross-source correlation. It proposes HiP, a methodology for normalizing, integrating, and clustering TA names that presumably refer to the same entity. HiP is applied to a dataset collected from 15 sources, spanning 13,371 CTI reports, 17 vendor taxonomies, 3,287 TA names, and 8 mappings between them. Analysis of the resulting name graph shows that aliases concentrate on a relatively small subset of TAs, traces how this concentration has evolved over the years, and examines factors that could explain name proliferation. It also identifies mapping errors and methodological pitfalls that inflate certain name clusters, including temporary names for activity clusters, shared tools and infrastructure, and overlapping operations. The paper concludes that adopting a unified naming standard is fundamentally hampered by the need to share highly sensitive telemetry that is private to each CTI vendor.

📝 Abstract
This paper studies the problem of inconsistent Threat Actor (TA) naming conventions across leading Cyber Threat Intelligence (CTI) vendors. The current decentralized and proprietary nomenclature creates confusion and significant obstacles for researchers, including difficulties in integrating and correlating disparate CTI reports and TA profiles. This paper introduces HiP (Hesperus is Phosphorus, a reference to the classic question about the Morning Star and the Evening Star), a methodology for normalizing, integrating, and clustering TA names that presumably correspond to the same entity. Using HiP, we analyze a large dataset collected from 15 sources and spanning 13,371 CTI reports, 17 vendor taxonomies, 3,287 TA names, and 8 mappings between them. Our analysis of the resulting name graph provides insights into key features of the problem, such as the concentration of aliases on a relatively small subset of TAs, the evolution of this phenomenon over the years, and the factors that could explain TA name proliferation. We also report errors in the mappings and methodological pitfalls that make certain TA name clusters larger than they should be, including the use of temporary names for activity clusters, the existence of common tools and infrastructure, and overlapping operations. We conclude with a discussion of the inherent difficulties of adopting a TA naming standard, a quest fundamentally hampered by the need to share highly sensitive telemetry that is private to each CTI vendor.
Problem

Research questions and friction points this paper is trying to address.

Standardizing inconsistent threat actor naming conventions across CTI vendors
Integrating and correlating disparate CTI reports and TA profiles
Analyzing factors behind TA name proliferation and mapping errors
Innovation

Methods, ideas, or system contributions that make the work stand out.

Normalizing threat actor names across vendors
Clustering aliases using graph-based methodology
Analyzing large-scale CTI datasets for naming patterns
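
To make the graph-based alias clustering idea concrete, here is a minimal sketch in Python. It is not the HiP implementation, whose details this page does not give. It assumes that each mapping asserts that two names refer to the same actor, that names are compared after a simple normalization step, and that a cluster is a connected component of the resulting name graph. The functions `normalize` and `cluster_aliases` are hypothetical, and the names "Actor-17" and "Group X" are invented for illustration.

```python
import re
from collections import defaultdict


def normalize(name: str) -> str:
    """Assumed normalization: lowercase and drop non-alphanumeric characters."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def cluster_aliases(mappings):
    """Return alias clusters as connected components of the name graph.

    `mappings` is an iterable of (name_a, name_b) pairs, each asserting
    that the two names denote the same actor.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in mappings:
        root_a, root_b = find(normalize(a)), find(normalize(b))
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(set)
    for name in parent:
        clusters[find(name)].add(name)

    # Largest clusters first; ties broken alphabetically.
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: (-len(c), c))


# Illustrative input only; not taken from the paper's dataset.
mappings = [
    ("Hesperus", "Evening Star"),
    ("Phosphorus", "Morning Star"),
    ("hesperus", "PHOSPHORUS"),   # one link merges the two groups above
    ("Actor-17", "Group X"),      # hypothetical names
]
clusters = cluster_aliases(mappings)
for cluster in clusters:
    print(cluster)
```

Under this model a single incorrect pair is enough to merge two otherwise separate clusters, which is one way that mapping errors of the kind the abstract reports could make clusters larger than they should be.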
Gonzalo Roa
Universidad Carlos III de Madrid, Avenida de la Universidad, 30, Leganés, 28911, Spain
Manuel Suarez-Roman
Universidad Carlos III de Madrid, Avenida de la Universidad, 30, Leganés, 28911, Spain
Juan Tapiador
Universidad Carlos III de Madrid
Computer Security