Reproducibility in Event-Log Research: A Parametrised Generator and Benchmark for Event-based Signatures

📅 2026-01-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses a barrier to evaluating signature-based cybersecurity detection methods: real-world event logs are security-sensitive and rarely shared, so publicly available datasets are scarce. To overcome this limitation, the authors propose a parametrised synthetic event-log generator that embeds known attack signatures to produce labelled, configurable datasets with known ground truth. Alongside the generator, they provide a benchmark for signature detection, giving researchers a reproducible and comparable testing environment. On the generated benchmark datasets, the DBSCAN clustering algorithm achieves an Adjusted Rand Index greater than 0.95 on most datasets, which the authors present as a demonstration of the generator's capabilities.
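To make the idea of a parametrised generator concrete, the following is a minimal sketch, not the authors' implementation. Every name in it (`Event`, `generate_log`, and the parameters `n_instances`, `n_noise`, `max_gap`, `duration`, `seed`) is hypothetical and chosen only for illustration. It assumes a signature is an ordered list of event types, and that each event carries a ground-truth label naming the signature instance it belongs to, with `-1` marking background noise.

```python
import random
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float   # seconds from the start of the log
    event_type: str
    label: int         # signature instance id, or -1 for background noise


def generate_log(signature, n_instances, n_noise, noise_types,
                 duration=3600.0, max_gap=5.0, seed=0):
    """Return a time-sorted list of labelled events (illustrative only)."""
    rng = random.Random(seed)  # a fixed seed makes the log reproducible
    events = []

    # Plant each signature instance as an ordered run of events.
    for instance_id in range(n_instances):
        t = rng.uniform(0.0, duration)
        for event_type in signature:
            events.append(Event(t, event_type, instance_id))
            t += rng.uniform(0.0, max_gap)  # gap before the next step

    # Scatter unrelated background events across the same time window.
    for _ in range(n_noise):
        events.append(Event(rng.uniform(0.0, duration),
                            rng.choice(noise_types), -1))

    events.sort(key=lambda e: e.timestamp)
    return events


log = generate_log(
    signature=["login_failed", "login_ok", "priv_escalation"],
    n_instances=4,
    n_noise=20,
    noise_types=["file_read", "dns_query"],
)
```

Because the labels are emitted together with the events, any detection method run on `log` can be scored against known ground truth, and changing `seed`, `n_noise`, or `max_gap` yields datasets of varying difficulty.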

📝 Abstract

Event-based datasets are crucial for cybersecurity analysis. A key use case is detecting event-based signatures, which represent attacks spanning multiple events and can only be understood once the relevant events are identified and linked. Analysing event datasets is essential for monitoring system security, but their growing volume and frequency create significant scalability and processing difficulties. Researchers rely on these datasets to develop and test techniques for automatically identifying signatures. However, because real datasets are security-sensitive and rarely shared, it becomes difficult to perform meaningful comparative evaluation between different approaches. This work addresses this evaluation limitation by offering a systematic method for generating event logs with known ground truth, enabling reproducible and comparable research. We present a novel parametrised generation technique capable of producing synthetic event datasets that contain event-based signatures for discovery. To demonstrate the capabilities of the technique, we provide a benchmark in signature detection. Our benchmarking demonstrated the suitability of DBSCAN, which achieved an Adjusted Rand Index greater than 0.95 on most generated datasets. This work enhances the ability of researchers to develop and benchmark new cybersecurity techniques, ultimately contributing to more robust and effective cybersecurity measures.
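For readers unfamiliar with the metric, the Adjusted Rand Index compares a predicted grouping of events against the ground-truth grouping, corrected for chance agreement: 1.0 means the two partitions match exactly (regardless of how cluster ids are named), and values near 0 indicate chance-level agreement. The sketch below is a plain standard-library implementation of the usual pair-counting formula, written for illustration; the function name `adjusted_rand_index` is an assumption, and the paper's own evaluation code is not shown in this summary.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    """Pair-counting Adjusted Rand Index between two labelings."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("labelings must have the same length")

    total_pairs = comb(len(true_labels), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: treated as trivially identical

    # Pairs placed together in both labelings, in truth, and in prediction.
    contingency = Counter(zip(true_labels, pred_labels))
    sum_cells = sum(comb(c, 2) for c in contingency.values())
    sum_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_pred = sum(comb(c, 2) for c in Counter(pred_labels).values())

    expected = sum_true * sum_pred / total_pairs  # agreement expected by chance
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. every item in one cluster in both
    return (sum_cells - expected) / (max_index - expected)


# Same partition with cluster ids renamed: perfect agreement.
perfect = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# One true cluster split incorrectly merged: partial agreement.
partial = adjusted_rand_index([0, 0, 1, 2], [0, 0, 1, 1])
```

In a benchmark of the kind the abstract describes, `true_labels` would come from the generator's ground truth and `pred_labels` from the cluster assignments of the method under test, such as DBSCAN.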

Problem

Research questions and friction points this paper is trying to address.

reproducibility

event-log

cybersecurity

event-based signatures

benchmark

Innovation

Methods, ideas, or system contributions that make the work stand out.

parametrised generator

event-based signatures

synthetic event logs