CAM-LDS: Cyber Attack Manifestations for Automatic Interpretation of System Logs and Security Alerts

📅 2026-03-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses a limitation of conventional log analysis methods: they rely on expert-defined rules, handcrafted parsers, and manual feature engineering, and they cannot semantically interpret system logs and security alerts or explain their underlying causes. Because publicly available, labeled data sets covering a broad range of attack techniques are scarce, the authors introduce CAM-LDS, a labeled log data set comprising seven attack scenarios that cover 81 attack techniques across 13 tactics, collected from 18 distinct log sources in a fully open-source, reproducible test environment. The data set isolates log events that result directly from attack executions, which supports analysis of command observability, event frequencies, performance metrics, and intrusion detection alerts. In an illustrative case study, an LLM predicted the correct attack technique perfectly for approximately one third of attack steps and adequately for another third, which points to the potential of LLM-based log interpretation and the utility of the data set.

📝 Abstract

Log data are essential for intrusion detection and forensic investigations. However, manual log analysis is tedious due to high data volumes, heterogeneous event formats, and unstructured messages. Even though many automated methods for log analysis exist, they usually still rely on domain-specific configurations such as expert-defined detection rules, handcrafted log parsers, or manual feature engineering. Crucially, the level of automation of conventional methods is limited due to their inability to semantically understand logs and explain their underlying causes. In contrast, Large Language Models enable domain- and format-agnostic interpretation of system logs and security alerts. Unfortunately, research on this topic remains challenging, because publicly available and labeled data sets covering a broad range of attack techniques are scarce. To address this gap, we introduce the Cyber Attack Manifestation Log Data Set (CAM-LDS), comprising seven attack scenarios that cover 81 distinct techniques across 13 tactics and were collected from 18 distinct sources within a fully open-source and reproducible test environment. We extract log events that directly result from attack executions to facilitate analysis of manifestations concerning command observability, event frequencies, performance metrics, and intrusion detection alerts. We further present an illustrative case study utilizing an LLM to process the CAM-LDS. The results indicate that correct attack techniques are predicted perfectly for approximately one third of attack steps and adequately for another third, highlighting the potential of LLM-based log interpretation and the utility of our data set.
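
The abstract reports results as shares of attack steps whose technique was predicted "perfectly" or "adequately". As a rough illustration of how such shares could be tallied, the sketch below grades each predicted technique label against the ground-truth label. Everything in it is an assumption made for illustration and is not taken from the paper: the function names `grade` and `summarize`, the use of ATT&CK-style identifiers such as `T1059.004`, and in particular the rule that a prediction counts as "adequate" when it shares the parent technique with the label.

```python
from collections import Counter


def grade(predicted: str, actual: str) -> str:
    """Grade one predicted technique ID against its ground-truth label.

    Hypothetical rule (not the paper's): identical IDs are "perfect";
    IDs sharing the part before the first dot are "adequate".
    """
    if predicted == actual:
        return "perfect"
    if predicted.split(".")[0] == actual.split(".")[0]:
        return "adequate"
    return "miss"


def summarize(pairs):
    """Return the share of attack steps falling into each grade."""
    grades = ("perfect", "adequate", "miss")
    if not pairs:
        # No attack steps to grade: report zero shares instead of dividing by zero.
        return {g: 0.0 for g in grades}
    counts = Counter(grade(p, a) for p, a in pairs)
    return {g: counts.get(g, 0) / len(pairs) for g in grades}


# Illustrative (predicted, actual) pairs; these are made-up examples, not CAM-LDS data.
steps = [
    ("T1059.004", "T1059.004"),  # identical ID
    ("T1059.001", "T1059.004"),  # same parent technique, different sub-technique
    ("T1046", "T1110"),          # unrelated techniques
]
shares = summarize(steps)
```

A real evaluation would need the paper's own definition of an adequate prediction; the parent-technique rule here is only one plausible stand-in.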

Problem

Research questions and friction points this paper is trying to address.

log analysis

cyber attack

semantic understanding

labeled dataset

intrusion detection

Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models

Log Interpretation

Cyber Attack Manifestation