🤖 AI Summary
To reduce the annotation cost of network log anomaly detection, which typically depends on expert-labeled data and fully supervised models, this paper proposes a low-supervision, automated detection framework. The method, presented as the first of its kind, combines active learning with large language models (LLMs): semantic embedding and clustering-based representative sampling first select informative log instances; an LLM then labels these samples in a few-shot setting, and the labels are propagated across clusters to produce pseudo-labels for training an anomaly detector. Key contributions include: (1) a two-step few-shot refinement strategy that improves annotation accuracy by selecting prompts based on the LLM's observed error patterns, together with LLM-driven root-cause explanations after detection; and (2) detection accuracy comparable to fully supervised methods on real-world log datasets, with a reported F1-score improvement of up to 3.2% and a reduction in annotation effort of over 90%. The work presents an approach to log analysis that is efficient, interpretable, and minimally dependent on manual annotation.
📝 Abstract
Network log analysis plays a critical role in detecting security threats and operational anomalies. Traditional methods for log-based anomaly detection and root cause analysis rely heavily on either expert knowledge or fully supervised learning models, both of which demand extensive labeled data and significant human effort. To address these challenges, we propose ALPHA, the first Active Learning Pipeline for Human-free log Analysis. ALPHA integrates semantic embedding, clustering-based representative sampling, and large language model (LLM)-assisted few-shot annotation to automate the anomaly detection process. The LLM-annotated labels are propagated across clusters, enabling large-scale training of an anomaly detector with minimal supervision. To improve annotation accuracy, we propose a two-step few-shot refinement strategy that adaptively selects informative prompts based on the LLM's observed error patterns. Extensive experiments on real-world log datasets demonstrate that ALPHA achieves detection accuracy comparable to fully supervised methods while reducing the human effort required in the loop. ALPHA also supports interpretable analysis through LLM-driven root cause explanations in the post-detection stage. These capabilities make ALPHA a scalable and cost-efficient solution for automated log-based anomaly detection.
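The embed, cluster, sample, annotate, and propagate flow described above can be illustrated with a minimal sketch. This is not ALPHA's implementation: the bag-of-words `embed`, the greedy leader `cluster`, the medoid-based `representative`, and the keyword-based `stub_annotator` are all hypothetical stand-ins, chosen only so the example runs with the Python standard library. In practice the embedding would come from a semantic model and the annotator would be a few-shot LLM call.

```python
import math
import re
from collections import Counter


def embed(line):
    """Toy stand-in for a semantic embedding: counts of lowercase word tokens."""
    return Counter(re.findall(r"[a-z]+", line.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster(vectors, threshold=0.5):
    """Greedy leader clustering (assumed here for simplicity).

    Each vector joins the first cluster whose leader (first member) is at
    least `threshold` similar; otherwise it starts a new cluster.
    """
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def representative(members, vectors):
    """Pick the medoid: the member most similar, in total, to its cluster."""
    return max(
        members,
        key=lambda i: sum(cosine(vectors[i], vectors[j]) for j in members),
    )


def stub_annotator(line):
    """Hypothetical placeholder for the LLM few-shot labeling call."""
    return "anomaly" if re.search(r"fail|denied|error", line.lower()) else "normal"


def pseudo_label(logs, annotate=stub_annotator, threshold=0.5):
    """Label one representative per cluster, then propagate to all members."""
    vectors = [embed(line) for line in logs]
    labels = [None] * len(logs)
    queries = 0
    for members in cluster(vectors, threshold):
        label = annotate(logs[representative(members, vectors)])
        queries += 1
        for i in members:
            labels[i] = label  # label propagation within the cluster
    return labels, queries


sample_logs = [
    "Connection accepted from 10.0.0.1 port 22",
    "Connection accepted from 10.0.0.2 port 22",
    "Authentication failed for user root from 10.0.0.9",
    "Authentication failed for user admin from 10.0.0.9",
    "Connection accepted from 10.0.0.3 port 443",
]
labels, queries = pseudo_label(sample_logs)
print(labels, queries)
```

The point of the sketch is the cost structure: the annotator is called once per cluster rather than once per log line, so the number of annotation queries scales with the number of clusters. The resulting pseudo-labels would then serve as training data for a downstream anomaly detector, which is omitted here.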