Log Anomaly Detection with Large Language Models via Knowledge-Enriched Fusion

2025-12-12
Citations: 0
Influential: 0
AI Summary
To address semantic loss and the difficulty of recognizing ambiguous patterns in log-based anomaly detection for distributed systems, this paper proposes EnrichLog, a training-free framework. Its core contribution is a knowledge enrichment and fusion mechanism that combines corpus-level knowledge (global semantic retrieval) with sample-level knowledge (augmentation with historical log exemplars), enabling context-aware and interpretable anomaly classification via retrieval-augmented generation (RAG). The method balances accuracy, robustness, and deployment efficiency. Evaluated on four benchmark datasets, EnrichLog consistently outperforms five state-of-the-art baseline methods, with notable average improvements in F1-score. It also shows low inference latency and high prediction confidence, making it an efficient and reliable training-free option for production log analysis.
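Sample-level knowledge, as summarized here, amounts to retrieving historical log entries that resemble the one being classified. The summary does not say which retriever or embedding EnrichLog uses, so the following is only a minimal Python sketch under stated assumptions: a small labeled history (the HDFS-style lines are written for illustration, not taken from the paper's benchmarks) and bag-of-words cosine similarity standing in for the authors' semantic retrieval. `HISTORY`, `tokenize`, `cosine`, and `retrieve_exemplars` are hypothetical names.

```python
import math
import re
from collections import Counter

# Hypothetical labeled history: HDFS-style lines written for this sketch,
# not drawn from the paper's benchmark datasets.
HISTORY: list[tuple[str, str]] = [
    ("Receiving block blk_123 src: /10.0.0.1 dest: /10.0.0.2", "normal"),
    ("PacketResponder 1 for block blk_456 terminating", "normal"),
    ("Exception in receiveBlock for block blk_789 java.io.IOException: Connection reset", "anomalous"),
    ("Got exception while serving blk_321 to /10.0.0.3", "anomalous"),
]


def tokenize(line: str) -> Counter:
    """Bag of lowercase alphabetic tokens; numeric parts of IDs and IPs are dropped."""
    return Counter(re.findall(r"[a-z]+", line.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_exemplars(query: str, history: list[tuple[str, str]], k: int = 2) -> list[tuple[str, str, float]]:
    """Return the k most similar (text, label, score) entries from history.

    Assumption: bag-of-words cosine stands in for the semantic retriever,
    which the summary does not specify.
    """
    q = tokenize(query)
    scored = [(text, label, cosine(q, tokenize(text))) for text, label in history]
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    query = "Exception in receiveBlock for block blk_999 java.io.IOException: Broken pipe"
    for text, label, score in retrieve_exemplars(query, HISTORY):
        print(f"{score:.2f} [{label}] {text}")
```

For this query the IOException entry ranks first (score 0.82, labeled anomalous) and the PacketResponder entry second (0.40, normal). Stripping digits keeps entries that differ only in block IDs or addresses comparable, a crude stand-in for the semantic matching the summary describes.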

πŸ“ Abstract
System logs are a critical resource for monitoring and managing distributed systems, providing insights into failures and anomalous behavior. Traditional log analysis techniques, including template-based and sequence-driven approaches, often lose important semantic information or struggle with ambiguous log patterns. To address this, we present EnrichLog, a training-free, entry-based anomaly detection framework that enriches raw log entries with both corpus-specific and sample-specific knowledge. EnrichLog incorporates contextual information, including historical examples and reasoning derived from the corpus, to enable more accurate and interpretable anomaly detection. The framework leverages retrieval-augmented generation to integrate relevant contextual knowledge without requiring retraining. We evaluate EnrichLog on four large-scale system log benchmark datasets and compare it against five baseline methods. Our results show that EnrichLog consistently improves anomaly detection performance, effectively handles ambiguous log entries, and maintains efficient inference. Furthermore, incorporating both corpus- and sample-specific knowledge enhances model confidence and detection accuracy, making EnrichLog well-suited for practical deployments.
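The abstract describes the fusion step only at a high level: retrieved contextual knowledge is supplied to the model at inference time instead of retraining it. A minimal Python sketch of how such prompt assembly could look follows, assuming binary labels, a plain-text prompt layout, and an external LLM call that is deliberately left out. `Exemplar`, `build_prompt`, and `parse_verdict` are hypothetical names, the corpus note in the usage example is placeholder text, and the layout is not EnrichLog's actual prompt.

```python
from dataclasses import dataclass


@dataclass
class Exemplar:
    """A retrieved historical log entry with its (assumed binary) label."""
    text: str
    label: str  # "normal" or "anomalous"


def build_prompt(entry: str, corpus_notes: list[str], exemplars: list[Exemplar]) -> str:
    """Fuse corpus-level notes and sample-level exemplars into one prompt.

    The section order and wording are a hypothetical layout for illustration.
    """
    lines = ["Classify the system log entry below as normal or anomalous.", "", "Corpus knowledge:"]
    lines += [f"- {note}" for note in corpus_notes]
    lines += ["", "Similar historical entries:"]
    lines += [f"- [{ex.label}] {ex.text}" for ex in exemplars]
    lines += ["", f"Entry to classify: {entry}",
              "Answer 'normal' or 'anomalous' first, then give a one-sentence reason."]
    return "\n".join(lines)


def parse_verdict(response: str) -> str:
    """Read the label from the first word of the model's reply; 'unknown' otherwise."""
    words = response.split()
    first = words[0].lower().strip(".,:;") if words else ""
    return first if first in {"normal", "anomalous"} else "unknown"


if __name__ == "__main__":
    prompt = build_prompt(
        "Exception in receiveBlock for block blk_999 java.io.IOException: Broken pipe",
        ["Placeholder corpus note: IOExceptions during block transfer are worth flagging."],
        [Exemplar("Exception in receiveBlock for block blk_789 java.io.IOException: Connection reset", "anomalous")],
    )
    print(prompt)
    # An LLM call would go here (omitted); parse a sample reply instead.
    print(parse_verdict("Anomalous: the transfer was interrupted."))
```

Keeping corpus notes and exemplars in separate, labeled sections makes it straightforward to remove one knowledge source and compare results, which is the kind of ablation behind the abstract's claim that combining both improves confidence and accuracy.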

Problem

Research questions and friction points this paper is trying to address.

Detects anomalies in system logs using knowledge-enriched fusion.
Addresses semantic loss and ambiguous patterns in traditional methods.
Enhances detection accuracy and interpretability without retraining.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free framework enriches logs with contextual knowledge
Uses retrieval-augmented generation for knowledge integration without retraining
Incorporates both corpus-specific and sample-specific knowledge for accuracy

Anfeng Peng
University of Pittsburgh

Ajesh Koyatan Chathoth
Eaton Corporation

Stephen Lee
Assistant Professor, University of Pittsburgh
Distributed Systems, Cyber-Physical Systems, IoT, Energy analytics