AI Summary
To address semantic loss and the difficulty of identifying ambiguous patterns in log-based anomaly detection for distributed systems, this paper proposes EnrichLog, a training-free, entry-based framework. Its core contribution is a knowledge enrichment and fusion mechanism that requires no retraining and combines corpus-level knowledge (global semantic retrieval) with sample-level knowledge (historical log exemplars), enabling context-aware and interpretable anomaly detection via retrieval-augmented generation (RAG). The method balances accuracy, robustness, and real-time deployment efficiency. Evaluated on four benchmark datasets, EnrichLog consistently outperforms five state-of-the-art baseline methods, with significant average improvements in F1-score. It also exhibits low inference latency and high prediction confidence, making it an efficient and reliable training-free solution for production log analysis.
Abstract
System logs are a critical resource for monitoring and managing distributed systems, providing insights into failures and anomalous behavior. Traditional log analysis techniques, including template-based and sequence-driven approaches, often lose important semantic information or struggle with ambiguous log patterns. To address this, we present EnrichLog, a training-free, entry-based anomaly detection framework that enriches raw log entries with both corpus-specific and sample-specific knowledge. EnrichLog incorporates contextual information, including historical examples and reasoning derived from the corpus, to enable more accurate and interpretable anomaly detection. The framework leverages retrieval-augmented generation to integrate relevant contextual knowledge without requiring retraining. We evaluate EnrichLog on four large-scale system log benchmark datasets and compare it against five baseline methods. Our results show that EnrichLog consistently improves anomaly detection performance, effectively handles ambiguous log entries, and maintains efficient inference. Furthermore, incorporating both corpus- and sample-specific knowledge enhances model confidence and detection accuracy, making EnrichLog well-suited for practical deployments.
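The enrichment idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the token-overlap (Jaccard) retriever, the prompt wording, and all log entries and notes below are hypothetical stand-ins chosen only to keep the example self-contained. A real system would presumably use a semantic retriever and pass the resulting prompt to a language model, which this sketch does not do.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase alphabetic tokens; digits and punctuation are dropped."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-set overlap in [0, 1]; a simple stand-in for semantic similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query: str, candidates: list[dict], k: int) -> list[dict]:
    """Return the k candidates whose 'text' is most similar to the query."""
    q = tokens(query)
    ranked = sorted(
        candidates, key=lambda c: jaccard(q, tokens(c["text"])), reverse=True
    )
    return ranked[:k]


def build_prompt(entry: str, corpus_notes: list[dict], history: list[dict], k: int = 2) -> str:
    """Enrich one raw log entry with corpus-level and sample-level context."""
    lines = ["Corpus-level knowledge:"]
    lines += [f"- {n['text']}" for n in retrieve(entry, corpus_notes, k)]
    lines.append("Similar historical entries:")
    lines += [f"- [{e['label']}] {e['text']}" for e in retrieve(entry, history, k)]
    lines.append(f"Log entry to classify: {entry}")
    lines.append("Answer 'normal' or 'anomalous' with a short reason.")
    return "\n".join(lines)


# Hypothetical data, invented for illustration only.
corpus_notes = [
    {"text": "Timed out connections during block writes often precede node failure."},
    {"text": "Heartbeat acknowledgements are routine."},
]
history = [
    {"text": "Received block 17 from node 2", "label": "normal"},
    {"text": "Connection to node 3 timed out while writing block", "label": "anomalous"},
    {"text": "Heartbeat from node 2 acknowledged", "label": "normal"},
]
entry = "Connection to node 5 timed out while writing block"

prompt = build_prompt(entry, corpus_notes, history)
print(prompt)
```

Because no model weights are updated, new knowledge is added in this sketch simply by appending to `corpus_notes` or `history`, which is the property the training-free framing relies on.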