🤖 AI Summary
Network failures propagate across layers along topology and protocol dependencies, so distinct root causes often produce nearly identical alarms at end nodes, which complicates diagnosis. This work proposes PropLLM, a diagnostic framework that combines hop-by-hop causal backtracking with large language model (LLM) reasoning. At each hop, PropLLM retrieves verifiable evidence from a dual-layer knowledge graph. Its Temporal Causal Propagation Attention (TCPA) mechanism embeds topological causal priors into the attention computation, guiding the model to infer backward along the correct propagation path and reconstruct the complete causal chain, from which it localizes the root cause and identifies the fault type. On a real-world Wi-Fi multimodal dataset, PropLLM improves root cause localization accuracy by 4.7% and fault type diagnosis accuracy by 3.9% over the strongest baseline, while reducing the hallucination rate by 50.8%. Supplementary experiments on the TeleLogs 5G dataset indicate that the method remains effective in a different network scenario.
📝 Abstract
Network faults propagate layer by layer along topology and protocol dependencies, yet operations systems typically observe only symptomatic alerts at the tail end of a propagation chain, where distinct root-cause faults may produce highly similar end-point symptoms. Existing approaches, whether rule-based, machine learning (ML)-based, or large language model (LLM)-based, map the alert set to a diagnosis in a single pass and are therefore structurally incapable of resolving this end-point ambiguity. This paper proposes PropLLM, the first method to integrate the hop-by-hop scene reconstruction paradigm with the generative reasoning capabilities of LLMs. Starting from end-point alerts, PropLLM traces back hop by hop along the propagation path, retrieving verifiable factual evidence from a dual-layer knowledge graph (KG) at each hop. The proposed Temporal Causal Propagation Attention (TCPA) mechanism encodes known topological causal priors directly into the attention computation to guide the model along the correct causal direction. The root cause is then localized, and the fault type determined, through a fully evidenced causal chain. On a real-world Wi-Fi multimodal fault dataset, PropLLM improves fault type diagnosis accuracy by 3.9% and root cause localization accuracy by 4.7% over the strongest baseline, while reducing the hallucination rate by 50.8%. Supplementary experiments on the TeleLogs 5G dataset further demonstrate the effectiveness of the proposed method across different network scenarios.
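
To make the hop-by-hop backtracking idea concrete, the following is a minimal Python sketch and not the PropLLM implementation. Everything in it is an assumption made for illustration: the toy topology `UPSTREAM`, the fact store `FACTS` (a stand-in for the two layers of the knowledge graph), the `Evidence` fields, and the rule of following the upstream neighbor whose anomaly appeared earliest. The LLM reasoning and the TCPA attention mechanism are not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    node: str
    anomaly: str
    first_seen: float  # seconds since start of the observation window


# Hypothetical topology layer: each node maps to the upstream nodes it depends on.
UPSTREAM = {
    "laptop": ["ap-2"],
    "ap-2": ["switch-1"],
    "switch-1": ["gateway"],
    "gateway": [],
}

# Hypothetical fact layer: anomalies observed per node (gateway has none).
FACTS = {
    "laptop": Evidence("laptop", "high latency", 120.0),
    "ap-2": Evidence("ap-2", "retransmission spike", 95.0),
    "switch-1": Evidence("switch-1", "port flapping", 60.0),
}


def backtrack(alert_node, upstream, facts):
    """Walk upstream from the alerting node, one hop at a time.

    A hop is taken only if the upstream node has recorded evidence that
    does not postdate the current link of the chain (cause before effect).
    The last element of the returned chain is the candidate root cause.
    """
    chain = [facts[alert_node]]
    visited = {alert_node}
    current = alert_node
    while True:
        candidates = [
            facts[n]
            for n in upstream.get(current, [])
            if n in facts
            and n not in visited
            and facts[n].first_seen <= chain[-1].first_seen
        ]
        if not candidates:
            return chain
        # Assumed tie-break: follow the earliest anomaly among the candidates.
        nxt = min(candidates, key=lambda e: e.first_seen)
        chain.append(nxt)
        visited.add(nxt.node)
        current = nxt.node


chain = backtrack("laptop", UPSTREAM, FACTS)
for hop in chain:
    print(f"{hop.node}: {hop.anomaly} (t={hop.first_seen})")
```

In this toy case the walk stops at `switch-1`, because `gateway` has no recorded anomaly to support a further hop. A single-pass classifier would see only the `laptop` alert, whereas the chain records the evidence that justified each hop.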