DynaCausal: Dynamic Causality-Aware Root Cause Analysis for Distributed Microservices

📅 2025-10-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
In cloud-native microservices, evolving dependencies and cascading failure propagation degrade the accuracy and robustness of root cause analysis (RCA); existing methods struggle with concept drift, observational noise, and a bias toward the services that deviate most, all of which obscure true root causes. To address these challenges, we propose DynaCausal, a dynamic causality-aware RCA framework that (1) models time-varying spatio-temporal dependencies via interaction-aware representation learning over fused multi-modal dynamic signals; (2) employs a dynamic contrastive mechanism to disentangle fault indicators from contextual noise; and (3) introduces a causal-prioritized pairwise ranking objective that explicitly optimizes causal attribution. On public benchmarks, DynaCausal attains an average AC@1 of 0.63, an absolute gain of 0.25 to 0.46 over state-of-the-art approaches, and delivers accurate, interpretable diagnoses in highly dynamic microservice environments.

📝 Abstract
Cloud-native microservices enable rapid iteration and scalable deployment but also create complex, fast-evolving dependencies that challenge reliable diagnosis. Existing root cause analysis (RCA) approaches, even with multi-modal fusion of logs, traces, and metrics, remain limited in capturing dynamic behaviors and shifting service relationships. Three critical challenges persist: (i) inadequate modeling of cascading fault propagation, (ii) vulnerability to noise interference and concept drift in normal service behavior, and (iii) over-reliance on service deviation intensity that obscures true root causes. To address these challenges, we propose DynaCausal, a dynamic causality-aware framework for RCA in distributed microservice systems. DynaCausal unifies multi-modal dynamic signals to capture time-varying spatio-temporal dependencies through interaction-aware representation learning. It further introduces a dynamic contrastive mechanism to disentangle true fault indicators from contextual noise and adopts a causal-prioritized pairwise ranking objective to explicitly optimize causal attribution. Comprehensive evaluations on public benchmarks demonstrate that DynaCausal consistently surpasses state-of-the-art methods, attaining an average AC@1 of 0.63 with absolute gains from 0.25 to 0.46, and delivering both accurate and interpretable diagnoses in highly dynamic microservice environments.
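The headline metric, AC@1, can be made concrete with a small sketch. It assumes the definition commonly used in RCA evaluation (the fraction of fault cases whose true root cause appears among the top-k ranked services); this page does not spell out the paper's exact definition. The function name `ac_at_k`, the service names, and the rankings are hypothetical illustration data, not results from the paper.

```python
from typing import Dict, List


def ac_at_k(cases: List[Dict], k: int = 1) -> float:
    """Fraction of fault cases whose true root cause is in the top-k ranking.

    Each case is a dict with:
      "ranking":    services ordered from most to least suspicious
      "root_cause": the ground-truth faulty service
    """
    if not cases:
        raise ValueError("cases must be non-empty")
    hits = sum(1 for c in cases if c["root_cause"] in c["ranking"][:k])
    return hits / len(cases)


# Hypothetical fault cases (illustration only).
cases = [
    {"ranking": ["cart", "checkout", "db"], "root_cause": "cart"},
    {"ranking": ["frontend", "payment", "db"], "root_cause": "payment"},
    {"ranking": ["db", "cart", "frontend"], "root_cause": "frontend"},
    {"ranking": ["checkout", "cart", "db"], "root_cause": "checkout"},
]

print(ac_at_k(cases, k=1))  # 0.5
print(ac_at_k(cases, k=3))  # 1.0
```

Under this reading, an average AC@1 of 0.63 would mean that the top-ranked service is the true root cause in roughly 63% of fault cases, averaged over the evaluated benchmarks.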
Problem

Research questions and friction points this paper is trying to address.

Capturing dynamic fault propagation in microservice dependencies
Distinguishing true root causes from noise and concept drift
Overcoming reliance on service deviation intensity for diagnosis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unifies multi-modal signals for dynamic dependencies
Introduces contrastive mechanism to reduce noise interference
Adopts causal-prioritized ranking for accurate attribution
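To illustrate the general idea behind a pairwise ranking objective, here is a minimal sketch using a generic margin-based hinge loss. It is not DynaCausal's actual objective, which this page does not specify; the function name, the margin value, and the scores are assumptions chosen for illustration. The loss is zero only when the true root cause outscores every other service by at least the margin, so it penalizes rankings in which a strongly deviating downstream service is placed above the actual culprit.

```python
from typing import Dict


def pairwise_ranking_loss(
    scores: Dict[str, float], root_cause: str, margin: float = 1.0
) -> float:
    """Mean hinge loss over (root cause, other service) pairs.

    A generic margin ranking loss, shown as an assumption-laden stand-in
    for the paper's causal-prioritized objective.
    """
    if root_cause not in scores:
        raise ValueError("root_cause must have a score")
    others = [s for name, s in scores.items() if name != root_cause]
    if not others:
        raise ValueError("need at least one non-root-cause service")
    pos = scores[root_cause]
    # Each pair contributes max(0, margin - (pos - neg)).
    return sum(max(0.0, margin - (pos - neg)) for neg in others) / len(others)


# Hypothetical scores: the downstream "checkout" deviates most,
# but "db" is the true root cause.
misranked = {"db": 0.5, "checkout": 2.0, "frontend": 1.0}
well_ranked = {"db": 3.0, "checkout": 1.0, "frontend": 0.5}

print(pairwise_ranking_loss(misranked, "db"))    # 2.0
print(pairwise_ranking_loss(well_ranked, "db"))  # 0.0
```

In a trained model the scores would come from learned service representations, and the loss would be minimized with a gradient-based optimizer; the plain-Python version here only shows which rankings the objective rewards.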
Songhan Zhang
The Chinese University of Hong Kong, Shenzhen
Aoyang Fang
The Chinese University of Hong Kong, Shenzhen
Software testing, AIOps, Root cause analysis
Yifan Yang
The Chinese University of Hong Kong, Shenzhen
Ruiyi Cheng
The Chinese University of Hong Kong, Shenzhen
Xiaoying Tang
The Chinese University of Hong Kong, Shenzhen
Pinjia He
Assistant Professor, The Chinese University of Hong Kong, Shenzhen
Software Engineering, AI4SE, SE4AI, AIOps