🤖 AI Summary
Existing root cause analysis methods often rely on known causal graphs or strong assumptions, limiting their ability to accurately identify structural anomalies in complex systems. This work proposes a local mechanism-level root cause analysis framework that operates without requiring a global causal graph. Grounded in the principle of independent causal mechanisms, the approach estimates the local Markov boundary of each variable and detects conditional distribution shifts within these boundaries to pinpoint intervention targets with high probability. By circumventing the need for global graph structure learning, the method demonstrates robustness to graph misspecification, effectively identifies multiple intervention targets, scales well to large systems, and exhibits reliable performance across domains, as validated on synthetic data and five real-world datasets.
📝 Abstract
Root-Cause Analysis (RCA) seeks to identify the variables responsible for abnormal system behavior in complex domains such as manufacturing, cloud computing, and healthcare. Existing approaches face a critical bottleneck: graph-based causal methods can identify intervention targets but typically require a known or accurately estimated causal graph, while graph-free statistical methods either localize marginal anomalies rather than structural causes, or rely on restrictive assumptions about graph structure or functional form. We propose StableRCA, a local mechanism-level RCA framework that avoids global graph discovery by estimating local Markov boundaries and detecting conditional distribution shifts within them. Leveraging the Independent Causal Mechanism principle, we show that, under faithful Markov boundary recovery and non-degenerate mechanism shifts, intervention targets can be identified with probability converging to one exponentially fast in the sample size. Experiments on synthetic benchmarks and five real-world datasets demonstrate that StableRCA is robust to graph misspecification, effective under multiple intervention targets, scalable to large systems, and reliable across diverse application domains. Code is available at: https://anonymous.4open.science/r/StableRCA-E362
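As a rough illustration of the two steps named in the abstract (estimate a local Markov boundary for each variable, then test for a shift in the conditional distribution given that boundary), the following Python sketch uses deliberately simple stand-ins. It is not the StableRCA implementation. It assumes linear-Gaussian data, approximates each Markov boundary by thresholding partial correlations, and measures conditional shift with a z-statistic on regression residuals. The function names, thresholds, and toy data are all hypothetical.

```python
import numpy as np


def estimate_markov_boundaries(X, threshold=0.15):
    """Assumed stand-in: in a linear-Gaussian model, the Markov boundary of
    variable j is the set of variables with nonzero partial correlation."""
    precision = np.linalg.inv(np.cov(X, rowvar=False))
    scale = np.sqrt(np.diag(precision))
    partial_corr = -precision / np.outer(scale, scale)
    np.fill_diagonal(partial_corr, 0.0)
    return [np.flatnonzero(np.abs(row) > threshold) for row in partial_corr]


def conditional_shift_scores(X_ref, X_new, boundaries):
    """Assumed stand-in for a conditional-shift test: fit X_j ~ boundary on
    reference data, then compare mean residuals between the two samples."""
    scores = np.zeros(X_ref.shape[1])
    for j, mb in enumerate(boundaries):
        A_ref = np.column_stack([np.ones(len(X_ref)), X_ref[:, mb]])
        A_new = np.column_stack([np.ones(len(X_new)), X_new[:, mb]])
        coef, *_ = np.linalg.lstsq(A_ref, X_ref[:, j], rcond=None)
        r_ref = X_ref[:, j] - A_ref @ coef
        r_new = X_new[:, j] - A_new @ coef
        se = np.sqrt(r_ref.var(ddof=1) / len(r_ref) + r_new.var(ddof=1) / len(r_new))
        scores[j] = abs(r_new.mean() - r_ref.mean()) / se
    return scores


def simulate(n, rng, shift=0.0):
    """Toy chain x0 -> x1 -> x2 plus an isolated x3; `shift` perturbs the
    mechanism of x1 only (a hypothetical intervention)."""
    x0 = rng.normal(size=n)
    x1 = x0 + rng.normal(size=n) + shift
    x2 = x1 + rng.normal(size=n)
    x3 = rng.normal(size=n)
    return np.column_stack([x0, x1, x2, x3])


rng = np.random.default_rng(0)
X_ref = simulate(2000, rng)             # normal operation
X_new = simulate(2000, rng, shift=2.0)  # anomalous period, x1 intervened

boundaries = estimate_markov_boundaries(X_ref)
scores = conditional_shift_scores(X_ref, X_new, boundaries)
flagged = [j for j, s in enumerate(scores) if s > 10.0]  # arbitrary cutoff
print("boundaries:", [b.tolist() for b in boundaries])
print("scores:", np.round(scores, 1))
print("flagged:", flagged)
```

In this toy chain the parent `x0` is also expected to score highly, because its boundary contains the intervened variable `x1` and the conditional of `x0` given `x1` depends on the mechanism of `x1`. The abstract does not describe how StableRCA separates such neighbors from the true target, so the sketch makes no attempt to do so.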