🤖 AI Summary
Causal discovery in high-dimensional, sparse event sequences with hundreds to thousands of event types, as found in healthcare, cybersecurity, and vehicle diagnostics, remains challenging, particularly for one-shot multilabel inference.
Method: This paper proposes OSCAR, a one-shot causal autoregressive method that combines autoregressive modeling with Markov boundary estimation, using two pretrained Transformers as parallelizable density estimators. OSCAR avoids computationally prohibitive global conditional independence (CI) tests by inferring each event’s Markov boundary directly from a single sequence using information-theoretic criteria.
Contribution/Results: On a real-world automotive dataset (29,100 events, 474 labels), OSCAR recovers interpretable causal graphs within minutes, whereas classical CI-based methods fail to scale. This makes causal structure learning practical in high-dimensional sparse settings without global CI testing.
📝 Abstract
Understanding causality in event sequences with thousands of sparse event types is critical in domains such as healthcare, cybersecurity, and vehicle diagnostics, yet current methods fail to scale. We present OSCAR, a one-shot causal autoregressive method that infers per-sequence Markov boundaries using two pretrained Transformers as density estimators. This enables efficient, parallel causal discovery without costly global conditional independence (CI) testing. On a real-world automotive dataset with 29,100 events and 474 labels, OSCAR recovers interpretable causal structures in minutes, whereas classical methods fail to scale. This enables practical scientific diagnostics at production scale.
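
To make the idea of scoring events with a density estimator more concrete, the following is a minimal sketch of leave-one-out log-likelihood-ratio screening: each event in a single sequence is scored by how much removing it changes the estimated probability of a label, and events scoring above a threshold are kept as a candidate boundary. This is an illustration under stated assumptions, not OSCAR's actual criterion. The noisy-OR `toy_label_prob` function is a hypothetical stand-in for a pretrained Transformer, and the weights, threshold, and all function names are invented for the example.

```python
import math
from typing import Callable, Dict, List, Sequence, Set

# Hypothetical stand-in for a pretrained density estimator:
# a noisy-OR model of p(label = 1 | events). Weights are invented.
TOY_WEIGHTS: Dict[str, float] = {"A": 0.8, "B": 0.0, "C": 0.5}
BASE_RATE = 0.05  # assumed probability of the label with no events


def toy_label_prob(events: Sequence[str]) -> float:
    """Return p(label | events) under the toy noisy-OR model."""
    p_not = 1.0 - BASE_RATE
    for e in events:
        p_not *= 1.0 - TOY_WEIGHTS.get(e, 0.0)
    return 1.0 - p_not


def leave_one_out_scores(
    events: Sequence[str],
    label_prob: Callable[[Sequence[str]], float],
) -> Dict[str, float]:
    """Score each event by log p(label | all) - log p(label | all but it).

    Assumes event types are unique within the sequence.
    """
    full = math.log(label_prob(events))
    scores: Dict[str, float] = {}
    for i, e in enumerate(events):
        reduced: List[str] = list(events[:i]) + list(events[i + 1:])
        scores[e] = full - math.log(label_prob(reduced))
    return scores


def select_boundary(scores: Dict[str, float], threshold: float) -> Set[str]:
    """Keep events whose score exceeds the (assumed) threshold."""
    return {e for e, s in scores.items() if s > threshold}


sequence = ["A", "B", "C"]
scores = leave_one_out_scores(sequence, toy_label_prob)
boundary = select_boundary(scores, threshold=0.05)
print(scores)
print(sorted(boundary))
```

In this toy setup, event `B` has zero weight, so removing it leaves the label probability unchanged and it is screened out. Each event's score depends only on two estimator calls, so the scores can be computed independently and in parallel.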