🤖 AI Summary
This study addresses the limited causal interpretability in large-scale multimodal electronic health records (EHRs). We propose a generalizable causal machine learning framework: first, latent variable decomposition is performed via probabilistic independent component analysis to isolate underlying causal sources; second, multimodal clinical features are integrated to construct a task-specific causal inference model enabling individualized causal effect estimation. Our key methodological advance lies in jointly modeling latent variable discovery and causal structure learning—thereby mitigating challenges inherent to EHRs, including high noise, incompleteness, and strong confounding. Evaluated on two real-world clinical tasks—sepsis prognosis prediction and ICU medication response analysis—the framework successfully identifies medically interpretable latent causal factors and significantly improves both accuracy and robustness of individual causal effect estimation. This work establishes a novel paradigm for data-driven causal discovery in medicine.
📝 Abstract
We provide an accessible description of a peer-reviewed generalizable causal machine learning pipeline to (i) discover latent causal sources of large-scale electronic health records observations, and (ii) quantify the source causal effects on clinical outcomes. We illustrate how imperfect multimodal clinical data can be processed, decomposed into probabilistic independent latent sources, and used to train taskspecific causal models from which individual causal effects can be estimated. We summarize the findings of the two real-world applications of the approach to date as a demonstration of its versatility and utility for medical discovery at scale.