🤖 AI Summary
This study investigates whether natural experiments, implicit interventions embedded in real-world datasets, can be leveraged to improve model performance by treating the data as interventional. The authors use causal discovery to recover the underlying causal graph and select features based on causal links, then compare downstream performance when the data are treated as interventional versus purely observational. After validating the approach on synthetic graphs simulated with and without natural experiments, they evaluate it on a large suite of real-world datasets. The results indicate that real-world datasets do contain natural experiments and that exploiting them through causal inference can improve downstream performance. The authors present the work as an initial, limited-scope exploration of extracting interventional signals from observational data without explicit experimental setups.
📝 Abstract
In nature, events that affect some individuals or groups but not others constitute implicit interventions and are known as natural experiments. For example, the COVID-19 pandemic was an intervention by the coronavirus on the sub-population infected with COVID. We ask: do natural experiments occur in existing real-world datasets? If so, how should we treat them? To detect natural experiments in data, we use causal discovery to recover the underlying causal graph and perform feature selection based on causal links. If downstream performance improves when the data are treated as interventional rather than observational, we argue that this suggests the dataset contains natural experiments. We first validate this hypothesis by simulating datasets with and without natural experiments using synthetic graphs. We then perform a systematic empirical evaluation on a large suite of real-world datasets. Our results indicate that real-world datasets do contain natural experiments and that we can exploit them to improve model performance using causal inference. Our work represents an initial foray into this area, offering a preliminary exploration within a limited scope.
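The idea of simulating data with and without a natural experiment, and then selecting features by causal links, can be illustrated with a toy sketch. It is not the authors' implementation. It assumes a hypothetical three-variable linear graph `X1 -> Y -> X2`, treats that graph as already known instead of recovering it with a causal discovery algorithm, and uses plain least squares as the downstream model. All names (`simulate`, `fit_predict`, `intervened_fraction`) are illustrative.

```python
import numpy as np


def simulate(n, intervened_fraction, rng):
    """Sample from a toy SCM X1 -> Y -> X2 (hypothetical, for illustration).

    A 'natural experiment' overrides X2 with independent noise for a random
    subgroup, which cuts the Y -> X2 link for those rows.
    """
    x1 = rng.normal(size=n)
    y = 2.0 * x1 + rng.normal(scale=0.5, size=n)
    x2 = y + rng.normal(scale=0.1, size=n)
    mask = rng.random(n) < intervened_fraction
    x2[mask] = rng.normal(size=mask.sum())  # intervention on X2
    return np.column_stack([x1, x2]), y, mask


def fit_predict(X_train, y_train, X_test):
    """Ordinary least squares with an intercept, fitted on the training data."""
    A = np.column_stack([X_train, np.ones(len(X_train))])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([X_test, np.ones(len(X_test))]) @ coef


def mse(pred, target):
    return float(np.mean((pred - target) ** 2))


rng = np.random.default_rng(0)
X_tr, y_tr, _ = simulate(5000, 0.0, rng)  # no natural experiment
X_te, y_te, _ = simulate(5000, 0.5, rng)  # half the rows are intervened on

# Model 1: all features, ignoring the causal structure.
mse_all = mse(fit_predict(X_tr, y_tr, X_te), y_te)

# Model 2: causal feature selection, keeping only the parent of Y (column 0).
mse_parents = mse(fit_predict(X_tr[:, [0]], y_tr, X_te[:, [0]]), y_te)

print(f"all features:      MSE = {mse_all:.3f}")
print(f"causal parents:    MSE = {mse_parents:.3f}")
```

In this sketch the all-features model leans heavily on `X2`, a child of `Y`, so its error grows on the rows where the intervention breaks that link, while the parents-only model is unaffected. Under these assumptions, a performance gap of this kind is the sort of signal that would suggest interventions are present in the data.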