Physical Simulators as Do-Operators: Causal Discovery under Latent Confounders for AI-for-Science

📅 2026-05-08
🤖 AI Summary
This work addresses causal discovery in AI-for-Science, where latent confounding and the high cost of interventions often hinder reliable inference. The authors propose CFM-SD (Causal Flow Matching with Simulation Data), which treats first-principles physics simulators as do-operators and combines them with limited real-world interventional data, so that causal structure can be learned even when the causal sufficiency assumption is violated. Theoretically, they show that O(d) single-variable interventions suffice to identify the causal graph over d variables. Empirically, the method achieves an average F1 score of 0.800 on synthetic benchmarks, compared with 0.127–0.562 for the baselines, and reduces estimation bias by 57%–58% in real-world applications involving molecular toxicity prediction and battery electrolyte optimization.
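The summary reports an F1 score for recovered causal structure. The page does not say how that score is computed, so the following is only a minimal sketch under one common assumption: precision and recall are taken over directed edges. The function name `edge_f1` and the toy graphs are hypothetical and do not come from the paper.

```python
def edge_f1(true_edges, pred_edges):
    """F1 over directed edges (assumed convention; the source does not specify one)."""
    true_edges, pred_edges = set(true_edges), set(pred_edges)
    tp = len(true_edges & pred_edges)  # edges recovered with the correct direction
    if tp == 0:
        return 0.0
    precision = tp / len(pred_edges)
    recall = tp / len(true_edges)
    return 2 * precision * recall / (precision + recall)


# Hypothetical 4-variable example: one edge (A -> C) is predicted reversed.
true_graph = {("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")}
pred_graph = {("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")}

score = edge_f1(true_graph, pred_graph)
print(score)
```

Under this convention a reversed edge counts as both a false positive and a false negative, so three of four correct edges gives precision and recall of 0.75 each.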
📝 Abstract
Existing interventional causal discovery methods (IGSP, DCDI, ENCO) assume causal sufficiency (no latent confounders) and rely on virtual interventions in synthetic simulators. In AI-for-Science settings such as molecular design and materials science, latent confounders are ubiquitous and real interventions (e.g., physics-based simulations) require hours to days per data point. We propose CFM-SD (Causal Flow Matching with Simulation Data), which uses first-principles physical simulators as do-operators in Pearl's interventional calculus to handle latent confounders and real interventional data simultaneously. Theoretically, the causal structure over $d$ variables is identifiable with $O(d)$ single-variable interventions, which is the minimum under physical realizability constraints. In intrinsic evaluation on synthetic data ($\gamma = 0.2$--$0.8$), CFM-SD achieves an average F1 of $0.800$, versus F1 of $0.127$--$0.562$ across all baselines. In extrinsic evaluation on real scientific data, CFM-SD achieves 57--58\% bias reduction in molecular toxicity prediction and battery electrolyte optimization, demonstrating practical value beyond synthetic benchmarks.
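To make the "simulator as do-operator" idea concrete, here is a toy sketch. It is not the CFM-SD method. It assumes a hypothetical linear structural causal model with one latent confounder U, and every name and coefficient (`simulate`, `TRUE_EFFECT`, the weight 2.0 on U) is invented for illustration. It shows why observational data gives a biased effect estimate under latent confounding, while data generated by setting X externally, as a simulator would, does not.

```python
import random

TRUE_EFFECT = 1.5  # hypothetical causal effect of X on Y


def simulate(n, rng, do_x=None):
    """Sample from a toy SCM: U -> X, U -> Y, X -> Y, with U unobserved.

    If do_x is given, X is set by that callable instead of its natural
    mechanism, which removes the U -> X dependence (an intervention).
    """
    xs, ys = [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)              # latent confounder
        if do_x is None:
            x = u + rng.gauss(0.0, 1.0)      # observational regime
        else:
            x = do_x(rng)                    # interventional regime: do(X = x)
        y = TRUE_EFFECT * x + 2.0 * u + rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(y)
    return xs, ys


def ols_slope(xs, ys):
    """Least-squares slope of y regressed on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


rng = random.Random(0)
obs_slope = ols_slope(*simulate(20000, rng))
int_slope = ols_slope(*simulate(20000, rng, do_x=lambda r: r.uniform(-2.0, 2.0)))

print(f"observational slope:  {obs_slope:.3f}")
print(f"interventional slope: {int_slope:.3f}")
```

In this toy model the observational slope converges to TRUE_EFFECT + 1 (the confounder contributes cov(X, 2U)/var(X) = 2/2), whereas the interventional slope converges to TRUE_EFFECT. The paper's setting differs in that each interventional sample comes from an expensive physics simulation, which is why the number of required interventions matters.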
Problem

Research questions and friction points this paper is trying to address.

causal discovery
latent confounders
physical simulators
interventional data
AI-for-Science
Innovation

Methods, ideas, or system contributions that make the work stand out.

causal discovery
latent confounders
physical simulators
do-operator
interventional data