Human-in-the-Loop Causal Discovery under Latent Confounding using Ancestral GFlowNets

📅 2023-09-21
🏛️ arXiv.org
📈 Citations: 7
Influential: 0
🤖 AI Summary
Causal discovery algorithms are brittle in small-sample regimes and under latent confounding, and most provide neither uncertainty quantification nor a way to integrate domain expertise. This paper proposes a human-in-the-loop framework for learning ancestral graphs. It introduces generative flow networks (GFlowNets), novel in causal discovery, to sample ancestral graphs proportionally to a score-based belief distribution (e.g., based on BIC), and incorporates expert feedback via importance reweighting. A closed-loop interactive mechanism lets domain experts validate high-uncertainty structures, with optimal experimental design selecting the queries that best discriminate among candidate graphs. On synthetic benchmarks, the method significantly improves structural identification accuracy, does not require causal sufficiency (i.e., it permits latent confounders), and yields a sampling distribution that closely approximates the true posterior belief over ancestral graphs.
📝 Abstract
Structure learning is the crux of causal inference. Notably, causal discovery (CD) algorithms are brittle when data is scarce, possibly inferring imprecise causal relations that contradict expert knowledge -- especially when considering latent confounders. To aggravate the issue, most CD methods do not provide uncertainty estimates, making it hard for users to interpret results and improve the inference process. Surprisingly, while CD is a human-centered affair, no works have focused on building methods that both 1) output uncertainty estimates that can be verified by experts and 2) interact with those experts to iteratively refine CD. To solve these issues, we start by proposing to sample (causal) ancestral graphs proportionally to a belief distribution based on a score function, such as the Bayesian information criterion (BIC), using generative flow networks. Then, we leverage the diversity in candidate graphs and introduce an optimal experimental design to iteratively probe the expert about the relations among variables, effectively reducing the uncertainty of our belief over ancestral graphs. Finally, we update our samples to incorporate human feedback via importance sampling. Importantly, our method does not require causal sufficiency (i.e., unobserved confounders may exist). Experiments with synthetic observational data show that our method can accurately sample from distributions over ancestral graphs and that we can greatly improve inference quality with human aid.
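The abstract's core loop can be illustrated with a minimal toy sketch: sample graphs proportionally to a score-based belief (here, weights proportional to exp(-BIC/2)), then fold in expert feedback by importance reweighting. The graph names, BIC values, and the expert's assertion below are all hypothetical; the paper trains a GFlowNet to sample ancestral graphs rather than enumerating candidates as done here.

```python
import math

# Toy belief over candidate graphs: weight proportional to exp(-BIC/2).
# BIC values are hypothetical; a real implementation computes them from data
# and samples graphs with a trained GFlowNet instead of enumerating them.
bic = {"G1": 100.0, "G2": 102.0, "G3": 110.0}
best_bic = min(bic.values())  # subtract the minimum for numerical stability
weights = {g: math.exp(-(b - best_bic) / 2) for g, b in bic.items()}
total = sum(weights.values())
belief = {g: w / total for g, w in weights.items()}

# Hypothetical expert feedback: the expert asserts a relation ("X causes Y")
# that only G1 and G2 satisfy. Importance reweighting downweights inconsistent
# graphs (here, a hard constraint: weight 0) and renormalizes.
consistent = {"G1": 1.0, "G2": 1.0, "G3": 0.0}
post = {g: belief[g] * consistent[g] for g in belief}
z = sum(post.values())
post = {g: p / z for g, p in post.items()}
```

With soft feedback (an expert who may err), `consistent` would hold likelihoods of the expert's answer under each graph instead of hard 0/1 indicators.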
Problem

Research questions and friction points this paper is trying to address.

Addresses brittleness of causal discovery with scarce data and latent confounders
Quantifies epistemic uncertainty over causal relations via distributions on ancestral graphs
Incorporates expert feedback through an optimal experimental design framework
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sampling ancestral graphs from a score-based belief distribution with GFlowNets
Expert elicitation framework driven by optimal experimental design
Incorporating human or LLM feedback to refine causal inference via importance sampling
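The optimal experimental design idea above can be sketched as choosing the expert query that minimizes the expected entropy of the belief after the answer. Everything below is a toy illustration with a hypothetical belief and hypothetical query-to-answer tables; the paper's design criterion over ancestral graph relations is more involved.

```python
import math

def entropy(p):
    """Shannon entropy (nats) of a dict of probabilities."""
    return -sum(q * math.log(q) for q in p.values() if q > 0)

# Hypothetical belief over three candidate ancestral graphs.
belief = {"G1": 0.5, "G2": 0.3, "G3": 0.2}

# For each candidate query, the answer each graph implies (hypothetical).
answers = {
    "X->Y?": {"G1": "yes", "G2": "yes", "G3": "no"},
    "Y->Z?": {"G1": "yes", "G2": "no", "G3": "no"},
}

def expected_posterior_entropy(query):
    # Assume the expert answers according to the true graph, drawn from belief.
    by_answer = {}
    for g, a in answers[query].items():
        by_answer.setdefault(a, {})[g] = belief[g]
    exp_h = 0.0
    for graphs in by_answer.values():
        p_a = sum(graphs.values())                 # probability of this answer
        post = {g: w / p_a for g, w in graphs.items()}
        exp_h += p_a * entropy(post)               # entropy after the answer
    return exp_h

# Pick the query whose answer is expected to leave the least uncertainty.
best = min(answers, key=expected_posterior_entropy)
```

Here "Y->Z?" is the better query: a "yes" pins down G1 exactly, whereas "X->Y?" mostly confirms the already-likely graphs.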