Federated Causal Discovery Across Heterogeneous Datasets under Latent Confounding

📅 2026-03-05
🤖 AI Summary
This work addresses causal discovery in distributed settings where privacy constraints prevent data from being centralized, variable sets and data types differ across sites, latent confounders may be present, and small local samples limit statistical power. The authors propose fedCI, a federated conditional independence test that handles non-aligned variable sets, site-specific effects, and mixed data types. fedCI fits the generalized linear models underlying its likelihood-ratio tests with a federated Iteratively Reweighted Least Squares (IRLS) procedure. Building on it, fedCI-IOD federates the Integration of Overlapping Datasets (IOD) algorithm and, according to the authors, is the first method to enable federated causal discovery under latent confounding across heterogeneous datasets. The authors report performance comparable to fully pooled analyses while preserving privacy and mitigating artifacts of small local sample sizes. The tools are released as a Python package, a privacy-preserving R implementation of IOD, and a web application.

📝 Abstract
Causal discovery across multiple datasets is often constrained by data privacy regulations and cross-site heterogeneity, limiting the use of conventional methods that require a single, centralized dataset. To address these challenges, we introduce fedCI, a federated conditional independence test that rigorously handles heterogeneous datasets with non-identical sets of variables, site-specific effects, and mixed variable types, including continuous, ordinal, binary, and categorical variables. At its core, fedCI uses a federated Iteratively Reweighted Least Squares (IRLS) procedure to estimate the parameters of generalized linear models underlying likelihood-ratio tests for conditional independence. Building on this, we develop fedCI-IOD, a federated extension of the Integration of Overlapping Datasets (IOD) algorithm that replaces its meta-analysis strategy and enables, for the first time, federated causal discovery under latent confounding across distributed and heterogeneous datasets. By aggregating evidence federatively, fedCI-IOD not only preserves privacy but also substantially enhances statistical power, achieving performance comparable to fully pooled analyses and mitigating artifacts from low local sample sizes. Our tools are publicly available as the fedCI Python package, a privacy-preserving R implementation of IOD, and a web application for the fedCI-IOD pipeline, providing versatile, user-friendly solutions for federated conditional independence testing and causal discovery.
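
To make the core idea concrete, here is a minimal sketch of a federated IRLS fit followed by a likelihood-ratio test for conditional independence. It is not the fedCI package API. Every function name (`local_stats`, `federated_irls`, `lr_test`, `simulate_site`) is hypothetical, and the setup is simplified on purpose: a binary outcome with a logistic model, identical columns at every site, a shared intercept with no site-specific effects, and no secure aggregation. Each site sends only the aggregates X'WX, X'Wz, and its log-likelihood, and the server sums them and solves the weighted least-squares step.

```python
import math
import random

def local_stats(X, y, beta):
    """Run at one site: returns only aggregates (X'WX, X'Wz, log-likelihood)."""
    p = len(beta)
    xtwx = [[0.0] * p for _ in range(p)]
    xtwz = [0.0] * p
    loglik = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        eta = max(-30.0, min(30.0, eta))      # guard against overflow in exp
        mu = 1.0 / (1.0 + math.exp(-eta))     # logistic mean
        w = mu * (1.0 - mu)                   # IRLS weight
        z = eta + (yi - mu) / w               # working response
        for j in range(p):
            xtwz[j] += xi[j] * w * z
            for k in range(p):
                xtwx[j][k] += xi[j] * w * xi[k]
        loglik += yi * math.log(mu) + (1 - yi) * math.log(1.0 - mu)
    return xtwx, xtwz, loglik

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def federated_irls(sites, max_iter=50, tol=1e-10):
    """Server loop: sum site aggregates, then solve the weighted LS step."""
    p = len(sites[0][0][0])
    beta = [0.0] * p
    for _ in range(max_iter):
        stats = [local_stats(X, y, beta) for X, y in sites]
        A = [[sum(s[0][j][k] for s in stats) for k in range(p)] for j in range(p)]
        b = [sum(s[1][j] for s in stats) for j in range(p)]
        new_beta = solve(A, b)
        step = max(abs(n - o) for n, o in zip(new_beta, beta))
        beta = new_beta
        if step < tol:
            break
    loglik = sum(local_stats(X, y, beta)[2] for X, y in sites)
    return beta, loglik

def lr_test(sites, tested_col):
    """Likelihood-ratio test: does column `tested_col` improve the model?"""
    null_sites = [([r[:tested_col] + r[tested_col + 1:] for r in X], y)
                  for X, y in sites]
    _, ll_full = federated_irls(sites)
    _, ll_null = federated_irls(null_sites)
    stat = max(0.0, 2.0 * (ll_full - ll_null))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return stat, math.erfc(math.sqrt(stat / 2.0))

def simulate_site(n, effect, rng):
    """Toy site data: rows [1, z, x]; binary y with logit 0.8*z + effect*x."""
    X, y = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        x = 0.5 * z + rng.gauss(0, 1)
        prob = 1.0 / (1.0 + math.exp(-(0.8 * z + effect * x)))
        X.append([1.0, z, x])
        y.append(1 if rng.random() < prob else 0)
    return X, y

rng = random.Random(0)
sites = [simulate_site(150, 1.5, rng), simulate_site(250, 1.5, rng)]
stat, p_value = lr_test(sites, tested_col=2)
print(f"LR statistic = {stat:.2f}, p-value = {p_value:.3g}")
```

Because the summed aggregates equal the aggregates of the pooled data, this sketch reproduces the pooled fit up to floating-point rounding, which illustrates why a federated test can match a centralized one. A small p-value is evidence against Y being independent of X given Z.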

Problem

Research questions and friction points this paper is trying to address.

Federated Causal Discovery
Latent Confounding
Heterogeneous Datasets
Data Privacy
Conditional Independence

Innovation

Methods, ideas, or system contributions that make the work stand out.

federated causal discovery
conditional independence test
latent confounding
heterogeneous datasets
generalized linear models

Maximilian Hahn
University of Münster, Institute of Medical Informatics, Münster, Germany
Alina Zajak
University of Münster, Institute of Medical Informatics, Münster, Germany
Dominik Heider
Director, University of Münster
Data Science, Machine Learning, Artificial Intelligence, Biomedical Informatics, SaMD
Adèle Helena Ribeiro
University of Münster, Institute of Medical Informatics, Münster, Germany