Design-based edge-level causal inference with machine learning assisted covariate adjustment

📅 2026-05-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the challenge of estimating edge-level causal effects in directed networks under dyadic interference, where outcomes depend on the joint treatment assignments of pairs of units and the resulting dependence structure invalidates conventional node-level methods. The authors propose design-based Horvitz–Thompson estimators and add a three-fold sample splitting and cross-fitting procedure tailored to edge data, which restores the conditional independence that shared units break under standard two-fold splitting. Integrating machine learning for covariate adjustment improves estimation efficiency while preserving unbiasedness, and a calibration step guarantees no asymptotic efficiency loss relative to the unadjusted estimator. Theoretical analysis establishes asymptotic normality and shows that the proposed variance estimators are substantially less conservative than classical approaches. Simulation studies and a real-data application confirm the efficiency gains over unadjusted alternatives.
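
As a rough illustration of what a design-based Horvitz–Thompson estimator looks like for edge outcomes, the sketch below weights each observed edge by the inverse probability of its joint exposure. It rests on assumptions not stated in the summary: an independent Bernoulli(p) design over units, and the specific contrast "both endpoints treated" versus "both endpoints untreated". The function name `ht_edge_effect` and its arguments are hypothetical, and the paper's general class of estimands may differ.

```python
import numpy as np

def ht_edge_effect(edges, y, z, p):
    """Horvitz-Thompson estimate of the mean of Y_ij(1,1) - Y_ij(0,0) over edges.

    edges : (m, 2) integer array of directed edges (sender, receiver)
    y     : (m,) observed edge-level outcomes
    z     : (n,) binary unit-level treatment indicators
    p     : treatment probability; units are assumed to be assigned
            independently with probability p (an assumed design)
    """
    edges = np.asarray(edges)
    y = np.asarray(y, dtype=float)
    z = np.asarray(z)

    # Treatment status of the sender and receiver of every edge.
    z_send, z_recv = z[edges[:, 0]], z[edges[:, 1]]

    both_treated = (z_send == 1) & (z_recv == 1)
    both_control = (z_send == 0) & (z_recv == 0)

    # Inverse-probability weights: P(both treated) = p^2 and
    # P(both control) = (1 - p)^2 under independent assignment.
    weights = both_treated / p**2 - both_control / (1 - p) ** 2

    # Edges with mixed exposure get weight zero for this contrast.
    return float(np.mean(weights * y))
```

For example, `ht_edge_effect([[0, 1], [1, 2], [2, 0]], [4.0, 2.0, 6.0], [1, 1, 0], 0.5)` uses only the edge (0, 1), since the other two edges have one treated and one untreated endpoint. Two edges that share a unit have correlated weights, which is the dependence the variance estimators have to account for.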

📝 Abstract

We study design-based causal inference for edge-level outcomes in directed networks under dyadic interference. In this setting, outcomes are defined on directed edges and depend on the joint treatment assignments of pairs of units, inducing a complex dependence structure that invalidates standard estimation and inference procedures developed for node-level data. We construct Horvitz–Thompson estimators for a general class of edge-level causal effects and establish their asymptotic normality under mild regularity conditions. To enable valid inference, we develop variance estimators that exploit identifiable components of network dependence, yielding substantially less conservative bounds than classical approaches. To improve efficiency, we incorporate auxiliary covariates through a sample splitting and cross-fitting procedure. A key technical challenge is that standard two-fold sample splitting fails in the presence of edge-level outcomes due to the dependence induced by shared units. To address this issue, we introduce a three-fold sample splitting and cross-fitting scheme that restores the conditional independence required for unbiased estimation. Under a stability condition, the resulting covariate-adjusted estimator is asymptotically normal and accommodates both linear adjustment and flexible machine learning methods. We further introduce a calibration step that guarantees no asymptotic efficiency loss relative to the unadjusted estimator. Simulation studies and a real-data application confirm the theoretical results and demonstrate substantial efficiency gains.
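
The abstract does not spell out the three-fold construction, so the following is only one plausible reading of why a third fold helps. Units are partitioned into folds, and the adjustment model for a target edge (i, j) is fit only on edges that avoid the folds of both i and j. With two folds, an edge that crosses folds leaves no training edges; with three folds, at least one fold always remains. The helper names `assign_folds` and `training_mask` are hypothetical and not taken from the paper.

```python
import numpy as np

def assign_folds(n_units, n_folds=3, seed=0):
    """Randomly partition units into (nearly) equal-sized folds."""
    rng = np.random.default_rng(seed)
    return rng.permutation(n_units) % n_folds

def training_mask(edges, folds, i, j):
    """Boolean mask of edges usable to fit the adjustment for edge (i, j).

    An edge qualifies only if neither of its endpoints lies in the fold
    of i or the fold of j, so the fitted model does not depend on the
    treatments of i, j, or any unit sharing a fold with them.
    (Assumed scheme for illustration, not the paper's stated procedure.)
    """
    edges = np.asarray(edges)
    folds = np.asarray(folds)
    excluded = [folds[i], folds[j]]
    sender_ok = ~np.isin(folds[edges[:, 0]], excluded)
    receiver_ok = ~np.isin(folds[edges[:, 1]], excluded)
    return sender_ok & receiver_ok

# Six units in three folds and a handful of directed edges.
folds3 = np.array([0, 0, 1, 1, 2, 2])
edges = np.array([[0, 2], [2, 4], [4, 5], [1, 0], [3, 2]])

# Target edge (0, 2) spans folds 0 and 1: only edges inside fold 2 remain.
mask_cross = training_mask(edges, folds3, 0, 2)

# With two folds, a cross-fold target edge leaves nothing to train on.
folds2 = np.array([0, 0, 0, 1, 1, 1])
mask_two_fold = training_mask(edges, folds2, 0, 3)
```

Here `mask_cross` selects only the edge (4, 5), while `mask_two_fold` is empty. That gap is the failure of two-fold splitting described above, at least under this reading of the scheme.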

Problem

Research questions and friction points this paper is trying to address.

edge-level causal inference

dyadic interference

network dependence

design-based inference

causal effects

Innovation

Methods, ideas, or system contributions that make the work stand out.

edge-level causal inference

dyadic interference

three-fold sample splitting