🤖 AI Summary
Counterfactual generation models face two common sources of bias: confounding bias, arising from unaccounted systematic differences between treatment and control groups, and misspecification bias, caused by inaccurate auxiliary models. This paper introduces DoubleGen, the first framework to bring double robustness into generative modeling: it guarantees elimination of confounding bias provided either the propensity score model or the outcome model is correctly specified. By reformulating the training objectives of diffusion models, flow matching, and autoregressive language models, DoubleGen enables debiased counterfactual generation. The method comes with finite-sample theoretical guarantees, achieving both oracle-optimal and minimax-optimal convergence rates. Empirical evaluations across diverse generative tasks demonstrate that DoubleGen significantly reduces estimation bias and improves the robustness and accuracy of counterfactual generation.
📝 Abstract
Generative models for counterfactual outcomes face two key sources of bias. Confounding bias arises when approaches fail to account for systematic differences between those who receive the intervention and those who do not. Misspecification bias arises when methods attempt to address confounding through estimation of an auxiliary model, but specify it incorrectly. We introduce DoubleGen, a doubly robust framework that modifies generative modeling training objectives to mitigate these biases. The new objectives rely on two auxiliaries -- a propensity and outcome model -- and successfully address confounding bias even if only one of them is correct. We provide finite-sample guarantees for this robustness property. We further establish conditions under which DoubleGen achieves oracle optimality -- matching the convergence rates standard approaches would enjoy if interventional data were available -- and minimax rate optimality. We illustrate DoubleGen with three examples: diffusion models, flow matching, and autoregressive language models.
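To make the double-robustness claim concrete, here is a minimal numerical sketch of the underlying AIPW-style idea: a per-sample generative loss is reweighted using both a propensity model and an outcome model, and the resulting estimate of the counterfactual loss stays correct when either auxiliary (but not necessarily both) is right. This is an illustrative toy, not DoubleGen's actual objective; the simulated data, the quadratic "loss," and the function name `doublegen_loss` are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Toy observational data: confounder W, treatment A, outcome X observed only if A = 1.
w = rng.normal(size=n)
pi_true = 1.0 / (1.0 + np.exp(-w))          # true propensity P(A = 1 | W)
a = rng.binomial(1, pi_true)
x = w + 0.5 * rng.normal(size=n)

# Stand-in per-sample "generative loss" l(x) = x^2; the counterfactual target
# E[l(X(1))] = Var(W) + Var(noise) = 1 + 0.25 = 1.25 here.
loss_x = np.where(a == 1, x**2, 0.0)        # zero placeholder where X is unobserved
m_true = w**2 + 0.25                        # correct outcome model E[l(X) | W]

def doublegen_loss(loss_x, a, pi_hat, m_hat):
    """AIPW-style doubly robust estimate of the counterfactual loss.

    The inverse-propensity term corrects the outcome model's predictions
    on treated samples; their sum is unbiased if either auxiliary is correct.
    """
    return np.mean(a / pi_hat * (loss_x - m_hat) + m_hat)

# Correct propensity, badly misspecified outcome model (m = 0): still near 1.25.
est_pi_ok = doublegen_loss(loss_x, a, pi_true, np.zeros(n))

# Misspecified propensity (constant 0.5), correct outcome model: still near 1.25.
est_m_ok = doublegen_loss(loss_x, a, np.full(n, 0.5), m_true)
```

In DoubleGen, the same correction is applied inside the training objective itself (e.g., the diffusion or flow-matching loss) rather than to a scalar estimate, but the mechanism for cancelling confounding bias is the one shown above.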