🤖 AI Summary
This paper addresses causal representation learning: extracting a $d$-dimensional invariant representation $W = f(X)$ from high-dimensional input $X$, such that $W$ predicts the response variable $Y$ while remaining invariant to confounding variables $Z$. Methodologically, it reformulates the conditional independence constraint $I(W; Z \mid Y) = 0$ as an unconditional independence constraint $I(W; Z_Y) = 0$, where $Z_Y$ is the optimal transport barycenter of $Z$ conditioned on $Y$; under Gaussian assumptions, this equivalence is rigorously established. When $Z$ is unobserved, the framework substitutes a proxy variable $S$ for it. Theoretically, the optimal linear feature extractor admits a closed-form solution: its rows span the subspace of the top-$d$ eigenvectors of an explicitly characterized covariance matrix. The resulting procedure is both statistically grounded and computationally cheap, and extends naturally to non-Gaussian and nonlinear settings.
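To make the reformulation concrete, the display below restates it in symbols. The explicit mean-shift form of the barycenter map $T$ is our reading of the Gaussian case, not a formula quoted from the paper:

```latex
% Conditional-independence constraint and its unconditional reformulation.
% The mean-shift form of T(Z, Y) is an assumption: it is the Monge W2
% barycenter map when Z | Y is Gaussian with Y-independent covariance,
% which holds in the jointly Gaussian setting considered here.
\[
  I(W;\, Z \mid Y) = 0
  \quad\Longleftrightarrow\quad
  I(W;\, Z_Y) = 0,
  \qquad
  Z_Y = T(Z, Y) = Z - \mathbb{E}[Z \mid Y] + \mathbb{E}[Z].
\]
```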
📄 Abstract
A methodology is developed to extract $d$ invariant features $W=f(X)$ that predict a response variable $Y$ without being confounded by variables $Z$ that may influence both $X$ and $Y$.
The methodology's main ingredient is the penalization of any statistical dependence between $W$ and $Z$ conditioned on $Y$, which is replaced by the more readily implementable plain independence between $W$ and the random variable $Z_Y = T(Z,Y)$ that solves the [Monge] Optimal Transport Barycenter Problem for $Z \mid Y$. In the Gaussian case considered in this article, the two statements are equivalent.
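As an illustration of this substitution, here is a minimal numerical sketch, assuming the jointly Gaussian setting in which $\mathbb{E}[Z \mid Y]$ is linear in $Y$ and $\mathrm{Cov}(Z \mid Y)$ does not depend on $Y$, so the barycenter map reduces to a mean shift estimated by linear regression of $Z$ on $Y$; the function name and the regression step are ours, not the paper's:

```python
import numpy as np

def barycenter_projection(Z, Y):
    """Map Z to Z_Y = T(Z, Y), the OT barycenter of Z | Y.

    Assumes the jointly Gaussian case, where E[Z | Y] is linear in Y and
    Cov(Z | Y) is Y-independent, so the Monge barycenter map is the
    mean shift Z_Y = Z - E[Z | Y] + E[Z].
    """
    Y1 = np.column_stack([np.ones(len(Y)), Y])     # add intercept column
    coef, *_ = np.linalg.lstsq(Y1, Z, rcond=None)  # linear fit of E[Z | Y]
    Z_given_Y = Y1 @ coef                          # fitted conditional mean
    return Z - Z_given_Y + Z.mean(axis=0)          # mean-shift barycenter map

# Toy usage: a confounder Z correlated with the response Y.
rng = np.random.default_rng(0)
Y = rng.normal(size=(500, 1))
Z = 0.8 * Y + rng.normal(scale=0.5, size=(500, 1))
Z_Y = barycenter_projection(Z, Y)
# After the map, Z_Y is empirically uncorrelated with Y:
print(np.corrcoef(Z_Y.ravel(), Y.ravel())[0, 1])  # ~ 0
```

The mean shift strips the $Y$-dependence of $Z$ while preserving its shape, which is why plain independence from $Z_Y$ can stand in for independence from $Z$ conditioned on $Y$.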
When the true confounders $Z$ are unknown, other measurable contextual variables $S$ can be used as surrogates, a replacement that involves no relaxation in the Gaussian case if the covariance matrix $\Sigma_{ZS}$ has full rank. The resulting linear feature extractor admits a closed form in terms of the first $d$ eigenvectors of a known matrix. The procedure extends with little change to more general, non-Gaussian, nonlinear cases.
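The closed form can be sketched as an eigenproblem. Since the abstract does not spell out the matrix in question, the matrix `M` below (a predictive cross-covariance term penalized by a $\lambda$-weighted confounding term) is a hypothetical stand-in; only the "rows = top-$d$ eigenvectors" structure is taken from the stated result:

```python
import numpy as np

def linear_extractor(X, Y, Z_Y, d, lam=1.0):
    """Top-d eigenvector sketch of the closed-form linear feature map.

    M is a hypothetical stand-in: cross-covariance of X with Y
    (predictiveness) minus lam times cross-covariance with Z_Y
    (confounding). The paper characterizes the exact matrix; only the
    top-d eigenvector structure is taken from the abstract.
    """
    Xc, Yc, Zc = (A - A.mean(axis=0) for A in (X, Y, Z_Y))
    n = len(X)
    C_xy = Xc.T @ Yc / n                       # predictive cross-covariance
    C_xz = Xc.T @ Zc / n                       # confounding cross-covariance
    M = C_xy @ C_xy.T - lam * (C_xz @ C_xz.T)  # reward prediction, penalize confounding
    eigvals, eigvecs = np.linalg.eigh(M)       # symmetric eigendecomposition
    V = eigvecs[:, np.argsort(eigvals)[::-1][:d]]  # first d eigenvectors
    return V                                   # features: W = X @ V

# Usage: W = X @ linear_extractor(X, Y, Z_Y, d=2)
```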