🤖 AI Summary
This paper addresses causal representation learning: extracting a $d$-dimensional invariant representation $W = f(X)$ from high-dimensional input $X$, such that $W$ predicts the response variable $Y$ while remaining invariant to confounding variables $Z$. Methodologically, it reformulates the conditional independence constraint $I(W; Z \mid Y) = 0$ as an unconditional independence constraint $I(W; Z_Y) = 0$, where $Z_Y$ is the optimal transport barycenter of $Z$ conditioned on $Y$; under Gaussian assumptions, this equivalence is rigorously established. When $Z$ is unobserved, the framework substitutes a proxy variable $S$ for it. Theoretically, the optimal linear feature extractor admits a closed-form solution: its rows span the subspace of the top-$d$ eigenvectors of an explicitly characterized covariance matrix. The resulting procedure is both statistically grounded and computationally cheap, and extends naturally to non-Gaussian and nonlinear settings.
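To make the reformulation concrete, the display below restates it in symbols. The explicit mean-shift form of the barycenter map $T$ is our reading of the Gaussian case, not a formula quoted from the paper:

```latex
% Conditional-independence constraint and its unconditional reformulation.
% The mean-shift form of T(Z, Y) is an assumption: it is the Monge W2
% barycenter map when Z | Y is Gaussian with Y-independent covariance,
% which holds in the jointly Gaussian setting considered here.
\[
  I(W;\, Z \mid Y) = 0
  \quad\Longleftrightarrow\quad
  I(W;\, Z_Y) = 0,
  \qquad
  Z_Y = T(Z, Y) = Z - \mathbb{E}[Z \mid Y] + \mathbb{E}[Z].
\]
```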
📄 Abstract
A methodology is developed to extract $d$ invariant features $W=f(X)$ that predict a response variable $Y$ without being confounded by variables $Z$ that may influence both $X$ and $Y$.
The methodology's main ingredient is the penalization of any statistical dependence between $W$ and $Z$ conditioned on $Y$, which is replaced by the more readily implementable plain independence between $W$ and the random variable $Z_Y = T(Z,Y)$ that solves the [Monge] Optimal Transport Barycenter Problem for $Z \mid Y$. In the Gaussian case considered in this article, the two statements are equivalent.
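As an illustration of this substitution, here is a minimal numerical sketch, assuming the jointly Gaussian setting in which $\mathbb{E}[Z \mid Y]$ is linear in $Y$ and $\mathrm{Cov}(Z \mid Y)$ does not depend on $Y$, so the barycenter map reduces to a mean shift estimated by linear regression of $Z$ on $Y$; the function name and the regression step are ours, not the paper's:

```python
import numpy as np

def barycenter_projection(Z, Y):
    """Map Z to Z_Y = T(Z, Y), the OT barycenter of Z | Y.

    Assumes the jointly Gaussian case, where E[Z | Y] is linear in Y and
    Cov(Z | Y) is Y-independent, so the Monge barycenter map is the
    mean shift Z_Y = Z - E[Z | Y] + E[Z].
    """
    Y1 = np.column_stack([np.ones(len(Y)), Y])     # add intercept column
    coef, *_ = np.linalg.lstsq(Y1, Z, rcond=None)  # linear fit of E[Z | Y]
    Z_given_Y = Y1 @ coef                          # fitted conditional mean
    return Z - Z_given_Y + Z.mean(axis=0)          # mean-shift barycenter map

# Toy usage: a confounder Z correlated with the response Y.
rng = np.random.default_rng(0)
Y = rng.normal(size=(500, 1))
Z = 0.8 * Y + rng.normal(scale=0.5, size=(500, 1))
Z_Y = barycenter_projection(Z, Y)
# After the map, Z_Y is empirically uncorrelated with Y:
print(np.corrcoef(Z_Y.ravel(), Y.ravel())[0, 1])  # ~ 0
```

The mean shift strips the $Y$-dependence of $Z$ while preserving its shape, which is why plain independence from $Z_Y$ can stand in for independence from $Z$ conditioned on $Y$.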
When the true confounders $Z$ are unknown, other measurable contextual variables $S$ can be used as surrogates, a replacement that involves no relaxation in the Gaussian case if the covariance matrix $\Sigma_{ZS}$ has full rank. The resulting linear feature extractor admits a closed form in terms of the first $d$ eigenvectors of a known matrix. The procedure extends with little change to more general, non-Gaussian, nonlinear cases.
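The closed form can be sketched as an eigenproblem. Since the abstract does not spell out the matrix in question, the matrix `M` below (a predictive cross-covariance term penalized by a $\lambda$-weighted confounding term) is a hypothetical stand-in; only the "rows = top-$d$ eigenvectors" structure is taken from the stated result:

```python
import numpy as np

def linear_extractor(X, Y, Z_Y, d, lam=1.0):
    """Top-d eigenvector sketch of the closed-form linear feature map.

    M is a hypothetical stand-in: cross-covariance of X with Y
    (predictiveness) minus lam times cross-covariance with Z_Y
    (confounding). The paper characterizes the exact matrix; only the
    top-d eigenvector structure is taken from the abstract.
    """
    Xc, Yc, Zc = (A - A.mean(axis=0) for A in (X, Y, Z_Y))
    n = len(X)
    C_xy = Xc.T @ Yc / n                       # predictive cross-covariance
    C_xz = Xc.T @ Zc / n                       # confounding cross-covariance
    M = C_xy @ C_xy.T - lam * (C_xz @ C_xz.T)  # reward prediction, penalize confounding
    eigvals, eigvecs = np.linalg.eigh(M)       # symmetric eigendecomposition
    V = eigvecs[:, np.argsort(eigvals)[::-1][:d]]  # first d eigenvectors
    return V                                   # features: W = X @ V

# Usage: W = X @ linear_extractor(X, Y, Z_Y, d=2)
```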