Is Spurious Correlation Removal Always Learnable?

📅 2026-06-11
🤖 AI Summary
This work studies the computational feasibility of invariant learning in multi-environment settings and shows that removing spurious correlations efficiently can remain out of reach even when the invariant structure is statistically identifiable. Conditional on a black-box samplable supervised sparse recovery primitive, the authors construct a samplable family of multi-environment instances with a one-dimensional invariant subspace (k = 1) that exhaustive search learns from polynomially many samples, while any polynomial-time constant-accuracy recovery algorithm would contradict the primitive. They also introduce an environment diversity parameter γ that controls identifiability and the curvature of invariance objectives. Under sufficient diversity and local Gaussian regularity, they derive a minimax risk of Θ(k(d−k)/(n|ℰ|)); under label-induced shifts, they identify a phase transition at n* ∝ k(d−k)/(|ℰ|γ²), with estimation error scaling as 1/γ². Experiments on synthetic and real datasets illustrate the predicted gaps and transitions.
📝 Abstract
Invariant learning can fail even when the invariant structure is statistically identifiable. We show a conditional computational barrier: under a black-box samplable supervised sparse recovery primitive motivated by average-case sparse-recovery reductions, there exist \emph{samplable} multi-environment instances with a one-dimensional predictive invariant subspace ($k=1$) that are learnable with polynomial samples by exhaustive search, while any polynomial-time constant-accuracy recovery algorithm would contradict the primitive. We further quantify environment diversity by a separation parameter $\gamma$, which controls identifiability and the curvature of invariance objectives. Under sufficient diversity and local Gaussian regularity, the minimax risk is $\mathbb{E}[\operatorname{dist}(\hat{V},V_{\mathrm{inv}})^2]=\Theta(k(d-k)/(n|\mathcal{E}|))$, and under label-induced shifts a phase transition occurs at $n^*\propto k(d-k)/(|\mathcal{E}|\gamma^2)$ with refined estimation error scaling proportional to $1/\gamma^2$. Synthetic and real datasets illustrate the predicted gaps and transitions and motivate simple diversity diagnostics.
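The scalings quoted in the abstract can be explored numerically with a minimal sketch like the one below. It is illustrative only: the constant `c` hidden by the Θ and ∝ notation is unknown and set to 1 as a placeholder, the function names are hypothetical, and the subspace distance is assumed to be the Frobenius distance between orthogonal projectors, which the abstract does not specify.

```python
import numpy as np

def risk_rate(k, d, n, num_envs, c=1.0):
    """Order of the minimax risk, c * k(d-k) / (n * |E|).
    c is a placeholder for the unknown constant hidden by Theta."""
    return c * k * (d - k) / (n * num_envs)

def transition_sample_size(k, d, num_envs, gamma, c=1.0):
    """Order of the phase-transition threshold, c * k(d-k) / (|E| * gamma^2).
    c is a placeholder for the unknown proportionality constant."""
    return c * k * (d - k) / (num_envs * gamma ** 2)

def projection_distance(V_hat, V):
    """Frobenius distance between the orthogonal projectors onto the
    column spans of V_hat and V (an assumed choice of dist)."""
    Q_hat, _ = np.linalg.qr(V_hat)  # orthonormal basis of span(V_hat)
    Q, _ = np.linalg.qr(V)          # orthonormal basis of span(V)
    return np.linalg.norm(Q_hat @ Q_hat.T - Q @ Q.T, ord="fro")

if __name__ == "__main__":
    k, d, num_envs = 1, 11, 5
    for gamma in (1.0, 0.5, 0.25):
        n_star = transition_sample_size(k, d, num_envs, gamma)
        print(f"gamma={gamma}: n* scales like {n_star}")
    print("risk order at n=100:", risk_rate(k, d, 100, num_envs))
```

Under these assumptions, halving γ quadruples the threshold, while doubling the number of environments halves both the threshold and the risk order.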
Problem

Research questions and friction points this paper is trying to address.

spurious correlation
invariant learning
computational barrier
environment diversity
minimax risk
Innovation

Methods, ideas, or system contributions that make the work stand out.

invariant learning
spurious correlation
computational barrier
environment diversity
minimax risk
🔎 Similar Papers
Yibo Zhou
Beijing Key Laboratory of Digital Media, School of Computer Science and Engineering, Beihang University, Beijing 100191, China
Bo Li
Associate Professor, Beihang University
big data
Hai-Miao Hu
Beijing Key Laboratory of Digital Media, School of Computer Science and Engineering, Beihang University, Beijing 100191, China; State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing 100191, China; Hangzhou Innovation Institute of Beihang University, Hangzhou 310051, China
Hanzi Wang
Professor of Xiamen University
Computer Vision, Pattern Recognition, Model Fitting, Visual Tracking, Object Detection and Recognition
Xiaokang Zhang
Beijing Key Laboratory of Digital Media, School of Computer Science and Engineering, Beihang University, Beijing 100191, China
Ruifan Zhang
Beijing Key Laboratory of Digital Media, School of Computer Science and Engineering, Beihang University, Beijing 100191, China