Anchor PCA

📅 2026-06-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of standard PCA on multi-domain data: pooling the domains can let domain-specific principal components dominate, so the resulting shared low-dimensional embedding may not generalize. The paper proposes Anchor PCA, which makes shared directions of variation the focus of the optimization. Its objective trades off overall explained variance against agreement between the shared and domain-specific low-rank embeddings, and it reduces to PCA on a modified target matrix, so it can be solved efficiently. The authors show that Anchor PCA recovers a maximal invariant subspace and admits a minimax reconstruction interpretation under bounded domain-specific covariance inflations. In simulations, Anchor PCA recovers the maximally invariant subspace; on real-world gas sensor data with temporal drift, its embeddings explain more variance on unseen domains than the pooling baseline and a worst-case alternative.

📝 Abstract

Principal component analysis (PCA) is one of the most widely used unsupervised dimension reduction techniques. We study PCA for data from multiple related domains. Since principal components generally differ across domains, one way to obtain a shared low-rank embedding is to perform PCA on the pooled data. However, this approach can focus on spurious directions that exhibit high variation in only a few domains. To find a robust embedding that still explains most variance in unseen but similar domains, we propose instead to focus on shared directions of variation. To this end, we introduce Anchor PCA which trades off overall explained variance with agreement between the shared and domain-specific low-rank embeddings. Anchor PCA amounts to PCA on a modified target matrix and thus can be solved efficiently. Moreover, we show that Anchor PCA recovers a maximal invariant subspace and admits a minimax reconstruction interpretation under bounded domain-specific covariance inflations. On simulated and real-world gas sensor data with temporal drift, we demonstrate, respectively, that Anchor PCA recovers the maximally invariant subspace and yields embeddings that explain more variance on unseen domains than the pooling baseline and a worst-case alternative. Taken together, these findings establish Anchor PCA as a promising approach to robust unsupervised dimension reduction from multi-domain data.
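
The pooling failure mode described in the abstract can be illustrated with a small sketch. This is not the Anchor PCA algorithm, whose modified target matrix is not specified here. It only shows the pooled-PCA baseline being drawn to a direction that varies strongly in a single domain. The covariance values, the helper names `top_components` and `explained_variance_ratio`, and the hand-picked "shared" direction are all assumptions made for illustration.

```python
import numpy as np

def top_components(cov, k):
    """Return the top-k eigenvectors (as columns) of a symmetric covariance matrix."""
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :k]          # reorder to descending, keep k

def explained_variance_ratio(cov, V):
    """Fraction of the total variance of `cov` captured by the columns of V."""
    return float(np.trace(V.T @ cov @ V) / np.trace(cov))

# Hypothetical population covariances for three training domains in R^3.
# Axis 0: shared direction, variance 4 in every domain.
# Axis 1: spurious direction, variance 30 in domain 0 only.
# Axis 2: low-variance noise.
train_covs = [
    np.diag([4.0, 30.0, 0.1]),
    np.diag([4.0, 0.1, 0.1]),
    np.diag([4.0, 0.1, 0.1]),
]

# Pooled PCA baseline: PCA on the average covariance across domains.
pooled_cov = np.mean(train_covs, axis=0)
V_pooled = top_components(pooled_cov, k=1)

# The shared direction is known by construction in this toy setup.
V_shared = np.array([[1.0], [0.0], [0.0]])

# An unseen domain that resembles the domains without the spurious variation.
unseen_cov = np.diag([4.0, 0.1, 0.1])

ratio_pooled = explained_variance_ratio(unseen_cov, V_pooled)
ratio_shared = explained_variance_ratio(unseen_cov, V_shared)
print(f"pooled PCA direction: {ratio_pooled:.3f} of unseen-domain variance")
print(f"shared direction:     {ratio_shared:.3f} of unseen-domain variance")
```

In this toy setup the pooled covariance has its largest eigenvalue along axis 1, so the rank-1 pooled embedding captures little variance in the unseen domain, whereas the shared direction captures most of it. A method that favors shared directions of variation is intended to avoid this behavior.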

Problem

Research questions and friction points this paper is trying to address.

multi-domain data

robust dimension reduction

shared embedding

principal component analysis

domain generalization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Anchor PCA

multi-domain data

invariant subspace