🤖 AI Summary
This work addresses two problems. First, traditional principal component analysis (PCA) is hard to interpret in high-dimensional settings, because dense loading vectors obscure which features matter. Second, existing sparse PCA methods rely on ℓ1 regularization, whose hyperparameters are difficult to tune. The authors propose Adversarial PCA (AdvPCA), which obtains sparsity by casting the reconstruction problem as a robust optimization problem under bounded worst-case perturbations in the latent space. AdvPCA alternates between updating a sparse encoder and an orthogonal decoder, each with a closed-form solution, and uses an adaptive parametrization strategy that removes the need for fine-tuned regularization parameters. Experiments on both synthetic and real genomic datasets demonstrate that AdvPCA consistently outperforms current approaches in terms of sparsity, reconstruction accuracy, and numerical stability, offering a plug-and-play solution for interpretable dimensionality reduction.
📝 Abstract
While principal component analysis (PCA) is a fundamental tool for dimensionality reduction, its dense representations make it ill-suited for high-dimensional data. Existing methods address this by promoting sparsity through explicit $\ell_1$-penalties, but these penalties are difficult to tune because the task is unsupervised. In contrast, we propose Adversarial PCA (AdvPCA), which leverages robust optimization to achieve sparsity by optimizing the reconstruction objective against bounded, worst-case latent space perturbations. We show that this formulation admits a closed-form reduction, leading to a practical iterative algorithm that alternates between adversarial linear regression-style updates for the sparse encoder and orthogonal updates for the decoder. By theoretically characterizing the solution, we derive a data-adaptive parameterization that allows the algorithm to perform effectively out of the box. We validate these claims through numerical experiments on synthetic and real-world genomics data.
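
The abstract does not give the update rules, so the following is only a sketch of two standard building blocks that an alternating scheme of this kind could plausibly rest on, not the authors' algorithm. The first function assumes $\ell_\infty$-bounded perturbations on the regression inputs (the paper perturbs the latent space, and its perturbation set may differ); in that assumed setting the worst-case squared error has a closed form containing an $\ell_1$ term, which illustrates how robustness can induce sparsity without an explicit penalty. The second function is the classical orthogonal Procrustes solution for a decoder with orthonormal columns when the encoder is held fixed. The names `worst_case_sq_error`, `orthogonal_decoder_update`, `X`, `B`, and `eps` are hypothetical.

```python
import numpy as np


def worst_case_sq_error(x, y, beta, eps):
    """Closed form of max over ||d||_inf <= eps of (y - (x + d) @ beta) ** 2.

    Assumption: l_inf-bounded perturbations on the regression inputs.
    The paper's latent-space perturbation set may differ.
    """
    residual = abs(y - x @ beta)
    # The adversary can shift the prediction by at most eps * ||beta||_1,
    # so an l1 term appears without being added as an explicit penalty.
    return (residual + eps * np.abs(beta).sum()) ** 2


def orthogonal_decoder_update(X, B):
    """Return A with orthonormal columns minimizing ||X - X @ B @ A.T||_F.

    X is an (n, p) data matrix and B a fixed (p, k) encoder. This is the
    orthogonal Procrustes solution: take the thin SVD of X.T @ X @ B and
    drop the singular values.
    """
    U, _, Vt = np.linalg.svd(X.T @ X @ B, full_matrices=False)
    return U @ Vt


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 6))   # synthetic data, illustration only
    B = rng.standard_normal((6, 2))    # stand-in for a fitted encoder
    A = orthogonal_decoder_update(X, B)
    print("decoder shape:", A.shape)
    print("orthonormal columns:", np.allclose(A.T @ A, np.eye(2)))
```

A full alternating loop would interleave an encoder step with `orthogonal_decoder_update` until the reconstruction error stops changing; the encoder step and the data-adaptive choice of the perturbation radius are specified in the paper, not here.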