Optimization without Retraction on the Random Generalized Stiefel Manifold

📅 2024-05-02
🏛️ International Conference on Machine Learning
📈 Citations: 5
Influential: 2
🤖 AI Summary
This work addresses stochastic optimization on the generalized Stiefel manifold—defined as the set of matrices $X$ satisfying $X^\top B X = I_p$—which arises in canonical correlation analysis (CCA), independent component analysis (ICA), and generalized eigenvalue problems (GEVP). Conventional Riemannian approaches require exact knowledge of $B$ and enforce the constraint via a costly retraction or projection at each iteration. The authors propose the first retraction-free stochastic algorithm for this setting: it operates using only unbiased stochastic estimates of $B$, employs a stochastic gradient update built from matrix multiplications that keeps iterates near the constraint set, and converges almost surely to critical points satisfying the constraint. Theoretically, its convergence rate matches that of full-information Riemannian methods under standard assumptions. Empirically, each iteration incurs significantly lower computational cost while achieving comparable accuracy.
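To make the summary concrete, here is a minimal NumPy sketch of the constraint-attracting part of such an update. All names and constants are illustrative, not the paper's: the full method also takes a gradient step on the objective, while this sketch isolates the stochastic penalty step $X \leftarrow X - \eta\, \hat B_1 X (X^\top \hat B_2 X - I_p)$ (a stochastic gradient of $\tfrac14\|X^\top B X - I_p\|_F^2$) and shows that iterates drift onto the manifold using only random estimates of $B$ and matrix multiplications—no retraction and no access to the full $B$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, eta, batch = 10, 3, 0.02, 100

# Ground-truth B, used only for sampling and final evaluation; the
# iteration itself sees nothing but stochastic estimates of it.
C = rng.standard_normal((n, n))
B = np.eye(n) + C @ C.T / (2 * n)
L = np.linalg.cholesky(B)

def sample_B():
    # Unbiased estimate of B from a small batch: E[Z Z^T / batch] = B.
    Z = L @ rng.standard_normal((n, batch))
    return Z @ Z.T / batch

X, _ = np.linalg.qr(rng.standard_normal((n, p)))   # arbitrary starting point
err0 = np.linalg.norm(X.T @ B @ X - np.eye(p))     # initial constraint violation

for _ in range(4000):
    # Two independent estimates keep the penalty-gradient estimate unbiased.
    B1, B2 = sample_B(), sample_B()
    X -= eta * B1 @ X @ (X.T @ B2 @ X - np.eye(p))  # matrix multiplications only

err = np.linalg.norm(X.T @ B @ X - np.eye(p))      # far smaller than err0
```

The constraint is never enforced exactly at any iteration; the violation simply shrinks along the trajectory, which is the sense in which the method is "retraction-free".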

📝 Abstract
Optimization over the set of matrices $X$ that satisfy $X^\top B X = I_p$, referred to as the generalized Stiefel manifold, appears in many applications involving sampled covariance matrices such as canonical correlation analysis (CCA), independent component analysis (ICA), and the generalized eigenvalue problem (GEVP). Solving these problems is typically done by iterative methods that require a fully formed $B$. We propose a cheap stochastic iterative method that solves the optimization problem while having access only to random estimates of $B$. Our method does not enforce the constraint in every iteration; instead, it produces iterates that converge to critical points on the generalized Stiefel manifold defined in expectation. The method has lower per-iteration cost, requires only matrix multiplications, and has the same convergence rates as its Riemannian optimization counterparts that require the full matrix $B$. Experiments demonstrate its effectiveness in various machine learning applications involving generalized orthogonality constraints, including CCA, ICA, and the GEVP.
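For contrast with the retraction-based methods the abstract refers to, the sketch below (an illustrative construction, not from the paper) implements one standard way to enforce $X^\top B X = I_p$ exactly: the polar-style map $X \mapsto X\,(X^\top B X)^{-1/2}$. Note that it needs the full matrix $B$ and a $p \times p$ eigendecomposition at every step—precisely the per-iteration cost the proposed method avoids.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 4

# Illustrative dense SPD matrix B; a full-information Riemannian method
# would need it in its entirety at every iteration.
C = rng.standard_normal((n, n))
B = C @ C.T / n + np.eye(n)

def retract(X, B):
    """Polar-style retraction onto {X : X^T B X = I_p}: X -> X (X^T B X)^{-1/2}."""
    M = X.T @ B @ X               # p x p, symmetric positive definite
    w, V = np.linalg.eigh(M)      # spectral decomposition for M^{-1/2}
    return X @ (V / np.sqrt(w)) @ V.T

X = rng.standard_normal((n, p))
R = retract(X, B)
err = np.linalg.norm(R.T @ B @ R - np.eye(p))   # near machine precision
```

Here the constraint holds exactly after every call, but each call costs a full multiply by $B$ plus an eigendecomposition, whereas the paper's update keeps the constraint only asymptotically in exchange for cheap, estimate-based iterations.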
Problem

Research questions and friction points this paper is trying to address.

Optimizing over matrices with generalized orthogonality constraints in covariance applications
Solving optimization without full matrix B using stochastic iterative methods
Achieving convergence on generalized Stiefel manifold with lower per-iteration cost
Innovation

Methods, ideas, or system contributions that make the work stand out.

Stochastic iterative method using random B estimates
Converges to critical points without enforcing constraints
Lower per-iteration cost with matrix multiplications only