🤖 AI Summary
In multi-view subspace learning, theoretical guarantees for separating shared and individual signal subspaces in high-dimensional noisy data are still lacking. This paper establishes conditions for subspace separability, expressed in terms of the ratio of signal rank to ambient dimension, the principal angles between the true subspaces, and the noise level, and builds a theoretical framework on spectral perturbation analysis of the product of projection matrices. Combining rotational bootstrap with random matrix theory, the authors propose an easy-to-use, interpretable method that partitions the observed spectrum into three categories (shared, individual, and noise) without manual tuning. Diagnostic plots visualize this partition and indicate how reliable the estimate is. In simulations, the method estimates both joint and individual subspaces more accurately than existing approaches. On multi-omics data from colorectal cancer patients and a nutrigenomic study of mice, it improves performance in downstream predictive tasks.
📝 Abstract
Multi-view data provides complementary information on the same set of observations, with multi-omics and multimodal sensor data being common examples. Analyzing such data typically requires distinguishing between shared (joint) and unique (individual) signal subspaces from noisy, high-dimensional measurements. Despite many proposed methods, the conditions for reliably identifying joint and individual subspaces remain unclear. We rigorously quantify these conditions, which depend on the ratio of the signal rank to the ambient dimension, the principal angles between the true subspaces, and the noise levels. Our approach characterizes how perturbations of the spectrum of the product of projection matrices, derived from each view's estimated subspaces, affect subspace separation. Using these insights, we provide an easy-to-use and scalable estimation algorithm. In particular, we employ rotational bootstrap and random matrix theory to partition the observed spectrum into joint, individual, and noise subspaces. Diagnostic plots visualize this partitioning, providing practical and interpretable insights into the estimation performance. In simulations, our method estimates joint and individual subspaces more accurately than existing approaches. Applications to multi-omics data from colorectal cancer patients and a nutrigenomic study of mice demonstrate improved performance in downstream predictive tasks.
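To make the role of the product of projection matrices concrete, here is a minimal NumPy sketch for two views. It is an illustration under stated assumptions, not the paper's algorithm: each view is assumed to be an observations-by-features matrix whose signal subspace is its column space, the per-view rank is assumed known, and the toy data, the helper names `orthonormal_basis` and `projection_product_spectrum`, and all sizes are hypothetical. The sketch relies on a standard fact: the nonzero singular values of the product of two orthogonal projections are the cosines of the principal angles between the two subspaces, so shared directions appear as values near 1 and noise pulls them below 1. The rotational bootstrap and random matrix theory steps that the abstract uses to choose cutoffs are not implemented here.

```python
import numpy as np

rng = np.random.default_rng(0)


def orthonormal_basis(X, rank):
    """Top-`rank` left singular vectors of X: an estimate of the view's signal subspace."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :rank]


def projection_product_spectrum(U1, U2):
    """Singular values (descending) of P1 @ P2, where Pk = Uk @ Uk.T projects onto span(Uk)."""
    P1 = U1 @ U1.T
    P2 = U2 @ U2.T
    return np.linalg.svd(P1 @ P2, compute_uv=False)


# Hypothetical toy setup: n observations, 2 joint directions, 2 individual directions per view.
n, p = 50, 40
Q, _ = np.linalg.qr(rng.standard_normal((n, 6)))  # 6 orthonormal columns in R^n
joint, ind1, ind2 = Q[:, :2], Q[:, 2:4], Q[:, 4:6]
U1_true = np.hstack([joint, ind1])  # true signal subspace of view 1
U2_true = np.hstack([joint, ind2])  # true signal subspace of view 2

# Noise-free spectrum: two values equal to 1 (joint), the rest equal to 0.
sv_clean = projection_product_spectrum(U1_true, U2_true)


def make_view(U, noise_sd):
    """Signal supported on span(U) plus i.i.d. Gaussian noise (assumed noise model)."""
    loadings = 10.0 * rng.standard_normal((U.shape[1], p))
    return U @ loadings + noise_sd * rng.standard_normal((n, p))


X1 = make_view(U1_true, noise_sd=0.1)
X2 = make_view(U2_true, noise_sd=0.1)

# Spectrum computed from estimated subspaces: the joint values are perturbed
# below 1 and the remaining values are perturbed above 0.
U1_hat = orthonormal_basis(X1, rank=4)
U2_hat = orthonormal_basis(X2, rank=4)
sv_noisy = projection_product_spectrum(U1_hat, U2_hat)

print("noise-free:", np.round(sv_clean[:5], 3))
print("noisy:     ", np.round(sv_noisy[:5], 3))
```

In this toy setting the gap between the leading values and the rest is wide, so any reasonable cutoff separates the joint directions. With weaker signal, larger principal angles between individual subspaces, or a higher rank-to-dimension ratio, the gap shrinks, which is the regime in which a data-driven partition of the spectrum becomes necessary.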