Graph Canonical Correlation Analysis

📅 2025-02-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Traditional canonical correlation analysis (CCA) fails to capture inherent structured dependencies among cross-group variables, leading to biased association estimates. To address this, we propose graph-structured CCA (gCCA), the first CCA framework that explicitly incorporates variable topology—encoded as a prior graph—via graph-regularized constraints while preserving mult-omics association modeling capability. Theoretically, we establish finite-sample concentration inequalities and stopping-time convergence guarantees using martingale theory. Algorithmically, gCCA enables interpretable identification of both positive and negative regulatory pathways. Experiments demonstrate that gCCA significantly outperforms state-of-the-art CCA methods on synthetic data and successfully disentangles DNA methylation–RNA-seq associations, revealing bidirectional regulatory mechanisms whereby methylation modulates gene expression pathways.

Technology Category

Application Category

📝 Abstract

Canonical correlation analysis (CCA) is a widely used technique for estimating associations between two sets of multi-dimensional variables. Recent advancements in CCA methods have expanded their application to decipher the interactions of multiomics datasets, imaging-omics datasets, and more. However, conventional CCA methods are limited in their ability to incorporate structured patterns in the cross-correlation matrix, potentially leading to suboptimal estimations. To address this limitation, we propose the graph Canonical Correlation Analysis (gCCA) approach, which calculates canonical correlations based on the graph structure of the cross-correlation matrix between the two sets of variables. We develop computationally efficient algorithms for gCCA, and provide theoretical results for finite sample analysis of best subset selection and canonical correlation estimation by introducing concentration inequalities and stopping time rule based on martingale theories. Extensive simulations demonstrate that gCCA outperforms competing CCA methods. Additionally, we apply gCCA to a multiomics dataset of DNA methylation and RNA-seq transcriptomics, identifying both positively and negatively regulated gene expression pathways by DNA methylation pathways.

Problem

Research questions and friction points this paper is trying to address.

Enhance CCA with graph structures

Improve multiomics dataset analysis

Develop efficient gCCA algorithms

Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph-based Canonical Correlation Analysis

Efficient algorithms for gCCA

Finite sample analysis techniques

🔎 Similar Papers

No similar papers found.

Authors to Follow