Graph Canonical Correlation Analysis

📅 2025-02-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional canonical correlation analysis (CCA) fails to capture inherent structured dependencies among cross-group variables, leading to biased association estimates. To address this, we propose graph Canonical Correlation Analysis (gCCA), the first CCA framework that explicitly incorporates variable topology (encoded as a prior graph) via graph-regularized constraints while preserving the ability to model multi-omics associations. Theoretically, we establish finite-sample concentration inequalities and stopping-time convergence guarantees using martingale theory. Algorithmically, gCCA enables interpretable identification of both positive and negative regulatory pathways. Experiments demonstrate that gCCA significantly outperforms state-of-the-art CCA methods on synthetic data and successfully disentangles DNA methylation–RNA-seq associations, revealing bidirectional regulatory mechanisms whereby methylation modulates gene expression pathways.

📝 Abstract
Canonical correlation analysis (CCA) is a widely used technique for estimating associations between two sets of multi-dimensional variables. Recent advances in CCA methods have expanded their application to deciphering interactions in multiomics datasets, imaging-omics datasets, and more. However, conventional CCA methods are limited in their ability to incorporate structured patterns in the cross-correlation matrix, potentially leading to suboptimal estimates. To address this limitation, we propose the graph Canonical Correlation Analysis (gCCA) approach, which calculates canonical correlations based on the graph structure of the cross-correlation matrix between the two sets of variables. We develop computationally efficient algorithms for gCCA and provide theoretical results for finite-sample analysis of best subset selection and canonical correlation estimation by introducing concentration inequalities and a stopping-time rule based on martingale theory. Extensive simulations demonstrate that gCCA outperforms competing CCA methods. Additionally, we apply gCCA to a multiomics dataset of DNA methylation and RNA-seq transcriptomics, identifying gene expression pathways that are positively or negatively regulated by DNA methylation pathways.
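
For readers unfamiliar with the baseline that gCCA extends, the following is a minimal sketch of conventional (unstructured) CCA, computed from the singular value decomposition of the whitened cross-covariance matrix. It is not the gCCA algorithm from the paper. The function names `inv_sqrt` and `classical_cca` and the simulated data are assumptions made purely for illustration.

```python
import numpy as np

def inv_sqrt(mat, eps=1e-10):
    """Inverse square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.maximum(vals, eps)  # guard against tiny or negative eigenvalues
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def classical_cca(X, Y):
    """Conventional CCA for X (n x p) and Y (n x q).

    Returns the canonical correlations (descending) and the weight
    matrices A (p x k) and B (q x k), where k = min(p, q).
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1)   # covariance within X
    Syy = Yc.T @ Yc / (n - 1)   # covariance within Y
    Sxy = Xc.T @ Yc / (n - 1)   # cross-covariance between X and Y
    Wx, Wy = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, rho, Vt = np.linalg.svd(Wx @ Sxy @ Wy, full_matrices=False)
    return rho, Wx @ U, Wy @ Vt.T

# Hypothetical data: one shared latent signal links X[:, 0] and Y[:, 1].
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
X = rng.normal(size=(n, 4))
Y = rng.normal(size=(n, 3))
X[:, 0] += 2 * z
Y[:, 1] += 2 * z

rho, A, B = classical_cca(X, Y)
print("canonical correlations:", np.round(rho, 3))
```

In this toy setting only the first canonical correlation should be large, since a single latent signal is shared. Note that nothing in this computation uses any structure in the cross-correlation matrix, which is the limitation the abstract describes.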
Problem

Research questions and friction points this paper is trying to address.

Enhance CCA with graph structures
Improve multiomics dataset analysis
Develop efficient gCCA algorithms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph-based Canonical Correlation Analysis
Efficient algorithms for gCCA
Finite sample analysis techniques
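
As a rough illustration of what "graph structure of the cross-correlation matrix" can mean, the sketch below simulates two variable sets in which only a small block of variables is associated, computes the sample cross-correlation matrix, and views it as a bipartite graph by thresholding. This is not the paper's estimation procedure. The helper `cross_correlation`, the block layout, and the `threshold` value are all assumptions chosen for the example.

```python
import numpy as np

def cross_correlation(X, Y):
    """Sample cross-correlation matrix (p x q) between columns of X and Y."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
    return Xs.T @ Ys / (X.shape[0] - 1)

# Hypothetical data: the first 3 X variables and the first 2 Y variables
# share one latent factor; all other variables are independent noise.
rng = np.random.default_rng(1)
n, p, q = 400, 10, 8
z = rng.normal(size=(n, 1))
X = rng.normal(size=(n, p))
Y = rng.normal(size=(n, q))
X[:, :3] += 1.5 * z
Y[:, :2] += 1.5 * z

R = cross_correlation(X, Y)

# Bipartite graph view: an edge joins X_i and Y_j when |R[i, j]| is large.
threshold = 0.3  # illustrative cutoff, not taken from the paper
adjacency = np.abs(R) > threshold

# Compare average absolute correlation inside and outside the planted block.
block = np.zeros_like(R, dtype=bool)
block[:3, :2] = True
inside = np.abs(R[block]).mean()
outside = np.abs(R[~block]).mean()
print(f"mean |r| inside block: {inside:.2f}, outside: {outside:.2f}")
print("edges detected:", int(adjacency.sum()))
```

The planted block shows up as a dense subgraph in the thresholded matrix, while the remaining entries hover near zero. A structure-aware method can exploit this kind of pattern; conventional CCA treats every entry of the matrix alike.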
Authors

Hongju Park
Postdoc, University of Maryland
Statistical machine learning, Reinforcement learning, Bayesian statistics

Zhenyao Ye
University of Maryland, Baltimore
Bioinformatics, Biostatistics, Human Genetics

Hwiyoung Lee
Maryland Psychiatric Research Center, School of Medicine, University of Maryland; The University of Maryland Institute for Health Computing (UM-IHC)

Tianzhou Ma
Associate Professor of Biostatistics, University of Maryland

Shuo Chen
Maryland Psychiatric Research Center, School of Medicine, University of Maryland; The University of Maryland Institute for Health Computing (UM-IHC)