InfoDPCCA: Information-Theoretic Dynamic Probabilistic Canonical Correlation Analysis

📅 2025-06-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
High-dimensional bimodal time-series data (e.g., fMRI) pose challenges in jointly modeling shared and modality-specific latent structures while preserving discriminative and interpretable representations. Method: We propose InfoDPCCA, an information-theoretic dynamic probabilistic canonical correlation analysis framework that jointly learns a shared latent representation, encoding only the mutual information between the two sequences, together with modality-specific latent components. For the first time, the information bottleneck principle is integrated into the dynamic probabilistic CCA framework, enforcing information constraints on the shared space via variational inference. Additionally, a two-stage training strategy and residual connections enhance optimization stability and generative fidelity. Results: Evaluated on synthetic benchmarks and real fMRI datasets, InfoDPCCA significantly improves representation discriminability, interpretability, and downstream prediction performance, consistently outperforming DPCCA and other baselines across all metrics.

📝 Abstract
Extracting meaningful latent representations from high-dimensional sequential data is a crucial challenge in machine learning, with applications spanning natural science and engineering. We introduce InfoDPCCA, a dynamic probabilistic Canonical Correlation Analysis (CCA) framework designed to model two interdependent sequences of observations. InfoDPCCA leverages a novel information-theoretic objective to extract a shared latent representation that captures the mutual structure between the data streams and balances representation compression and predictive sufficiency while also learning separate latent components that encode information specific to each sequence. Unlike prior dynamic CCA models, such as DPCCA, our approach explicitly enforces the shared latent space to encode only the mutual information between the sequences, improving interpretability and robustness. We further introduce a two-step training scheme to bridge the gap between information-theoretic representation learning and generative modeling, along with a residual connection mechanism to enhance training stability. Through experiments on synthetic and medical fMRI data, we demonstrate that InfoDPCCA excels as a tool for representation learning. Code of InfoDPCCA is available at https://github.com/marcusstang/InfoDPCCA.
Problem

Research questions and friction points this paper is trying to address.

Extract shared latent representations from high-dimensional sequential data
Balance representation compression and predictive sufficiency in dynamic CCA
Improve interpretability by isolating mutual information between sequences
Innovation

Methods, ideas, or system contributions that make the work stand out.

Information-theoretic dynamic probabilistic CCA framework
Two-step training scheme for representation learning
Residual connection mechanism enhances training stability
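The information-theoretic objective described above balances compression against predictive sufficiency, which in variational form typically reduces to a prediction term plus a KL penalty tying the shared latent posterior to a prior. A minimal numerical sketch of such an objective is below; the function names, the diagonal-Gaussian forms, and the single `beta` weight are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def kl_diag_gaussian(mu_q, logvar_q, mu_p=None, logvar_p=None):
    """KL( N(mu_q, diag(exp(logvar_q))) || N(mu_p, diag(exp(logvar_p))) ).
    The prior defaults to a standard normal, as is common in variational IB."""
    if mu_p is None:
        mu_p = np.zeros_like(mu_q)
    if logvar_p is None:
        logvar_p = np.zeros_like(logvar_q)
    return 0.5 * np.sum(
        logvar_p - logvar_q
        + (np.exp(logvar_q) + (mu_q - mu_p) ** 2) / np.exp(logvar_p)
        - 1.0
    )

def ib_objective(pred_nll, mu_q, logvar_q, beta=1.0):
    """Information-bottleneck-style loss for a shared latent z:
    predictive sufficiency (negative log-likelihood of the targets)
    plus a beta-weighted compression penalty (KL to the prior)."""
    return pred_nll + beta * kl_diag_gaussian(mu_q, logvar_q)
```

With `beta = 0` the objective degenerates to pure prediction; increasing `beta` forces the shared latent to discard information not needed to predict both streams, which is the compression/sufficiency trade-off the abstract refers to.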
Shiqin Tang
Center for AI and Robotics, Chinese Academy of Sciences
Machine Learning
Shujian Yu
Dept. of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
Dept. of Physics and Technology, UiT The Arctic University of Norway, Tromsø, Norway