Bayesian Joint Additive Factor Models for Multiview Learning

📅 2024-06-02
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Multi-omics and other multiview data are hard to analyze: dependence across views is difficult to model, signal strength varies widely between views, and predictions still need to be interpretable and come with uncertainty quantification. This paper proposes JAFAR, a Bayesian joint additive factor regression model. JAFAR uses a structured additive design that accounts for shared and view-specific latent components, and it ensures identifiability through a novel dependent cumulative shrinkage process (D-CUSP) prior. Inference relies on a partially collapsed Gibbs sampler, and extensions allow flexible feature and outcome distributions. In an application predicting time to labor onset from immunome, metabolome, and proteome data, JAFAR shows performance gains over state-of-the-art competitors. The approach supports feature selection and uncertainty quantification, and an open-source R package implementing JAFAR is publicly available.

📝 Abstract
It is increasingly common in a wide variety of applied settings to collect data of multiple different types on the same set of samples. Our particular focus in this article is on studying relationships between such multiview features and responses. A motivating application arises in the context of precision medicine where multi-omics data are collected to correlate with clinical outcomes. It is of interest to infer dependence within and across views while combining multimodal information to improve the prediction of outcomes. The signal-to-noise ratio can vary substantially across views, motivating more nuanced statistical tools beyond standard late and early fusion. This challenge comes with the need to preserve interpretability, select features, and obtain accurate uncertainty quantification. We propose a joint additive factor regression model (JAFAR) with a structured additive design, accounting for shared and view-specific components. We ensure identifiability via a novel dependent cumulative shrinkage process (D-CUSP) prior. We provide an efficient implementation via a partially collapsed Gibbs sampler and extend our approach to allow flexible feature and outcome distributions. Prediction of time-to-labor onset from immunome, metabolome, and proteome data illustrates performance gains against state-of-the-art competitors. Our open-source software (R package) is available at https://github.com/niccoloanceschi/jafar.
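The additive structure described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the JAFAR model or the API of the `jafar` R package: the function name `simulate_multiview`, its parameters, the Gaussian loadings and noise, and the choice to let the response depend only on the shared factors are all hypothetical choices made for illustration. Each view is generated as shared loadings times shared factors, plus view-specific loadings times view-specific factors, plus noise.

```python
import random


def simulate_multiview(n, view_dims, k_shared, k_specific, noise_sd=0.1, seed=0):
    """Simulate toy multiview data with shared and view-specific latent factors.

    Hypothetical illustration only; not the JAFAR implementation.
    Returns (views, y): views[m] is an n x view_dims[m] list of rows,
    y is a list of n responses.
    """
    rng = random.Random(seed)

    def gauss(sd=1.0):
        return rng.gauss(0.0, sd)

    # One shared loading matrix and one view-specific loading matrix per view.
    shared_load = [[[gauss() for _ in range(k_shared)] for _ in range(p)]
                   for p in view_dims]
    specific_load = [[[gauss() for _ in range(k)] for _ in range(p)]
                     for p, k in zip(view_dims, k_specific)]
    # Assumption of this sketch: the response loads on shared factors only.
    theta = [gauss() for _ in range(k_shared)]

    views = [[] for _ in view_dims]
    y = []
    for _ in range(n):
        eta = [gauss() for _ in range(k_shared)]  # shared factors for this sample
        for m, p in enumerate(view_dims):
            phi = [gauss() for _ in range(k_specific[m])]  # view-specific factors
            row = [
                sum(l * e for l, e in zip(shared_load[m][j], eta))
                + sum(g * f for g, f in zip(specific_load[m][j], phi))
                + gauss(noise_sd)
                for j in range(p)
            ]
            views[m].append(row)
        y.append(sum(t * e for t, e in zip(theta, eta)) + gauss(noise_sd))
    return views, y
```

Calling `simulate_multiview(100, [20, 30, 15], 3, [2, 2, 1])` would produce three views measured on the same 100 samples, with dependence across views induced entirely by the three shared factors. Setting a different `noise_sd` per view would be one way to mimic the varying signal-to-noise ratios the abstract mentions.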
Problem

Research questions and friction points this paper is trying to address.

Multi-view Learning
Data Integration
Predictive Modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian joint additive factor model
multi-omics data
predictive uncertainty quantification
Niccolò Anceschi
Department of Statistical Science, Duke University
Federico Ferrari
Senior Wireless Platform Architect – Expert Wireless Protocols, Sonova AG
Wireless embedded systems
David B. Dunson
Department of Statistical Science, Duke University
Himel Mallick
Division of Biostatistics, Department of Population Health Sciences, Weill Cornell Medicine, Cornell University