🤖 AI Summary
Multi-omics and other multi-view data are hard to analyze because dependencies span views, signal-to-noise ratios differ sharply across views, and interpretability and uncertainty quantification must be preserved. This paper proposes JAFAR, a joint additive factor regression model. JAFAR uses a structured additive design that separates shared from view-specific latent factors, and it ensures identifiability through a novel dependent cumulative shrinkage process (D-CUSP) prior. Inference runs through a partially collapsed Gibbs sampler, and extensions allow flexible feature and outcome distributions. In an application to predicting time-to-labor onset, JAFAR jointly analyzes immunome, metabolome, and proteome data and shows performance gains over state-of-the-art competitors. The model supports interpretable feature selection and uncertainty quantification. An open-source R package implementing JAFAR is publicly available.
📝 Abstract
It is increasingly common in a wide variety of applied settings to collect data of multiple different types on the same set of samples. Our particular focus in this article is on studying relationships between such multiview features and responses. A motivating application arises in the context of precision medicine where multi-omics data are collected to correlate with clinical outcomes. It is of interest to infer dependence within and across views while combining multimodal information to improve the prediction of outcomes. The signal-to-noise ratio can vary substantially across views, motivating more nuanced statistical tools beyond standard late and early fusion. This challenge comes with the need to preserve interpretability, select features, and obtain accurate uncertainty quantification. We propose a joint additive factor regression model (JAFAR) with a structured additive design, accounting for shared and view-specific components. We ensure identifiability via a novel dependent cumulative shrinkage process (D-CUSP) prior. We provide an efficient implementation via a partially collapsed Gibbs sampler and extend our approach to allow flexible feature and outcome distributions. Prediction of time-to-labor onset from immunome, metabolome, and proteome data illustrates performance gains against state-of-the-art competitors. Our open-source software (R package) is available at https://github.com/niccoloanceschi/jafar.
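The structured additive design described above can be illustrated with a small simulation. The following is a minimal sketch, not the paper's exact specification or the R package's API: it assumes Gaussian noise, standard normal factors and loadings, and, purely for illustration, a response that loads only on the shared factors. All names (`simulate_multiview`, `eta`, `phi_m`, `Lambda_m`, `Gamma_m`, `theta`) are hypothetical. Each view is generated as a shared-factor term plus a view-specific term plus noise, with a different noise level per view to mimic varying signal-to-noise ratios.

```python
import numpy as np


def simulate_multiview(n=200, dims=(30, 20, 10), k_shared=3,
                       k_specific=(2, 2, 1), noise_sd=(0.5, 1.0, 2.0), seed=0):
    """Simulate multi-view data from an additive shared + view-specific factor model.

    Illustrative assumptions (not taken from the paper): Gaussian noise,
    standard normal factors and loadings, response driven by shared factors only.
    """
    rng = np.random.default_rng(seed)

    # Shared latent factors: one row per sample, common to all views.
    eta = rng.standard_normal((n, k_shared))

    views = []
    for p_m, k_m, sd_m in zip(dims, k_specific, noise_sd):
        Lambda_m = rng.standard_normal((p_m, k_shared))  # shared-factor loadings
        Gamma_m = rng.standard_normal((p_m, k_m))        # view-specific loadings
        phi_m = rng.standard_normal((n, k_m))            # view-specific factors
        noise = sd_m * rng.standard_normal((n, p_m))     # view-dependent noise level
        views.append(eta @ Lambda_m.T + phi_m @ Gamma_m.T + noise)

    # Response: linear in the shared factors plus Gaussian noise (assumption).
    theta = rng.standard_normal(k_shared)
    y = eta @ theta + 0.5 * rng.standard_normal(n)
    return views, y, eta


views, y, eta = simulate_multiview()
print([x.shape for x in views], y.shape)
```

In this sketch, dependence across views arises only through `eta`, while each `phi_m` captures variation confined to a single view. A shrinkage prior such as D-CUSP would then be placed on the loadings to learn how many shared and view-specific factors are needed; that step is not shown here.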