Covariate-moderated Empirical Bayes Matrix Factorization

📅 2025-05-16
🤖 AI Summary
Existing matrix factorization (MF) methods that use auxiliary information are limited in the types of data they can incorporate (such as images, text, and graphs) and assume specific parametric models. To address this, the paper proposes cEBMF (covariate-moderated empirical Bayes matrix factorization), a modular framework that accepts any side information that can be processed by a probabilistic model or neural network. cEBMF learns data-driven priors rather than requiring a fully prespecified prior, combines variational autoencoding with empirical Bayes inference, and is compatible with both neural networks and classical statistical models. Different prior families can encode different assumptions and constraints on the factors. The authors demonstrate the benefits of cEBMF in simulations and in analyses of spatial transcriptomics and MovieLens data.

📝 Abstract
Matrix factorization is a fundamental method in statistics and machine learning for inferring and summarizing structure in multivariate data. Modern data sets often come with "side information" of various forms (images, text, graphs) that can be leveraged to improve estimation of the underlying structure. However, existing methods that leverage side information are limited in the types of data they can incorporate, and they assume specific parametric models. Here, we introduce a novel method for this problem, covariate-moderated empirical Bayes matrix factorization (cEBMF). cEBMF is a modular framework that accepts any type of side information that is processable by a probabilistic model or neural network. The cEBMF framework can accommodate different assumptions and constraints on the factors through the use of different priors, and it adapts these priors to the data. We demonstrate the benefits of cEBMF in simulations and in analyses of spatial transcriptomics and MovieLens data.
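As a rough sketch of what a factorization with covariate-dependent priors can look like, one possible formulation is written out below. The notation (Z, L, F, x_i, y_j, the prior families, and the parameters theta_k and omega_k) is assumed here for illustration and does not appear in the abstract, so it should be read as one plausible reading rather than the paper's exact model.

```latex
\begin{align*}
Z &= L F^{\top} + E, & E_{ij} &\sim \mathcal{N}(0, \sigma^2), \\
\ell_{ik} &\sim g^{(\ell)}_k(\,\cdot\,; x_i, \theta_k), & g^{(\ell)}_k &\in \mathcal{G}_{\ell}, \\
f_{jk} &\sim g^{(f)}_k(\,\cdot\,; y_j, \omega_k), & g^{(f)}_k &\in \mathcal{G}_{f}.
\end{align*}
```

Here Z is the n-by-p data matrix, L (n-by-K) and F (p-by-K) are the low-rank factors, x_i and y_j are the side information for row i and column j, and the prior families encode the chosen assumptions or constraints on the factors (for example sparsity or nonnegativity). Under this reading, "adapting the priors to the data" corresponds to estimating theta_k and omega_k, which could be the weights of a neural network, from the data instead of fixing them in advance.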
Problem

Research questions and friction points this paper is trying to address.

Incorporates diverse side information for matrix factorization
Adapts priors to data without parametric assumptions
Enhances structure estimation in multivariate data analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modular framework for side information integration
Adapts priors to data using empirical Bayes
Accommodates various constraints through flexible priors
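The idea of adapting a prior to data with the help of a covariate can be illustrated on a toy problem. The sketch below is not the cEBMF algorithm: it assumes a one-dimensional normal-means setting with a binary covariate, a normal prior whose log-variance is linear in that covariate, and a coarse grid search in place of a real optimizer. All function names are hypothetical.

```python
import math
import random


def neg_log_marginal(a, b, z, x, s2):
    """Negative log marginal likelihood when z_i ~ N(0, s2 + exp(a + b * x_i))."""
    total = 0.0
    for zi, xi in zip(z, x):
        v = s2 + math.exp(a + b * xi)
        total += 0.5 * (math.log(2.0 * math.pi * v) + zi * zi / v)
    return total


def fit_prior(z, x, s2, grid):
    """Empirical Bayes step: choose (a, b) on a grid to maximize the marginal likelihood."""
    candidates = ((a, b) for a in grid for b in grid)
    return min(candidates, key=lambda ab: neg_log_marginal(ab[0], ab[1], z, x, s2))


def posterior_means(z, x, s2, a, b):
    """Posterior mean of each theta_i under the fitted prior N(0, exp(a + b * x_i))."""
    means = []
    for zi, xi in zip(z, x):
        prior_var = math.exp(a + b * xi)
        means.append(zi * prior_var / (prior_var + s2))
    return means


def simulate(n, s2, seed=0):
    """Toy data: the covariate x_i decides whether theta_i is near zero or large."""
    rng = random.Random(seed)
    x = [rng.choice([0.0, 1.0]) for _ in range(n)]
    theta = [rng.gauss(0.0, 2.0 if xi == 1.0 else 0.1) for xi in x]
    z = [t + rng.gauss(0.0, math.sqrt(s2)) for t in theta]
    return x, theta, z


def mse(estimate, truth):
    """Mean squared error between two equal-length lists."""
    return sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth)


noise_var = 1.0
x, theta, z = simulate(n=400, s2=noise_var)
grid = [g / 2.0 for g in range(-12, 13)]  # candidate values from -6 to 6 in steps of 0.5
a_hat, b_hat = fit_prior(z, x, noise_var, grid)
shrunk = posterior_means(z, x, noise_var, a_hat, b_hat)

print("fitted (a, b):", a_hat, b_hat)
print("MSE of raw observations:", mse(z, theta))
print("MSE after covariate-moderated shrinkage:", mse(shrunk, theta))
```

Because the prior variance is learned separately for each covariate value, observations with x = 0 are shrunk strongly toward zero while those with x = 1 are left largely intact. A single prior shared by all observations could not make that distinction.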
William R. P. Denault
Departments of Statistics and Human Genetics, University of Chicago
Karl Tayeb
University of Chicago
Peter Carbonetto
University of Chicago
Quantitative genetics
Jason Willwerscheid
Mathematics and Computer Science, Providence College
Matthew Stephens
University of Chicago
Statistics, Genetics