Bayesian Variable Selection in Multivariate Regression Under Collinearity in the Design Matrix

📅 2025-07-23

🤖 AI Summary

This paper addresses the failure of variable selection in Bayesian multivariate linear regression when the design matrix is strongly collinear and the data carry little information (weak signals, small sample sizes, high correlation among predictors and responses). It demonstrates that jointly estimating the regression coefficients and the off-diagonal elements of the error covariance matrix can worsen both estimation and prediction in these settings. To mitigate this, the paper proposes a two-step procedure: first, estimate the mean structure (i.e., the regression coefficients) under a diagonal error covariance assumption; second, estimate the error covariance matrix separately. Simulation studies and an analysis of NIR spectroscopy data illustrate that the two-step procedure can provide more accurate estimates in low-information regimes. The key contribution is showing that modeling the full error covariance is not always beneficial, and that routinely running an additional analysis with a diagonal covariance matrix, even when the errors are expected to be correlated, is a practical safeguard.
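
For orientation, the model being discussed can be written out as below. The notation (Y, X, B, E, Σ, and the dimensions n, p, q) is an assumption made here for illustration and is not taken from the paper itself.

```latex
% Multivariate linear regression with n observations, p predictors, q responses
Y = X B + E, \qquad
Y \in \mathbb{R}^{n \times q},\;
X \in \mathbb{R}^{n \times p},\;
B \in \mathbb{R}^{p \times q},
\qquad
e_i \overset{\text{iid}}{\sim} N_q(0, \Sigma), \quad i = 1, \dots, n,

% where e_i^\top is the i-th row of E.
% If \Sigma = \mathrm{diag}(\sigma_1^2, \dots, \sigma_q^2), the likelihood
% factorizes into q separate univariate regressions, one per response column:
p(Y \mid X, B, \Sigma) = \prod_{k=1}^{q} N_n\!\left(y_k \mid X \beta_k,\; \sigma_k^2 I_n\right),
```

Here y_k and β_k denote the k-th columns of Y and B. A non-diagonal Σ couples the q regressions, which is the case the summary describes as difficult in low-information settings.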

📝 Abstract

We consider the problem of variable selection in Bayesian multivariate linear regression models, involving multiple response and predictor variables, under multivariate normal errors. In the absence of a known covariance structure, specifying a model with a non-diagonal covariance matrix is appealing. Modeling dependency in the random errors through a non-diagonal covariance matrix is generally expected to lead to improved estimation of the regression coefficients. In this article, we highlight an interesting exception: modeling the dependency in errors can significantly worsen both estimation and prediction. We demonstrate that Bayesian multi-outcome regression models using several popular variable selection priors can suffer from poor estimation properties in low-information settings--such as scenarios with weak signals, high correlation among predictors and responses, and small sample sizes. In such cases, the simultaneous estimation of all unknown parameters in the model becomes difficult when using a non-diagonal covariance matrix. Through simulation studies and a dataset with measurements from NIR spectroscopy, we illustrate that a two-step procedure--estimating the mean and the covariance matrix separately--can provide more accurate estimates in such cases. Thus, a potential solution to avoid the problem altogether is to routinely perform an additional analysis with a diagonal covariance matrix, even if the errors are expected to be correlated.
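
The two-step idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's method: a simple Gaussian (ridge-type) prior stands in for the variable selection priors the paper studies, and the function names `two_step_fit` and `simulate`, the prior scale `tau2`, and all simulation settings are hypothetical choices made here.

```python
import numpy as np


def two_step_fit(X, Y, tau2=1.0):
    """Two-step sketch: mean first (diagonal errors), covariance second.

    Assumption: a Gaussian prior beta_k ~ N(0, tau2 * sigma_k^2 * I) replaces
    the variable selection priors; it is used only to keep the example short.
    """
    p = X.shape[1]
    # Step 1: with a diagonal error covariance the responses decouple, so each
    # column of B has posterior mean (X'X + I/tau2)^{-1} X'y_k.
    A = X.T @ X + np.eye(p) / tau2
    B_hat = np.linalg.solve(A, X.T @ Y)  # solves all q columns at once
    # Step 2: estimate the error covariance separately, from the residuals.
    R = Y - X @ B_hat
    Sigma_hat = R.T @ R / X.shape[0]
    return B_hat, Sigma_hat


def simulate(n=30, p=10, q=3, rho_x=0.9, rho_e=0.8, seed=0):
    """Small, collinear, weak-signal data set (all settings are illustrative)."""
    rng = np.random.default_rng(seed)
    # Equicorrelated predictors give a strongly collinear design matrix.
    cov_x = rho_x * np.ones((p, p)) + (1 - rho_x) * np.eye(p)
    X = rng.multivariate_normal(np.zeros(p), cov_x, size=n)
    # Sparse coefficients: only the first two predictors matter.
    B = np.zeros((p, q))
    B[:2, :] = 0.5
    # Correlated errors across the q responses.
    cov_e = rho_e * np.ones((q, q)) + (1 - rho_e) * np.eye(q)
    E = rng.multivariate_normal(np.zeros(q), cov_e, size=n)
    return X, X @ B + E, B


X, Y, B_true = simulate()
B_hat, Sigma_hat = two_step_fit(X, Y)
print("coefficient estimate shape:", B_hat.shape)
print("estimated error covariance:\n", np.round(Sigma_hat, 2))
```

Step 1 never touches the off-diagonal elements of the error covariance, so the correlation among responses is recovered only afterwards, from the residuals. A joint model would instead estimate the coefficients and the full covariance matrix simultaneously.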

Problem

Research questions and friction points this paper is trying to address.

Bayesian variable selection in multivariate regression with collinearity

Poor estimation in low-information settings with non-diagonal covariance

Whether a two-step procedure that separates mean and covariance estimation improves accuracy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian variable selection in multivariate regression

Two-step mean and covariance estimation

Additional analysis with a diagonal covariance matrix, even when errors are expected to be correlated
