🤖 AI Summary
In high-dimensional linear regression with multi-source transfer learning, existing methods risk negative transfer and must trade off information sharing against regularization. To address this, the paper proposes BLAST (Bayesian Linear regression with Adaptive Shrinkage for Transfer), a Bayesian multi-source transfer learning framework. Its core ideas are: (i) global-local shrinkage priors combined with Bayesian source selection, which extracts the most useful sources and discounts biasing information that could cause negative transfer; and (ii) prediction and inference via Bayesian model averaging, which accounts jointly for source selection and sparse regression. The model structure admits an efficient Gibbs sampler that gives full posterior inference for the target regression coefficients. Extensive simulation studies and a case study on estimating tumor mutational burden from gene expression in The Cancer Genome Atlas (TCGA) indicate that BLAST yields more accurate posterior inference for the target than regularization based on target data alone, with competitive predictive performance and better uncertainty quantification than current state-of-the-art transfer learning methods.
📝 Abstract
We introduce BLAST, Bayesian Linear regression with Adaptive Shrinkage for Transfer, a Bayesian multi-source transfer learning framework for high-dimensional linear regression. The proposed analytical framework leverages global-local shrinkage priors together with Bayesian source selection to balance information sharing and regularization. We show how Bayesian source selection allows for the extraction of the most useful data sources, while discounting biasing information that may lead to negative transfer. In this framework, both source selection and sparse regression are jointly accounted for in prediction and inference via Bayesian model averaging. The structure of our model admits efficient posterior simulation via a Gibbs sampling algorithm allowing full posterior inference for the target regression coefficients, making BLAST both computationally practical and inferentially straightforward. Our method achieves more accurate posterior inference for the target than regularization approaches based on target data alone, while offering competitive predictive performance and superior uncertainty quantification compared to current state-of-the-art transfer learning methods. We validate its effectiveness through extensive simulation studies and illustrate its analytical properties when applied to a case study on the estimation of tumor mutational burden from gene expression, using data from The Cancer Genome Atlas (TCGA).
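To make the idea of source selection combined with Bayesian model averaging concrete, the following is a minimal toy sketch and not the BLAST algorithm. It assumes a conjugate Gaussian (ridge-type) prior in place of global-local shrinkage, a known noise variance, and exhaustive enumeration of source subsets in place of Gibbs sampling. All function names, the prior variance `tau2`, and the simulated data are hypothetical choices made for illustration. Each inclusion pattern is scored by how well a prior informed by the selected sources predicts the target responses, and the target posterior means are then averaged with those weights.

```python
import itertools
import numpy as np

def gaussian_update(precision, shift, X, y, sigma2):
    """Conjugate update of a Gaussian in natural form (precision, precision @ mean)."""
    return precision + X.T @ X / sigma2, shift + X.T @ y / sigma2

def log_evidence(precision, shift, X, y, sigma2):
    """log N(y | X m, sigma2 I + X P^{-1} X^T) under a N(m, P^{-1}) prior on coefficients."""
    mean = np.linalg.solve(precision, shift)
    cov = sigma2 * np.eye(len(y)) + X @ np.linalg.solve(precision, X.T)
    resid = y - X @ mean
    _, logdet = np.linalg.slogdet(cov)
    quad = resid @ np.linalg.solve(cov, resid)
    return -0.5 * (len(y) * np.log(2 * np.pi) + logdet + quad)

def bma_over_sources(X_t, y_t, sources, sigma2=1.0, tau2=4.0):
    """Average target posterior means over all source inclusion patterns (toy assumption)."""
    p = X_t.shape[1]
    log_w, means = {}, {}
    for gamma in itertools.product([0, 1], repeat=len(sources)):
        # Start from the N(0, tau2 I) prior and absorb the selected sources.
        P, b = np.eye(p) / tau2, np.zeros(p)
        for include, (X_k, y_k) in zip(gamma, sources):
            if include:
                P, b = gaussian_update(P, b, X_k, y_k, sigma2)
        # Uniform prior over patterns, so the weight is the target evidence.
        log_w[gamma] = log_evidence(P, b, X_t, y_t, sigma2)
        P_post, b_post = gaussian_update(P, b, X_t, y_t, sigma2)
        means[gamma] = np.linalg.solve(P_post, b_post)
    top = max(log_w.values())  # subtract the max for numerical stability
    w = {g: np.exp(v - top) for g, v in log_w.items()}
    total = sum(w.values())
    w = {g: v / total for g, v in w.items()}
    beta = sum(w[g] * means[g] for g in w)
    return w, beta

# Simulated data (hypothetical): source 0 shares the target coefficients,
# source 1 is shifted and would cause negative transfer if pooled naively.
rng = np.random.default_rng(0)
beta_true = np.array([1.0, -0.5, 0.0, 0.0, 2.0])

def simulate(n, beta):
    X = rng.normal(size=(n, beta.size))
    return X, X @ beta + rng.normal(size=n)

X_t, y_t = simulate(30, beta_true)
sources = [simulate(200, beta_true), simulate(200, beta_true + 3.0)]
weights, beta_bma = bma_over_sources(X_t, y_t, sources)
print(weights)   # posterior weight of each inclusion pattern
print(beta_bma)  # model-averaged estimate of the target coefficients
```

Enumerating all patterns costs 2^K evidence evaluations for K sources, so this is only practical for a handful of sources. The sampling-based approach described in the abstract avoids that enumeration.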