Annotation-Informed Block-Sparse Bayesian Modeling for cis-Expression Prediction

📅 2026-05-29

🤖 AI Summary
This study addresses the limited modeling of local regulatory architecture in existing genotype-based cis-expression prediction methods. The authors propose the block-sparse Bayesian sparse linear mixed model (bsBSLMM), which combines sparsity at the level of linkage disequilibrium (LD) blocks with transcription start site (TSS) annotation priors in a single Bayesian framework. Two components drive the model: an LD-block spike-and-slab prior and a TSS-informed SNP inclusion prior. Evaluated on GEUVADIS and on an independent osteoporosis cohort (the Louisiana Osteoporosis Study), bsBSLMM retains more predictable genes than competing methods, improves held-out prediction accuracy over BSLMM for most shared genes, selects variants more strongly enriched in regulatory regions, and identifies additional disease-associated genes and pathways in transcriptome-wide association studies (TWAS).
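The two priors named above can be illustrated with a small generative sketch. Nothing below comes from the paper's implementation: the exponential decay in `tss_inclusion_prob`, the parameter names (`base`, `scale`, `block_prob`, `slab_sd`), and their default values are all assumptions chosen for illustration, and the polygenic small-effect component of BSLMM is left out. The sketch only shows how a block-level spike-and-slab indicator can gate SNP-level inclusion whose probability depends on distance to the TSS.

```python
import numpy as np


def tss_inclusion_prob(dist_to_tss, base=0.05, scale=50_000.0):
    """Hypothetical TSS-informed prior: inclusion probability decays
    exponentially with absolute distance (in bp) from the TSS."""
    return base * np.exp(-np.abs(dist_to_tss) / scale)


def sample_block_sparse_effects(block_ids, dist_to_tss, block_prob=0.2,
                                slab_sd=0.1, rng=None):
    """Draw SNP effects from an illustrative two-level spike-and-slab prior.

    block_ids   : LD-block label for each SNP
    dist_to_tss : signed distance of each SNP to the gene's TSS (bp)
    Returns (beta, block_on), with one entry of block_on per unique block
    in sorted label order.
    """
    rng = np.random.default_rng() if rng is None else rng
    block_ids = np.asarray(block_ids)
    dist_to_tss = np.asarray(dist_to_tss, dtype=float)

    # Map every SNP to the index of its LD block.
    blocks, inverse = np.unique(block_ids, return_inverse=True)

    # Block level: one Bernoulli draw decides whether a whole block is active.
    block_on = rng.random(blocks.size) < block_prob

    # SNP level: inclusion probability informed by distance to the TSS.
    snp_on = rng.random(block_ids.size) < tss_inclusion_prob(dist_to_tss)

    # A SNP has a nonzero effect only if its block and the SNP are both on.
    active = block_on[inverse] & snp_on
    slab = rng.normal(0.0, slab_sd, block_ids.size)
    beta = np.where(active, slab, 0.0)
    return beta, block_on
```

Under this construction every SNP in an inactive block has an effect of exactly zero, which is the block-sparse behavior the summary describes; SNPs close to the TSS are simply more likely to be switched on inside active blocks.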
πŸ“ Abstract
Genotype-based cis-expression prediction depends on accurately modeling local regulatory architecture. We present the block-sparse Bayesian sparse linear mixed model (bsBSLMM), an extension of the Bayesian sparse linear mixed model (BSLMM) that incorporates linkage disequilibrium (LD)-block spike-and-slab sparsity and a transcription start site (TSS)-informed SNP inclusion prior. Across 23,098 genes from GEUVADIS European-ancestry lymphoblastoid cell lines, bsBSLMM retained more predictable genes than BSLMM, LASSO, BLUP, TIGAR elastic net, and TIGAR Dirichlet-process regression under matched evaluation criteria. Compared with BSLMM, bsBSLMM improved held-out prediction performance for most shared genes, with gains driven primarily by LD-block sparsity and further enhanced by the TSS-informed prior. Variants selected by bsBSLMM showed stronger enrichment in GM12878 DNase and H3K27ac regulatory regions than variants selected by BSLMM. In transcriptome-wide association study (TWAS) analysis, bsBSLMM recovered established inflammatory bowel disease signals, including IL23R, and identified additional genome-wide significant genes not detected by BSLMM. Independent validation in the Louisiana Osteoporosis Study reproduced the increased prediction yield across ancestries and recovered biologically relevant bone mineral density pathways in downstream TWAS and gene set enrichment analyses. These results demonstrate that incorporating LD-block structure and biologically informed SNP priors improves cis-expression prediction and enhances downstream TWAS discovery.
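The comparison logic in the abstract (count predictable genes per method, then compare held-out performance on the genes both methods retain) can be sketched as follows. This is an assumption-laden illustration: the abstract does not state the metric or the cutoff, so squared Pearson correlation and the `threshold=0.01` default are placeholders, and the function names are hypothetical.

```python
import numpy as np


def heldout_r2(observed, predicted):
    """Squared Pearson correlation between observed and predicted
    expression on held-out samples (one common metric, assumed here)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # A constant vector has no defined correlation; treat it as no signal.
    if observed.std() == 0 or predicted.std() == 0:
        return 0.0
    r = np.corrcoef(observed, predicted)[0, 1]
    return float(r ** 2)


def compare_models(r2_a, r2_b, threshold=0.01):
    """Compare two models given dicts mapping gene -> held-out R^2.

    The threshold defining a "predictable" gene is a placeholder,
    not the criterion used in the paper.
    """
    keep_a = {g for g, v in r2_a.items() if v > threshold}
    keep_b = {g for g, v in r2_b.items() if v > threshold}
    shared = keep_a & keep_b
    # Count shared genes where model A beats model B.
    improved = sum(r2_a[g] > r2_b[g] for g in shared)
    return {
        "n_a": len(keep_a),
        "n_b": len(keep_b),
        "n_shared": len(shared),
        "frac_improved": improved / len(shared) if shared else float("nan"),
    }
```

Restricting the accuracy comparison to shared genes keeps the two questions separate: how many genes each model can predict at all, and which model predicts better where both succeed.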
Problem

Research questions and friction points this paper is trying to address.

cis-expression prediction
linkage disequilibrium
transcription start site
TWAS
regulatory architecture
Innovation

Methods, ideas, or system contributions that make the work stand out.

block-sparse Bayesian modeling
LD-block sparsity
TSS-informed prior
cis-expression prediction
transcriptome-wide association study
Authors

Lei Huang
School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, USA

Hui Shen
Tulane Center for Biomedical Informatics and Genomics, Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA, USA

Kuan-Jui Su
Tulane University, Division of Biomedical Informatics and Genomics
Bioinformatics, machine learning, network analysis, system biology

Chuan Qiu
School of Medicine, Tulane University, USA
Biostatistics & Bioinformatics

Martha Isabel Gonzalez-Ramirez
Tulane Center for Biomedical Informatics and Genomics, Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA, USA

Anqi Liu
Tulane University
Human Genetics, Computational Biology, Bioinformatics, Deep Learning

Zhe Luo
Tulane Center for Biomedical Informatics and Genomics, Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA, USA

Yun Gong
Tulane Center for Biomedical Informatics and Genomics, Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA, USA

Yipu Zhang
Tulane University
Computational Neuroscience, Bioinformatics, Brain Connectomics, Multi-modal data fusion

Dawei Li
Texas Tech University Health Sciences Center, School of Medicine, Texas Tech University

Chaoyang Zhang
School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, USA

Hong-Wen Deng
Tulane Center for Biomedical Informatics and Genomics, Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA, USA