DeepVRegulome: DNABERT-based deep-learning framework for predicting the functional impact of short genomic variants on the human regulome

📅 2025-11-12
🤖 AI Summary
Functional annotation and clinical interpretation of noncoding short variants remain key bottlenecks in genomic medicine, because existing predictors are limited in both accuracy and interpretability. To address this, we propose DeepVRegulome, a multi-model DNABERT ensemble framework: 700 DNABERT models are fine-tuned on large-scale ENCODE regulatory data and combined with variant effect scoring, motif perturbation analysis, attention visualization, and survival association testing, enabling interpretable prioritization of splicing-regulatory and transcription factor binding site (TFBS) mutations. Applied to TCGA glioblastoma whole-genome sequencing data, the method identified 572 splice-disrupting variants and 9,837 TFBS-altering variants occurring in more than 10% of samples; 1,352 mutations and 563 disrupted regulatory regions were associated with patient survival, enabling prognostic stratification from noncoding mutation features alone. The approach is intended to improve both the accuracy and the clinical translatability of noncoding variant interpretation.

📝 Abstract
Whole-genome sequencing (WGS) has revealed numerous non-coding short variants whose functional impacts remain poorly understood. Despite recent advances in deep-learning genomic approaches, accurately predicting and prioritizing clinically relevant mutations in gene regulatory regions remains a major challenge. Here we introduce DeepVRegulome, a deep-learning method for prediction and interpretation of functionally disruptive variants in the human regulome, which combines 700 fine-tuned DNABERT models, trained on vast amounts of ENCODE gene regulatory regions, with variant scoring, motif analysis, attention-based visualization, and survival analysis. We showcase its application on the TCGA glioblastoma WGS dataset in prioritizing survival-associated mutations and regulatory regions. The analysis identified 572 splice-disrupting and 9,837 transcription factor binding site-altering mutations occurring in more than 10% of glioblastoma samples. Survival analysis linked 1,352 mutations and 563 disrupted regulatory regions to patient outcomes, enabling stratification via non-coding mutation signatures. All the code, fine-tuned models, and an interactive data portal are publicly available.
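To make the variant-scoring step in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It tokenizes a reference and a mutated sequence into overlapping k-mers (the input representation used by the original DNABERT) and compares a classifier's predicted probability for the two. The helper names, the `toy_predict` stand-in for a fine-tuned model, the restriction to single-nucleotide substitutions, and the log-odds-difference score are all assumptions made for illustration; the abstract does not state the scoring formula DeepVRegulome uses.

```python
import math
from typing import Callable, List


def kmer_tokens(seq: str, k: int = 6) -> List[str]:
    """Split a DNA sequence into overlapping k-mers with stride 1."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def apply_snv(seq: str, pos: int, alt: str) -> str:
    """Return seq with the base at 0-based position pos replaced by alt."""
    if not 0 <= pos < len(seq):
        raise IndexError("variant position lies outside the sequence")
    return seq[:pos] + alt + seq[pos + 1:]


def logit(p: float, eps: float = 1e-6) -> float:
    """Log-odds of p, clamped away from 0 and 1 to stay finite."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def variant_score(
    ref_seq: str,
    pos: int,
    alt: str,
    predict_prob: Callable[[List[str]], float],
    k: int = 6,
) -> float:
    """Hypothetical score: log-odds change from reference to mutated sequence.

    Negative values mean the model is less confident that the mutated
    sequence is a functional regulatory element.
    """
    alt_seq = apply_snv(ref_seq, pos, alt)
    p_ref = predict_prob(kmer_tokens(ref_seq, k))
    p_alt = predict_prob(kmer_tokens(alt_seq, k))
    return logit(p_alt) - logit(p_ref)


def toy_predict(tokens: List[str]) -> float:
    """Stand-in for a fine-tuned classifier (assumption, not a real model):
    the fraction of k-mers that contain the dinucleotide GT."""
    if not tokens:
        return 0.0
    return sum("GT" in t for t in tokens) / len(tokens)


if __name__ == "__main__":
    # Replacing the G of the GT dinucleotide lowers the toy model's output,
    # so the score is negative.
    print(variant_score("AAGTAA", pos=2, alt="A", predict_prob=toy_predict, k=3))
```

In a real pipeline, `predict_prob` would wrap one of the fine-tuned models, and the same comparison would be repeated per regulatory-region model.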
Problem

Research questions and friction points this paper is trying to address.

Predicting functional impacts of non-coding genomic variants
Prioritizing clinically relevant mutations in regulatory regions
Identifying survival-associated mutations in glioblastoma datasets
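The recurrence criterion mentioned in the abstract (mutations occurring in more than 10% of samples) can be sketched as a simple filter. This is an illustrative sketch only: the function name, the `(sample_id, variant_id)` input layout, and the variant identifiers are assumptions, not the paper's data format.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def recurrent_variants(
    calls: Iterable[Tuple[str, str]],
    n_samples: int,
    min_fraction: float = 0.10,
) -> Dict[str, float]:
    """Return variants carried by strictly more than min_fraction of samples.

    calls is an iterable of (sample_id, variant_id) pairs; a variant reported
    more than once for the same sample is counted once.
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    carriers: Dict[str, Set[str]] = defaultdict(set)
    for sample_id, variant_id in calls:
        carriers[variant_id].add(sample_id)
    fractions = {v: len(s) / n_samples for v, s in carriers.items()}
    return {v: f for v, f in fractions.items() if f > min_fraction}


if __name__ == "__main__":
    # Hypothetical calls in a 10-sample cohort.
    calls = [("s1", "var_A"), ("s2", "var_A"), ("s1", "var_A"), ("s3", "var_B")]
    print(recurrent_variants(calls, n_samples=10))
```

Variants that pass such a filter would then be the candidates for downstream survival association testing.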
Innovation

Methods, ideas, or system contributions that make the work stand out.

DNABERT fine-tuned models predict variant impact
Combines motif analysis with attention visualization
Prioritizes survival-associated non-coding mutations
Pratik Dutta
Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY
Matthew Obusan
Renaissance School of Medicine, Stony Brook University, Stony Brook, NY
Rekha Sathian
Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY
Max Chao
Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY
Pallavi Surana
Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY
Nimisha Papineni
Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY
Yanrong Ji
Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine
Zhihan Zhou
Department of Computer Science, Northwestern University, Evanston, IL, USA
Han Liu
Department of Computer Science, Northwestern University, Evanston, IL, USA
Alisa Yurovsky
Stony Brook University
Bioinformatics
Ramana V Davuluri
Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY