TFBS-Finder: Deep Learning-based Model with DNABERT and Convolutional Networks to Predict Transcription Factor Binding Sites

2025-02-03
AI Summary
This study addresses the limited accuracy of transcription factor binding site (TFBS) prediction by proposing an end-to-end deep learning model that combines DNABERT sequence embeddings with a CNN module, a Modified Convolutional Block Attention Module (MCBAM), a Multi-Scale Convolutions with Attention (MSCA) module and an output module. DNABERT captures long-range dependencies in the DNA sequence, while the convolutional and attention modules extract higher-order local features. Trained and tested on 165 ENCODE ChIP-seq datasets, the model outperforms the existing methods it is compared against, and cross-cell-line validation is used to assess how well it generalizes. Ablation studies evaluate the contribution of each component. The code and datasets are publicly released. The main contribution is the combination of pre-trained DNABERT embeddings with convolutional and attention modules in a single TFBS predictor.
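The idea of extracting local features at several scales and then weighting sequence positions with attention can be illustrated with a minimal, dependency-free Python sketch. This is not the authors' MSCA module: the kernel widths (3, 5, 7), the single output channel per scale, the ReLU, and the softmax-over-positions attention pooling are all hypothetical choices made for illustration, and the random input merely stands in for per-token DNABERT embeddings.

```python
import math
import random

# Hypothetical sketch of multi-scale 1-D convolution with attention pooling.
# NOT the TFBS-Finder MSCA module; sizes and the pooling scheme are assumptions.


def conv1d_same(seq, kernel):
    """One-output-channel 1-D convolution with zero 'same' padding.

    seq:    list of L feature vectors, each a list of D floats
            (stand-ins for per-token DNABERT embeddings).
    kernel: list of K weight vectors (K odd), each a list of D floats.
    Returns a list of L floats, one activation per sequence position.
    """
    k, pad, dim = len(kernel), len(kernel) // 2, len(seq[0])
    out = []
    for i in range(len(seq)):
        acc = 0.0
        for j in range(k):
            pos = i + j - pad
            if 0 <= pos < len(seq):  # positions outside the sequence count as zeros
                acc += sum(kernel[j][d] * seq[pos][d] for d in range(dim))
        out.append(acc)
    return out


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def multi_scale_attention(seq, kernels):
    """Run one convolution per scale, then attention-pool each branch.

    Each branch is passed through ReLU, scored with a softmax over positions,
    and reduced to one weighted-average value, giving one feature per scale.
    """
    features = []
    for kernel in kernels:
        branch = [max(0.0, v) for v in conv1d_same(seq, kernel)]  # ReLU
        weights = softmax(branch)  # attention weights over positions
        features.append(sum(w * v for w, v in zip(weights, branch)))
    return features


# Usage: a random 20-position, 8-dimensional "embedding" and kernels of width 3, 5, 7.
random.seed(0)
L, D = 20, 8
seq = [[random.gauss(0.0, 1.0) for _ in range(D)] for _ in range(L)]
kernels = [[[random.gauss(0.0, 0.1) for _ in range(D)] for _ in range(k)]
           for k in (3, 5, 7)]
features = multi_scale_attention(seq, kernels)
print(len(features))  # one pooled feature per kernel width: 3
```

In a trained model each scale would have many output channels and learned weights; the sketch only shows that wider kernels cover longer stretches of sequence, while the attention weights emphasize the most strongly activated positions.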

๐Ÿ“ Abstract
Transcription factors are proteins that regulate gene expression by binding to specific genomic regions known as Transcription Factor Binding Sites (TFBSs), typically located in the promoter regions of those genes. Accurate prediction of these binding sites is essential for understanding the complex gene regulatory networks underlying various cellular functions. Many deep learning models have been developed for this task, but there is still scope for improvement. In this work, we have developed TFBS-Finder, a deep learning model that uses pre-trained DNABERT, a Convolutional Neural Network (CNN) module, a Modified Convolutional Block Attention Module (MCBAM), a Multi-Scale Convolutions with Attention (MSCA) module and an output module. The pre-trained DNABERT is used for sequence embedding, thereby capturing long-term dependencies in the DNA sequences, while the CNN, MCBAM and MSCA modules extract higher-order local features. TFBS-Finder is trained and tested on 165 ENCODE ChIP-seq datasets. We have also performed ablation studies, cross-cell line validations and comparisons with other models. The experimental results show that the proposed method outperforms existing methodologies in predicting TFBSs. The code and the relevant datasets are publicly available at https://github.com/NimishaGhosh/TFBS-Finder/.
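DNABERT does not read a sequence one nucleotide at a time; it represents it as overlapping k-mers (stride 1), and it was released in variants for k = 3, 4, 5 and 6. The sketch below shows that preprocessing step in plain Python. The function names `seq_to_kmers` and `kmer_sentence` and the default k = 6 are illustrative assumptions, not taken from the TFBS-Finder code; the space-separated output is, as far as we understand, the text form DNABERT's own preprocessing works with, but check the repository for the k and sequence length actually used.

```python
from typing import List


def seq_to_kmers(seq: str, k: int = 6) -> List[str]:
    """Split a DNA sequence into overlapping k-mers with stride 1.

    A sequence of length n yields n - k + 1 k-mers (none if n < k).
    The default k = 6 is an illustrative assumption.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_sentence(seq: str, k: int = 6) -> str:
    """Join the k-mers with spaces, a text form suited to a k-mer tokenizer."""
    return " ".join(seq_to_kmers(seq, k))


# Usage: a 7-bp toy sequence split into 3-mers.
print(kmer_sentence("acgtacg", k=3))  # ACG CGT GTA TAC ACG
```

Because consecutive k-mers overlap by k − 1 bases, a sequence of n bases produces n − k + 1 tokens rather than n / k; for example, a 101-bp input yields 96 6-mers.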
Problem

Research questions and friction points this paper is trying to address.

Transcription Factor Binding Sites
Gene Recognition
Cell Function Understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

TFBS-Finder
DNABERT
Convolutional Neural Network (CNN)
Nimisha Ghosh
Shiv Nadar University, Chennai
Deep Learning, Machine Learning, Computational Biology, Wireless Sensor Network
Pratik Dutta
Department of Computer Science and Engineering, Siksha ‘O’ Anusandhan (Deemed to be University), Bhubaneswar, Odisha, India
Daniele Santoni
IASI - CNR
Computational Biology, Bioinformatics, Immunoinformatics