ResCap-DBP: A Lightweight Residual-Capsule Network for Accurate DNA-Binding Protein Prediction Using Global ProteinBERT Embeddings

📅 2025-07-27
🤖 AI Summary
To address the time-consuming and costly experimental identification of DNA-binding proteins (DBPs), this paper proposes ResCap-DBP, a lightweight residual capsule network that predicts DBPs directly from raw protein sequences. The method combines residual learning, dilated convolutions, and a one-dimensional capsule network: the residual blocks mitigate vanishing gradients, while the capsule layers use dynamic routing to capture hierarchical and spatial relationships among the learned sequence features. Global embeddings from ProteinBERT are incorporated to improve generalization. On the benchmark datasets PDB14189 and PDB1075, ResCap-DBP achieves AUC scores of 98.0% and 89.5%, respectively; on the independent test sets PDB2272 and PDB186, it attains AUCs of 83.2% and 83.3%. The model consistently outperforms state-of-the-art methods while keeping sensitivity and specificity well balanced. This work provides an efficient, robust deep learning solution for accurate DBP prediction under both small- and large-sample scenarios.
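
The dynamic routing mentioned in the summary can be illustrated with a minimal pure-Python sketch. It follows the generic routing-by-agreement procedure for capsule networks and is not taken from the paper: the function names (`squash`, `dynamic_routing`), the three routing iterations, and the toy prediction vectors are all assumptions made for illustration.

```python
import math
from typing import List

Vector = List[float]


def squash(s: Vector) -> Vector:
    """Shrink s so its length lies in [0, 1) while keeping its direction."""
    sq = sum(v * v for v in s)
    if sq == 0.0:
        return [0.0 for _ in s]
    scale = sq / (1.0 + sq) / math.sqrt(sq)
    return [scale * v for v in s]


def softmax(z: Vector) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def dynamic_routing(u_hat: List[List[Vector]], iterations: int = 3) -> List[Vector]:
    """Route predictions u_hat[i][j] (input capsule i -> output capsule j)."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    v: List[Vector] = []
    for _ in range(iterations):
        # Coupling coefficients: each input capsule distributes itself over outputs.
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_out):
            s = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
            v.append(squash(s))
        # Agreement update: raise the logit when a prediction aligns with the output.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v


# Toy example (hypothetical values): 2 input capsules, 2 output capsules, 2-D vectors.
toy_predictions = [
    [[1.0, 0.0], [0.1, 0.1]],
    [[0.9, 0.1], [-0.1, 0.2]],
]
outputs = dynamic_routing(toy_predictions)
lengths = [math.sqrt(sum(x * x for x in vec)) for vec in outputs]
```

The length of each output vector stays below 1, so it can be read as the confidence that the corresponding class (for example, DBP versus non-DBP) is present.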

📝 Abstract
DNA-binding proteins (DBPs) are integral to gene regulation and cellular processes, making their accurate identification essential for understanding biological functions and disease mechanisms. Experimental methods for DBP identification are time-consuming and costly, driving the need for efficient computational prediction techniques. In this study, we propose a novel deep learning framework, ResCap-DBP, that combines a residual learning-based encoder with a one-dimensional Capsule Network (1D-CapsNet) to predict DBPs directly from raw protein sequences. Our architecture incorporates dilated convolutions within residual blocks to mitigate vanishing gradient issues and extract rich sequence features, while capsule layers with dynamic routing capture hierarchical and spatial relationships within the learned feature space. We conducted comprehensive ablation studies comparing global and local embeddings from ProteinBERT and conventional one-hot encoding. Results show that ProteinBERT embeddings substantially outperform other representations on large datasets. Although one-hot encoding showed marginal advantages on smaller datasets, such as PDB186, it struggled to scale effectively. Extensive evaluations on four pairs of publicly available benchmark datasets demonstrate that our model consistently outperforms current state-of-the-art methods. It achieved AUC scores of 98.0% and 89.5% on PDB14189 and PDB1075, respectively. On independent test sets PDB2272 and PDB186, the model attained top AUCs of 83.2% and 83.3%, while maintaining competitive performance on larger datasets such as PDB20000. Notably, the model maintains well-balanced sensitivity and specificity across datasets. These results demonstrate the efficacy and generalizability of integrating global protein representations with advanced deep learning architectures for reliable and scalable DBP prediction in diverse genomic contexts.
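
To make the "dilated convolutions within residual blocks" idea concrete, the following is a minimal pure-Python sketch of a single residual block built around a dilated 1D convolution. It is an illustration under stated assumptions, not the authors' implementation: the helper names (`dilated_conv1d`, `residual_block`, `receptive_field`), the single-channel input, the odd kernel size, and the dilation schedule 1, 2, 4 are all hypothetical.

```python
from typing import List


def dilated_conv1d(x: List[float], kernel: List[float], dilation: int) -> List[float]:
    """Zero-padded 'same' 1D cross-correlation with a dilated kernel (odd kernel size assumed)."""
    k = len(kernel)
    pad = dilation * (k - 1) // 2  # centre the dilated kernel on each position
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j * dilation - pad
            if 0 <= idx < len(x):  # positions outside the sequence count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def relu(x: List[float]) -> List[float]:
    return [max(0.0, v) for v in x]


def residual_block(x: List[float], kernel: List[float], dilation: int) -> List[float]:
    """y = x + ReLU(conv(x)); the skip path gives gradients an identity route."""
    h = relu(dilated_conv1d(x, kernel, dilation))
    return [a + b for a, b in zip(x, h)]


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Receptive field of stacked dilated convolutions with stride 1."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Three stacked blocks with growing dilation widen the context without extra weights.
signal = [0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0]
features = signal
for d in (1, 2, 4):
    features = residual_block(features, [0.25, 0.5, 0.25], d)

span = receptive_field(3, [1, 2, 4])  # 15 sequence positions
```

Because the block output is the input plus a correction, the gradient with respect to the input always contains an identity term, which is the usual argument for why residual connections ease vanishing gradients. The dilation itself enlarges the context each position sees.
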
Problem

Research questions and friction points this paper is trying to address.

Accurate prediction of DNA-binding proteins from sequences
Overcoming limitations of costly experimental DBP identification
Enhancing deep learning for scalable genomic protein analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Residual-Capsule Network for DBP prediction
Uses ProteinBERT embeddings for global features
Dilated convolutions within residual blocks mitigate vanishing gradients and extract rich sequence features
Samiul Based Shuvo
Assistant Professor at the Department of BME in Bangladesh University of Engineering and Technology
Deep learning, Bioinformatics, Biomedical Signal & Image Processing
Tasnia Binte Mamun
Department of Biomedical Engineering, Bangladesh University of Engineering and Technology (BUET), Dhaka-1205, Bangladesh
U Rajendra Acharya
School of Mathematics, Physics and Computing, University of Southern Queensland, Springfield, QLD 4300, Australia