DeepPNI: Language- and graph-based model for mutation-driven protein-nucleic acid energetics

📅 2025-11-27

🤖 AI Summary
This study addresses the prediction of binding free energy changes (ΔΔG) caused by amino acid mutations in protein–DNA and protein–RNA complexes. Existing methods generalize poorly and do not jointly use structural and evolutionary sequence information. The authors propose DeepPNI, a deep learning regression model that couples an edge-aware relational graph convolutional network (RGCN), which encodes the 3D interaction structure, with the protein language model ESM-2, which supplies sequence representations. On a dataset of 1,951 mutations, the model achieves a mean Pearson correlation coefficient of 0.76 under five-fold cross-validation. It also outperforms existing tools on an external dataset and performs consistently across protein–DNA complexes, protein–RNA complexes, and datasets split by experimental temperature.
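
For reference, the two quantities named in the summary can be written out. The sign convention below (mutant minus wild type) is a common one but is an assumption here, since the summary does not state which convention the paper uses; the symbols $y_i$ (experimental ΔΔG) and $\hat{y}_i$ (predicted ΔΔG) are illustrative names, not taken from the paper.

```latex
\Delta\Delta G = \Delta G_{\mathrm{bind}}^{\mathrm{mut}} - \Delta G_{\mathrm{bind}}^{\mathrm{wt}}

\mathrm{PCC} = \frac{\sum_{i=1}^{n} (y_i - \bar{y})\,(\hat{y}_i - \bar{\hat{y}})}
                    {\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^{2}}\;\sqrt{\sum_{i=1}^{n} (\hat{y}_i - \bar{\hat{y}})^{2}}}
```

A PCC of 1 means predictions rise and fall exactly in line with the measurements; the reported 0.76 is the average of this quantity over the five cross-validation folds.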

📝 Abstract
The interaction between proteins and nucleic acids is crucial for processes that sustain cellular function, including DNA maintenance and the regulation of gene expression and translation. Amino acid mutations in protein-nucleic acid complexes often lead to serious diseases. Experimental techniques have specific limitations in determining mutational effects in protein-nucleic acid complexes. In this study, we compiled a large dataset of 1951 mutations covering both protein-DNA and protein-RNA complexes and integrated structural and sequential features to build a deep learning-based regression model named DeepPNI. This model estimates mutation-induced binding free energy changes in protein-nucleic acid complexes. The structural features are encoded via an edge-aware RGCN, and the sequential features are extracted using the protein language model ESM-2. We achieved a high average Pearson correlation coefficient (PCC) of 0.76 on the large dataset via five-fold cross-validation. Consistent performance across the individual protein-DNA and protein-RNA datasets, and across datasets split by experimental temperature, indicates that the model generalizes. Our model also performed well in complex-based five-fold cross-validation, which supports its robustness. In addition, DeepPNI outperformed in external dataset validation, and comparison with existing tools
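
The complex-based five-fold cross-validation mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the complex identifiers, the single toy feature, the synthetic ΔΔG labels, and the one-variable linear fit standing in for the actual model. The point is only the splitting rule, where all mutations from one complex land in the same fold, and the per-fold Pearson correlation averaged at the end.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def complex_based_folds(complex_ids, k=5, seed=0):
    """Assign whole complexes to folds so no complex spans train and test."""
    unique = sorted(set(complex_ids))
    random.Random(seed).shuffle(unique)
    fold_of = {c: i % k for i, c in enumerate(unique)}
    folds = [[] for _ in range(k)]
    for idx, c in enumerate(complex_ids):
        folds[fold_of[c]].append(idx)
    return folds


def fit_line(xs, ys):
    """Least-squares slope and intercept (a stand-in for the real model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


# Synthetic data: 10 hypothetical complexes with 4 mutations each.
rng = random.Random(42)
complex_ids = [f"cplx{i // 4}" for i in range(40)]
features = [rng.uniform(-2, 2) for _ in complex_ids]
ddg = [2.0 * x + rng.gauss(0, 0.1) for x in features]

folds = complex_based_folds(complex_ids, k=5)
pccs = []
for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(len(ddg)) if i not in held_out]
    slope, intercept = fit_line(
        [features[i] for i in train_idx], [ddg[i] for i in train_idx]
    )
    preds = [slope * features[i] + intercept for i in test_idx]
    pccs.append(pearson(preds, [ddg[i] for i in test_idx]))

mean_pcc = sum(pccs) / len(pccs)
print(f"mean PCC over {len(pccs)} folds: {mean_pcc:.3f}")
```

Splitting by complex rather than by individual mutation is the stricter test, because the model never sees any mutation from a held-out complex during training.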
Problem

Research questions and friction points this paper is trying to address.

Predicts mutation effects on protein-nucleic acid binding energy
Integrates structural and sequential features via deep learning
Addresses experimental limitations in studying protein-DNA/RNA complexes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates structural and sequential features via deep learning
Encodes structural features using edge-aware RGCN
Extracts sequential features with protein language model ESM-2
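
The combination of the two feature streams listed above can be sketched in a few lines. This is not the authors' implementation: the layer follows the standard relational graph convolution update (a self-transform plus per-relation, degree-normalized neighbor messages), and "edge-aware" is modeled here, as an assumption, by scaling each message with a scalar edge weight. The relation names, weight matrices, the fixed `seq_emb` vector standing in for an ESM-2 embedding, and the linear head are all hypothetical.

```python
def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def relu(v):
    return [max(0.0, x) for x in v]


def edge_aware_rgcn_layer(h, edges, W_self, W_rel):
    """One relational graph convolution step with scalar edge weights.

    h:      list of node feature vectors
    edges:  list of (src, dst, relation, weight); messages flow src -> dst
    W_self: weight matrix applied to each node's own features
    W_rel:  dict mapping relation name -> weight matrix
    """
    out = [matvec(W_self, hi) for hi in h]
    # Normalization constant: incoming edges per (destination, relation).
    counts = {}
    for _, dst, rel, _ in edges:
        counts[(dst, rel)] = counts.get((dst, rel), 0) + 1
    for src, dst, rel, weight in edges:
        msg = matvec(W_rel[rel], h[src])
        scale = weight / counts[(dst, rel)]
        out[dst] = [o + scale * m for o, m in zip(out[dst], msg)]
    return [relu(o) for o in out]


def mean_pool(h):
    """Average node vectors into one graph-level vector."""
    n = len(h)
    return [sum(col) / n for col in zip(*h)]


def predict_ddg(graph_emb, seq_emb, w, b):
    """Concatenate structural and sequence embeddings, apply a linear head."""
    fused = graph_emb + seq_emb  # list concatenation
    return sum(wi * xi for wi, xi in zip(w, fused)) + b


# Toy graph: three residues/nucleotides with 2-dimensional features.
h0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [
    (0, 1, "protein-protein", 0.5),
    (2, 1, "protein-nucleic", 1.0),
    (1, 0, "protein-protein", 0.5),
]
identity = [[1.0, 0.0], [0.0, 1.0]]
W_rel = {"protein-protein": identity, "protein-nucleic": [[2.0, 0.0], [0.0, 2.0]]}

h1 = edge_aware_rgcn_layer(h0, edges, identity, W_rel)
graph_emb = mean_pool(h1)
seq_emb = [0.2, -0.1]  # placeholder for a language-model embedding
pred = predict_ddg(graph_emb, seq_emb, w=[0.1, 0.1, 1.0, 1.0], b=0.0)
print(h1, graph_emb, pred)
```

Using a separate weight matrix per relation lets the layer treat protein-protein contacts differently from protein-nucleic acid contacts, which is the motivation for a relational rather than a plain graph convolution.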
Somnath Mondal
Department of Chemistry, Indian Institute of Technology Bhilai, Durg 491002, Chhattisgarh, India.
Tinkal Mondal
Department of Bioscience and Biomedical Engineering, Indian Institute of Technology Bhilai, Durg 491002, Chhattisgarh, India.
Soumajit Pramanik
Assistant Professor, IIT Bhilai
Information Retrieval, Machine Learning, Complex Networks, Social Computing
Rukmankesh Mehra
Department of Chemistry, Indian Institute of Technology Bhilai, Durg 491002, Chhattisgarh, India; Department of Bioscience and Biomedical Engineering, Indian Institute of Technology Bhilai, Durg 491002, Chhattisgarh, India.