DeepPNI: Language- and graph-based model for mutation-driven protein-nucleic acid energetics

📅 2025-11-27

🤖 AI Summary

This study addresses the challenge of accurately predicting how amino acid mutations change the binding free energy (ΔΔG) of protein–DNA and protein–RNA complexes. Existing methods generalize poorly and do not jointly exploit structural and evolutionary sequence information. To overcome these limitations, we propose DeepPNI, a unified deep learning framework that couples an edge-aware relational graph convolutional network (RGCN), which models the 3D interaction structure, with ESM-2, which supplies evolution-informed protein sequence representations. Evaluated on a dataset of 1,951 mutations via five-fold cross-validation, the model achieves a mean Pearson correlation coefficient of 0.76. On an external benchmark it outperforms existing tools, and its performance is consistent across complex types (protein–DNA and protein–RNA) and experimental temperatures.

📝 Abstract

The interaction between proteins and nucleic acids is crucial for processes that sustain cellular function, including DNA maintenance and the regulation of gene expression and translation. Amino acid mutations in protein-nucleic acid complexes often lead to serious diseases. Experimental techniques have specific limitations in determining mutational effects in protein-nucleic acid complexes. In this study, we compiled a large dataset of 1951 mutations covering both protein-DNA and protein-RNA complexes and integrated structural and sequential features to build a deep learning-based regression model named DeepPNI. This model estimates mutation-induced binding free energy changes in protein-nucleic acid complexes. The structural features are encoded via an edge-aware RGCN, and the sequential features are extracted using the protein language model ESM-2. We achieved an average Pearson correlation coefficient (PCC) of 0.76 on the large dataset via five-fold cross-validation. Consistent performance on the individual protein-DNA and protein-RNA datasets, and on datasets split by experimental temperature, indicates that the model generalizes. The model also performed well in complex-based five-fold cross-validation, which supports its robustness. In addition, DeepPNI outperformed existing tools in external dataset validation.
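
The complex-based five-fold cross-validation and the PCC metric mentioned in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the synthetic data, and the ridge regression that stands in for the model are hypothetical and are not taken from DeepPNI. The point is only that all mutations from one complex land in the same fold, so the model is always tested on complexes it has not seen.

```python
import numpy as np

def pearson(y_true, y_pred):
    """Pearson correlation coefficient between two 1-D sequences."""
    yt = np.asarray(y_true, dtype=float) - np.mean(y_true)
    yp = np.asarray(y_pred, dtype=float) - np.mean(y_pred)
    return float(np.sum(yt * yp) / np.sqrt(np.sum(yt**2) * np.sum(yp**2)))

def complex_based_folds(complex_ids, n_folds=5, seed=0):
    """Assign whole complexes to folds so that no complex spans two folds."""
    rng = np.random.default_rng(seed)
    unique = np.unique(complex_ids)
    rng.shuffle(unique)
    fold_of_complex = {c: i % n_folds for i, c in enumerate(unique)}
    return np.array([fold_of_complex[c] for c in complex_ids])

def cross_validate(X, y, complex_ids, n_folds=5, seed=0):
    """Mean PCC over folds, with ridge regression as a stand-in model."""
    folds = complex_based_folds(complex_ids, n_folds, seed)
    scores = []
    for k in range(n_folds):
        train, test = folds != k, folds == k
        # Closed-form ridge regression: a placeholder, not DeepPNI itself.
        A = X[train].T @ X[train] + 1e-3 * np.eye(X.shape[1])
        w = np.linalg.solve(A, X[train].T @ y[train])
        scores.append(pearson(y[test], X[test] @ w))
    return float(np.mean(scores))

# Synthetic toy data (hypothetical): 20 complexes with 10 mutations each.
rng = np.random.default_rng(1)
complex_ids = np.repeat(np.arange(20), 10)
X = rng.normal(size=(200, 8))                       # per-mutation features
y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=200)  # toy ddG labels

folds = complex_based_folds(complex_ids)
mean_pcc = cross_validate(X, y, complex_ids)
print(f"mean PCC over 5 complex-based folds: {mean_pcc:.3f}")
```

A plain (mutation-level) five-fold split would instead shuffle individual mutations, which lets mutations from the same complex appear in both training and test sets; the grouped split above is the stricter of the two.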

Problem

Research questions and friction points this paper is trying to address.

Predicts mutation effects on protein-nucleic acid binding energy

Integrates structural and sequential features via deep learning

Addresses experimental limitations in studying protein-DNA/RNA complexes
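
For orientation, the binding energy change named in the points above is conventionally written as follows. The sign convention (mutant minus wild type) and the symbols $K_d$ (dissociation constant), $R$ (gas constant), and $T$ (absolute temperature) are assumptions here; the source does not state which convention DeepPNI uses.

```latex
\Delta\Delta G = \Delta G_{\mathrm{mut}} - \Delta G_{\mathrm{wt}},
\qquad
\Delta G = RT \ln K_d
\quad\Longrightarrow\quad
\Delta\Delta G = RT \ln \frac{K_d^{\mathrm{mut}}}{K_d^{\mathrm{wt}}}
```

Under this convention a positive $\Delta\Delta G$ means the mutation weakens binding. The explicit factor $T$ is one reason experimental temperature is a natural axis along which to split and test a dataset.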

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates structural and sequential features via deep learning

Encodes structural features using edge-aware RGCN

Extracts sequential features with protein language model ESM-2
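
To make the "edge-aware RGCN" idea concrete, here is a minimal NumPy sketch of one relational graph convolution step in which each edge carries a relation type and a scalar weight. This is an illustrative assumption, not the DeepPNI architecture: the function name, the use of a single positive scalar per edge (for example derived from an interatomic distance), and the weighted-mean normalisation are all hypothetical choices.

```python
import numpy as np

def edge_aware_rgcn_layer(h, edges, W_rel, W_self):
    """One relational graph convolution step with per-edge weights.

    h      : (n_nodes, d_in) node features
    edges  : iterable of (src, dst, relation, weight), weight > 0
    W_rel  : (n_relations, d_in, d_out), one weight matrix per relation
    W_self : (d_in, d_out) self-connection weight matrix
    """
    n_nodes = h.shape[0]
    n_rel, _, d_out = W_rel.shape
    msg = np.zeros((n_rel, n_nodes, d_out))
    norm = np.zeros((n_rel, n_nodes))
    for src, dst, rel, weight in edges:
        # Message from src to dst, transformed by the relation's matrix
        # and scaled by the edge weight (the "edge-aware" part here).
        msg[rel, dst] += weight * (h[src] @ W_rel[rel])
        norm[rel, dst] += weight
    # Weighted mean per relation and receiving node; guard empty neighbourhoods.
    norm = np.where(norm > 0, norm, 1.0)
    agg = (msg / norm[:, :, None]).sum(axis=0)
    return np.maximum(h @ W_self + agg, 0.0)  # ReLU

# Toy graph (hypothetical): nodes 0 and 1 are residues, node 2 is a nucleotide.
# Relation 0 = residue-residue contact, relation 1 = residue-nucleotide contact.
h = np.eye(3)
W_self = np.eye(3)
W_rel = np.stack([np.eye(3), np.eye(3)])
edges = [(0, 1, 0, 0.5), (1, 0, 0, 0.5), (2, 1, 1, 0.25), (1, 2, 1, 0.25)]
out = edge_aware_rgcn_layer(h, edges, W_rel, W_self)
print(out)
```

With identity weight matrices, each output row is simply the node's own feature plus the weighted mean of its neighbours' features per relation, which makes the aggregation easy to check by hand. In a combined model, per-residue outputs like these would be joined with sequence embeddings from a protein language model before regression; how DeepPNI performs that fusion is not described in the text above.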
