🤖 AI Summary
Accurate prediction of protein–ligand binding affinity remains challenging in drug and vaccine development for gastrointestinal diseases (e.g., gastric ulcers, Crohn’s disease, ulcerative colitis). Method: We propose GastroDL-Fusion, a dual-modal deep learning framework that integrates protein–ligand complex data with disease-associated gene sequences. A Graph Isomorphism Network (GIN) models the complexes as molecular graphs, a pre-trained Transformer (ProtBERT/ESM) embeds the gene sequences, and a multi-layer perceptron fuses the two representations for cross-modal interaction learning. Contribution/Results: Evaluated on benchmark datasets of gastrointestinal disease-related targets, our framework achieves a mean absolute error (MAE) of 1.12 and a root-mean-square error (RMSE) of 1.75, outperforming CNN, BiLSTM, GIN, and Transformer-only baselines. These results indicate that combining structural and genetic features yields more accurate binding affinity predictions for targeted therapy and vaccine design.
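For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how MAE and RMSE are computed for a regression task such as affinity prediction. The function names and the toy values are illustrative assumptions, not data or code from the paper.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root-mean-square error: square root of the average squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Made-up affinity values purely for illustration (NOT data from the paper).
measured = [6.0, 7.5, 5.0, 8.0]
predicted = [5.0, 7.5, 7.0, 8.0]

toy_mae = mae(measured, predicted)    # errors are 1, 0, 2, 0
toy_rmse = rmse(measured, predicted)  # squared errors are 1, 0, 4, 0
print(f"MAE={toy_mae:.3f} RMSE={toy_rmse:.3f}")
```

RMSE is never smaller than MAE on the same predictions, because squaring weights large errors more heavily. A gap between the two, as with the reported 1.12 and 1.75, usually suggests that a subset of predictions carries comparatively large errors.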
📝 Abstract
Accurate prediction of protein-ligand binding affinity plays a pivotal role in accelerating the discovery of novel drugs and vaccines, particularly for gastrointestinal (GI) diseases such as gastric ulcers, Crohn's disease, and ulcerative colitis. Traditional computational models often rely on structural information alone and thus fail to capture the genetic determinants that influence disease mechanisms and therapeutic responses. To address this gap, we propose GastroDL-Fusion, a dual-modal deep learning framework that integrates protein-ligand complex data with disease-associated gene sequence information for drug and vaccine development. In our approach, protein-ligand complexes are represented as molecular graphs and modeled using a Graph Isomorphism Network (GIN), while gene sequences are encoded into biologically meaningful embeddings via a pre-trained Transformer (ProtBERT/ESM). These complementary modalities are fused through a multi-layer perceptron to enable robust cross-modal interaction learning. We evaluate the model on benchmark datasets of GI disease-related targets, demonstrating that GastroDL-Fusion significantly improves predictive performance over conventional methods. Specifically, the model achieves a mean absolute error (MAE) of 1.12 and a root mean square error (RMSE) of 1.75, outperforming CNN, BiLSTM, GIN, and Transformer-only baselines. These results confirm that incorporating both structural and genetic features yields more accurate predictions of binding affinities, providing a reliable computational tool for accelerating the design of targeted therapies and vaccines in the context of gastrointestinal diseases.
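The pipeline described above (GIN over a molecular graph, a sequence embedding, and MLP fusion) can be sketched in plain Python as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the GIN update uses a single linear layer with ReLU where a full GIN would use a small MLP, the sequence vector is a hand-written stand-in for a ProtBERT/ESM embedding, and all weights, dimensions, and function names are hypothetical.

```python
def relu(xs):
    return [max(0.0, x) for x in xs]

def linear(x, weights, bias):
    """Dense layer; `weights` holds one row of coefficients per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def gin_layer(node_feats, adjacency, weights, bias, eps=0.0):
    """One simplified GIN update:
    h_v' = ReLU(W * ((1 + eps) * h_v + sum of neighbor features) + b)."""
    updated = []
    for v, h_v in enumerate(node_feats):
        agg = [(1.0 + eps) * x for x in h_v]
        for u in adjacency[v]:
            agg = [a + x for a, x in zip(agg, node_feats[u])]
        updated.append(relu(linear(agg, weights, bias)))
    return updated

def sum_readout(node_feats):
    """Pool node vectors into a single graph-level vector by summation."""
    return [sum(col) for col in zip(*node_feats)]

def fuse_and_predict(graph_vec, seq_vec, hidden_w, hidden_b, out_w, out_b):
    """Concatenate both modalities and regress one affinity value with an MLP."""
    joint = graph_vec + seq_vec  # list concatenation
    hidden = relu(linear(joint, hidden_w, hidden_b))
    return linear(hidden, out_w, out_b)[0]

# Toy 3-atom chain graph (0 - 1 - 2) with 2-d node features (hypothetical).
node_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adjacency = {0: [1], 1: [0, 2], 2: [1]}
identity_w, zero_b = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]

graph_vec = sum_readout(gin_layer(node_feats, adjacency, identity_w, zero_b))

# Stand-in for a pre-trained sequence embedding (ProtBERT/ESM in the paper).
seq_vec = [0.5, -0.5]

# Hypothetical fusion-MLP weights: 4 inputs -> 2 hidden units -> 1 output.
hidden_w = [[0.1, 0.1, 0.1, 0.1], [0.2, 0.0, -0.2, 0.0]]
hidden_b = [0.0, 0.0]
out_w, out_b = [[1.0, 1.0]], [0.0]

prediction = fuse_and_predict(graph_vec, seq_vec, hidden_w, hidden_b, out_w, out_b)
print(graph_vec, round(prediction, 3))
```

Concatenation followed by an MLP is the simplest way to let a model learn interactions between the two modalities. In practice the graph encoder and sequence encoder would come from dedicated libraries and be trained end to end against measured affinities.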