Rep3Net: An Approach Exploiting Multimodal Representation for Molecular Bioactivity Prediction

📅 2025-11-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Traditional QSAR models struggle to jointly capture molecular structural topology and semantic contextual information. To address this, we propose a multimodal deep learning framework for target-specific bioactivity prediction. Our method integrates three heterogeneous representations—handcrafted molecular descriptors, graph-based topological features extracted via GCN or GAT, and SMILES-level semantic embeddings generated by ChemBERTa—through feature concatenation and end-to-end fusion using a deep neural network. Experiments on the PARP-1 dataset demonstrate that our framework significantly outperforms classical QSAR models and unimodal baselines (e.g., GNN-only or ChemBERTa-only variants), achieving superior prediction accuracy and generalizability. To our knowledge, this is the first work to systematically unify chemical structure, relational topology, and sequence semantics within a single, scalable, and high-precision multimodal modeling paradigm, offering substantial potential for early-stage drug discovery and virtual screening.

📝 Abstract

In early-stage drug discovery, predicting the bioactivity of molecules against target proteins plays a crucial role. Traditional QSAR models that rely on molecular descriptor data often struggle to predict bioactivity effectively because they are limited in capturing the structural and contextual information embedded within each compound. To address this challenge, we propose Rep3Net, a unified deep learning architecture that not only incorporates descriptor data but also includes spatial and relational information through graph-based representations of compounds and contextual information through ChemBERTa-generated embeddings from SMILES strings. Our model, which employs multimodal concatenated features, produces reliable bioactivity predictions on a Poly [ADP-ribose] polymerase 1 (PARP-1) dataset. PARP-1 is a crucial agent in DNA damage repair and has become a significant therapeutic target in malignancies that depend on it for survival and growth. A comprehensive analysis and comparison with conventional standalone models, including GCN, GAT, and XGBoost, among others, demonstrate that our architecture achieves the highest predictive performance. For computational screening of compounds in drug discovery, our architecture provides a scalable framework for bioactivity prediction.
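The fusion step described in the abstract (concatenating descriptor, graph, and SMILES-embedding features and passing them to a neural network) can be illustrated with a minimal sketch. This is not the authors' implementation: the vector sizes, the two-layer network, the random weights, and all function names (`concat_features`, `init_layer`, `dense`, `predict`) are assumptions made for illustration. In a real pipeline the graph vector would come from a trained GNN and the text vector from ChemBERTa; here both are random placeholders.

```python
import math
import random


def concat_features(descriptors, graph_embedding, text_embedding):
    """Fuse the three modality vectors by plain concatenation."""
    return list(descriptors) + list(graph_embedding) + list(text_embedding)


def init_layer(n_in, n_out, rng):
    """Create (weights, biases) for one dense layer with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer, relu=True):
    """Apply one dense layer, optionally followed by ReLU."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    return [max(0.0, o) for o in out] if relu else out


def predict(fused, layers):
    """Run the fused vector through hidden layers, then a linear output unit."""
    h = fused
    for layer in layers[:-1]:
        h = dense(h, layer, relu=True)
    return dense(h, layers[-1], relu=False)[0]


rng = random.Random(0)

# Hypothetical per-molecule feature vectors (sizes chosen only for illustration).
descriptors = [rng.random() for _ in range(8)]       # handcrafted descriptors
graph_embedding = [rng.random() for _ in range(16)]  # placeholder for GNN output
text_embedding = [rng.random() for _ in range(32)]   # placeholder for ChemBERTa output

fused = concat_features(descriptors, graph_embedding, text_embedding)

# Untrained two-layer network: fused vector -> 16 hidden units -> 1 score.
layers = [init_layer(len(fused), 16, rng), init_layer(16, 1, rng)]
score = predict(fused, layers)

print(len(fused), score)
```

Concatenation keeps each modality's features intact and leaves it to the downstream network to learn cross-modal interactions. The score printed here is meaningless because the weights are untrained; the sketch only shows how the three representations flow into a single predictor.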

Problem

Research questions and friction points this paper is trying to address.

Predicts molecular bioactivity against target proteins

Addresses limitations of traditional QSAR models

Integrates multimodal molecular representations for accuracy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal deep learning architecture for bioactivity prediction

Incorporates graph-based and ChemBERTa embeddings with descriptors

Unified framework outperforms conventional models like GCN and XGBoost
