🤖 AI Summary
In drug discovery, graph neural network (GNN)-based virtual screening lacks substructure-level interpretability, which hinders mechanistic understanding and rational design. To address this, the authors propose the Hierarchical Grad-CAM graph Explainer (HGE) framework, a hierarchical attribution method that provides Grad-CAM explanations at three levels (atom, ring, and whole molecule) and leverages the GNN message-passing mechanism to highlight the most relevant chemical moieties. The 20 GNN classifiers underlying HGE, one per kinase target, achieve state-of-the-art virtual screening performance. Crucially, HGE attributions align closely with literature-reported drug–target interaction motifs, recapitulating key binding fragments such as hinge-region hydrogen-bonding groups and hydrophobic pocket substituents. This work establishes a verifiable, multi-scale attribution approach for trustworthy GNN deployment in drug discovery.
📝 Abstract
Background: Virtual screening (VS) has become an essential tool in drug discovery, enabling the rapid and cost-effective identification of potential bioactive molecules. Among recent advances, graph neural networks (GNNs) have gained prominence for their ability to model complex molecular structures using graph-based representations. However, integrating explainability methods that elucidate the specific contributions of molecular substructures to biological activity remains a significant challenge. This limitation hampers both the interpretability of predictive models and the rational design of novel therapeutics.
Results: We trained 20 GNN models on a dataset of small molecules to predict their activity on 20 distinct protein targets from the kinase family. These classifiers achieved state-of-the-art performance in virtual screening tasks, demonstrating high accuracy and robustness across targets. Building upon these models, we implemented the Hierarchical Grad-CAM graph Explainer (HGE) framework, enabling an in-depth analysis of the molecular moieties that drive the stabilization of protein-ligand binding. HGE exploits Grad-CAM explanations at the atom, ring, and whole-molecule levels, leveraging the message-passing mechanism to highlight the most relevant chemical moieties. Validation against experimental data from the literature confirmed that the explainer can recognize the molecular patterns of known drugs and correctly associate them with their known targets.
Conclusion: Our approach may serve as valid support for shortening both the screening and the hit discovery processes. Detailed knowledge of the molecular substructures that play a role in binding can help computational chemists gain insight into structure optimization as well as drug repurposing.
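To make the three attribution levels named in the Results concrete, the following is a minimal sketch of graph Grad-CAM on a toy molecule. It is not the authors' HGE implementation: the single mean-aggregation message-passing layer, the weights `w` and `w_out`, the one-hot atom features, and the choice to average atom scores into ring and whole-molecule scores are all assumptions made for illustration. Only the Grad-CAM rule itself (gradient-averaged channel weights, then a ReLU over the weighted feature map) is the standard formulation.

```python
# Toy sketch of graph Grad-CAM with atom -> ring -> molecule aggregation.
# Weights, features, and the mean-aggregation rule are illustrative assumptions,
# not values or design choices taken from the HGE paper.

def relu(v):
    return v if v > 0.0 else 0.0

def message_pass(adj, x, w):
    """One message-passing layer: mean over each atom's neighbourhood
    (self included), followed by a linear map and ReLU."""
    n, f_in, f_out = len(x), len(x[0]), len(w[0])
    h = []
    for i in range(n):
        nbrs = [j for j in range(n) if j == i or adj[i][j]]
        agg = [sum(x[j][f] for j in nbrs) / len(nbrs) for f in range(f_in)]
        h.append([relu(sum(agg[f] * w[f][k] for f in range(f_in)))
                  for k in range(f_out)])
    return h

def grad_cam_atom_scores(h, w_out):
    """Grad-CAM per atom for the score y = mean_i (h[i] . w_out)."""
    n, k_dim = len(h), len(w_out)
    # For this linear readout the gradient is exact: dy/dh[i][k] = w_out[k] / n.
    grads = [[w_out[k] / n for k in range(k_dim)] for _ in range(n)]
    # Channel weights: gradient averaged over atoms.
    alpha = [sum(grads[i][k] for i in range(n)) / n for k in range(k_dim)]
    # Atom relevance: ReLU of the alpha-weighted feature map.
    return [relu(sum(alpha[k] * h[i][k] for k in range(k_dim)))
            for i in range(n)]

def group_score(scores, atom_ids):
    """Assumed aggregation: mean atom score over a group of atoms."""
    return sum(scores[i] for i in atom_ids) / len(atom_ids)

# Toy molecule: a three-membered ring (atoms 0, 1, 2) with a substituent (atom 3).
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
x = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # one-hot atom types
w = [[1.0, -0.5], [0.5, 2.0]]                          # hypothetical layer weights
w_out = [0.2, 1.0]                                     # hypothetical readout weights

h = message_pass(adj, x, w)
atom_scores = grad_cam_atom_scores(h, w_out)
ring_score = group_score(atom_scores, [0, 1, 2])
mol_score = group_score(atom_scores, range(len(atom_scores)))

print("atom:", atom_scores)
print("ring:", ring_score)
print("molecule:", mol_score)
```

In a real pipeline the gradients would come from an automatic differentiation framework rather than a hand-derived formula, and ring membership would be read from a cheminformatics toolkit instead of being hard-coded. Because the atom scores depend on features mixed in from neighbouring atoms, each score reflects the local chemical environment rather than the atom in isolation, which is the sense in which the explanation leverages message passing.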