🤖 AI Summary
Existing methods struggle to accurately model solvent-dependent protein–ligand conformational changes and lack multi-task collaborative learning capabilities. To address this, we propose a solvent-aware multi-task learning framework: first, constructing molecular conformation ensembles under diverse solvent conditions and learning solvent-invariant representations via contrastive learning; second, jointly pretraining on auxiliary tasks—including molecular reconstruction, atomic distance prediction, and contrastive learning—while integrating heterogeneous solvent-environment data. Our method significantly improves cross-scenario generalization: achieving a 3.7% relative improvement in binding affinity prediction, an 82% success rate on the PoseBusters Astex benchmark, a virtual screening AUC of 97.1%, and a best-docking RMSD of 0.157 Å. The core contribution lies in the first unified integration of explicit solvent modeling, flexible conformational representation, and multi-task pretraining within the drug discovery paradigm.
📝 Abstract
Accurate prediction of protein-ligand interactions is essential for computer-aided drug discovery. However, existing methods often fail to capture solvent-dependent conformational changes and lack the ability to jointly learn multiple related tasks. To address these limitations, we introduce a pre-training method that incorporates ligand conformational ensembles generated under diverse solvent conditions as augmented input. This design enables the model to learn both structural flexibility and environmental context in a unified manner. The training process integrates molecular reconstruction to capture local geometry, interatomic distance prediction to model spatial relationships, and contrastive learning to build solvent-invariant molecular representations. Together, these components lead to significant improvements, including a 3.7% gain in binding affinity prediction, an 82% success rate on the PoseBusters Astex docking benchmarks, and an area under the curve of 97.1% in virtual screening. The framework supports solvent-aware, multi-task modeling and produces consistent results across benchmarks. A case study further demonstrates sub-angstrom docking accuracy with a root-mean-square deviation of 0.157 angstroms, offering atomic-level insight into binding mechanisms and advancing structure-based drug design.