🤖 AI Summary
Accurately and efficiently predicting hydration free energies of small molecules remains challenging because of the trade-off between accuracy and computational cost. Method: We propose an implicit-solvent machine-learned potential (MLP) parameterized with the Solvation Free Energy Path Reweighting (ReSolv) framework and trained on both experimental hydration free energies and *ab initio* gas-phase data, which eliminates the need for *ab initio* data of molecules in explicit solvent. The model uses a graph neural network to represent molecular structure, coupled with a continuum solvation description and multi-source data optimization. Contribution/Results: This work establishes the first *ab initio*-accurate, implicit-solvent MLP for hydration free energy prediction. On the FreeSolv benchmark, it achieves a mean absolute error of 0.23 kcal/mol, comparable to experimental uncertainty, and significantly outperforms conventional explicit-solvent force fields. It is also four orders of magnitude faster than an explicit-solvent MLP.
📝 Abstract
Machine learning (ML) potentials are a powerful tool in molecular modeling, enabling ab initio accuracy at comparatively small computational cost. Nevertheless, all-atom simulations employing the best-performing graph neural network architectures are still too expensive for applications requiring extensive sampling, such as free energy computations. Implicit solvent models could provide the necessary speed-up due to reduced degrees of freedom and faster dynamics. Here, we introduce a Solvation Free Energy Path Reweighting (ReSolv) framework to parameterize an implicit solvent ML potential for small organic molecules that accurately predicts the hydration free energy, an essential parameter in drug design and pollutant modeling. Learning on a combination of experimental hydration free energy data and ab initio data of molecules in vacuum, ReSolv bypasses the need for intractable ab initio data of molecules in an explicit bulk solvent and does not have to resort to less accurate data-generating models. On the FreeSolv dataset, ReSolv achieves a mean absolute error close to the average experimental uncertainty, significantly outperforming standard explicit solvent force fields. Compared to an explicit solvent ML potential, ReSolv offers a computational speedup of four orders of magnitude and attains closer agreement with experiments. The presented framework paves the way for deep molecular models that are more accurate yet computationally more cost-effective than classical atomistic models.
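
The abstract does not spell out the estimator behind the reweighting step. As an illustration only, one standard building block for reweighting-based free energy estimates is the Zwanzig free energy perturbation identity, ΔF = −(1/β) ln ⟨exp(−β ΔU)⟩_A, which estimates the free energy difference between two states from energy differences sampled in one of them. The sketch below assumes energies in kcal/mol and a hypothetical helper name `fep_free_energy`; it is not the ReSolv implementation.

```python
import numpy as np

# Boltzmann constant in kcal/(mol K)
KB_KCAL_PER_MOL_K = 0.0019872041


def fep_free_energy(delta_u, temperature=298.15):
    """Zwanzig estimate of F_B - F_A in kcal/mol.

    delta_u: samples of U_B(x) - U_A(x) in kcal/mol, with x drawn from state A.
    The function name and signature are illustrative assumptions.
    """
    beta = 1.0 / (KB_KCAL_PER_MOL_K * temperature)
    x = -beta * np.asarray(delta_u, dtype=float)
    # Log of the sample mean of exp(x), shifted by the maximum for stability.
    x_max = x.max()
    log_mean = x_max + np.log(np.mean(np.exp(x - x_max)))
    return -log_mean / beta


if __name__ == "__main__":
    # Toy energy differences; in practice these would come from simulation.
    samples = [0.0, 1.0, 0.5, 0.2]
    print(fep_free_energy(samples))
```

Because the exponential average is dominated by low-energy samples, the estimate is never larger than the plain mean of `delta_u`. Splitting the transformation into several intermediate states along a path is a common way to keep each such estimate well converged.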