🤖 AI Summary
This work addresses three challenges in ADME property prediction (noisy endpoints, interdependent tasks, and limited data) with a molecular graph-transformer pretraining framework. Pretraining combines chemistry-specific self-supervision, SMILES reconstruction from graph-derived latent codes, and contrastive mutual information machine learning (cMIM) as unit-weighted log-probability factors in a single probabilistic latent-variable objective, so the terms do not need separately tuned loss weights. For fine-tuning, a multi-task GNN readout with task-specific MLP heads keeps a shared representation while mitigating negative transfer and modeling heterogeneous, nonlinear task relationships. The resulting Contrastive KERMT pretraining improves over the KERMT baseline by 7.6%, 9.9%, and 9.5% on Biogen, ExpansionRX, and ChEMBL-MT, respectively, averaged over significantly improved endpoints. Adding ADME-adjacent molecules to the pretraining corpus further improves transfer, and the contrastive component sharpens chemically meaningful latent neighborhoods.
📝 Abstract
Accurate prediction of absorption, distribution, metabolism, and excretion (ADME) properties is critical to drug discovery, but remains challenging because ADME endpoints are noisy, interdependent, and often data-limited. We propose a molecular graph-transformer pretraining framework that combines chemistry-specific self-supervision with contrastive mutual information machine learning (cMIM). Our method encodes molecular graphs into latent variables, reconstructs SMILES strings from the graph-derived latent codes, and augments the contrastive objective with domain-specific self-supervised chemistry tasks. Rather than treating these tasks as auxiliary regularizers with separately tuned loss weights, we formulate reconstruction, contrastive discrimination, and chemistry-specific supervision as unit-weighted log-probability factors in a single probabilistic latent-variable objective. For fine-tuning, we propose a multi-task GNN readout architecture with task-specific multilayer perceptron heads, preserving shared representation learning while mitigating negative transfer and improving the modeling of heterogeneous, nonlinear task relationships. Across Biogen, ExpansionRX, and ChEMBL-MT, the resulting Contrastive KERMT pretraining improves over the KERMT baseline by 7.6%, 9.9%, and 9.5%, respectively (averaged over significantly improved endpoints). Adding ADME-adjacent molecules to the pretraining corpus further improves transfer, and the contrastive component sharpens chemically meaningful latent neighborhoods.
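To make the idea of "unit-weighted log-probability factors" concrete, the following is a minimal sketch, not the authors' implementation. It assumes that reconstruction is a per-token categorical likelihood over SMILES tokens, that contrastive discrimination is an InfoNCE-style softmax over candidate pairs, and that a chemistry task is a Gaussian regression likelihood. The abstract does not specify these forms, and all function names and toy numbers below are hypothetical.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over a list of floats."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def reconstruction_log_prob(token_logits, target_ids):
    """Sum of log p(token_t | z) over a tokenized SMILES string (assumed form)."""
    return sum(log_softmax(logits)[t] for logits, t in zip(token_logits, target_ids))


def contrastive_log_prob(similarities, positive_index):
    """InfoNCE-style log-probability of the true pair among candidates (assumed form)."""
    return log_softmax(similarities)[positive_index]


def gaussian_log_prob(prediction, target, sigma=1.0):
    """Log-density of a chemistry target under N(prediction, sigma^2) (assumed form)."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (target - prediction) ** 2 / (2 * sigma ** 2)


def unit_weighted_objective(factors):
    """Negative sum of log-probability factors, each with weight 1."""
    return -sum(factors.values())


# Toy values for a single molecule; real inputs would come from the encoder and decoder.
factors = {
    "reconstruction": reconstruction_log_prob(
        [[2.0, 0.1, -1.0], [0.0, 1.5, 0.2]],  # per-position logits over 3 tokens
        [0, 1],                                # target token ids
    ),
    "contrastive": contrastive_log_prob([3.0, 0.5, -0.2, 0.1], positive_index=0),
    "chemistry": gaussian_log_prob(prediction=1.8, target=2.0),
}
loss = unit_weighted_objective(factors)
print(loss)
```

Because every term is a log-probability under one joint model, the terms are simply added. A conventional multi-objective setup would instead multiply each term by a tuned coefficient before summing.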