Probabilistic Contrastive Pretraining for Multi-task ADME Property Prediction

📅 2026-06-09
🤖 AI Summary
This work addresses three challenges in ADME property prediction (noisy endpoints, inter-task dependencies, and limited data) by introducing a pretraining framework built on a molecular graph Transformer. The approach unifies chemistry-specific self-supervision, reconstruction of SMILES strings from graph-derived latent codes, and contrastive mutual information machine learning (cMIM) under a single probabilistic latent-variable objective, so that reconstruction, contrastive discrimination, and chemistry-specific supervision are optimized jointly. These signals enter the objective as unit-weighted log-probability factors, which removes the need for separately tuned loss weights. For fine-tuning, a multi-task GNN readout with task-specific MLP heads preserves the shared representation while mitigating negative transfer and capturing heterogeneous, nonlinear task relationships. The resulting Contrastive KERMT pretraining improves over the KERMT baseline by 7.6%, 9.9%, and 9.5% on the Biogen, ExpansionRX, and ChEMBL-MT benchmarks, respectively, averaged over significantly improved endpoints. Adding ADME-adjacent molecules to the pretraining corpus further improves transfer, and the contrastive component sharpens chemically meaningful latent neighborhoods.
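To make the idea of equally weighted probabilistic factors concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the three factor functions (a per-token categorical likelihood standing in for the SMILES decoder, an InfoNCE-style softmax over candidate pairs standing in for the contrastive term, and a Bernoulli likelihood standing in for one chemistry task) and all names and toy numbers are assumptions chosen for illustration. The only point it shows is that the total objective is a plain sum of log-probabilities with no per-term loss weights.

```python
import math

def log_softmax_pick(scores, index):
    """Log-probability of `index` under a softmax over `scores`."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return scores[index] - log_z

def reconstruction_log_prob(token_logits, target_tokens):
    """Sum of per-position log-probs of the target tokens (assumed decoder form)."""
    return sum(log_softmax_pick(logits, t)
               for logits, t in zip(token_logits, target_tokens))

def contrastive_log_prob(similarities, positive_index):
    """Log-prob of picking the true pair among candidates (InfoNCE-style stand-in)."""
    return log_softmax_pick(similarities, positive_index)

def chemistry_log_prob(logit, label):
    """Bernoulli log-prob of a binary chemistry label (hypothetical task)."""
    signed = logit if label == 1 else -logit
    return -math.log1p(math.exp(-signed))  # equals log(sigmoid(signed))

def unit_weighted_objective(token_logits, target_tokens,
                            similarities, positive_index,
                            chem_logit, chem_label):
    """Sum of the three log-probability factors, each with weight 1."""
    return (reconstruction_log_prob(token_logits, target_tokens)
            + contrastive_log_prob(similarities, positive_index)
            + chemistry_log_prob(chem_logit, chem_label))

# Toy inputs (made up): two decoder positions over a 3-token vocabulary,
# three contrastive candidates, and one binary chemistry label.
token_logits = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
target_tokens = [0, 1]
similarities = [3.0, 0.5, -1.0]

objective = unit_weighted_objective(token_logits, target_tokens,
                                    similarities, 0, 1.2, 1)
print("training loss (negative objective):", -objective)
```

Because every term is a log-probability, the terms share a common scale by construction, which is the property that lets the formulation drop separately tuned loss weights.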
📝 Abstract
Accurate prediction of absorption, distribution, metabolism, and excretion (ADME) properties is critical to drug discovery, but remains challenging because ADME endpoints are noisy, interdependent, and often data-limited. We propose a molecular graph-transformer pretraining framework that combines chemistry-specific self-supervision with contrastive mutual information machine learning (cMIM). Our method encodes molecular graphs into latent variables, reconstructs SMILES strings from the graph-derived latent codes, and augments the contrastive objective with domain-specific self-supervised chemistry tasks. Rather than treating these tasks as auxiliary regularizers with separately tuned loss weights, we formulate reconstruction, contrastive discrimination, and chemistry-specific supervision as unit-weighted log-probability factors in a single probabilistic latent-variable objective. For fine-tuning, we propose a multi-task GNN readout architecture with task-specific multilayer perceptron heads, preserving shared representation learning while mitigating negative transfer and improving the modeling of heterogeneous, nonlinear task relationships. Across Biogen, ExpansionRX, and ChEMBL-MT, the resulting Contrastive KERMT pretraining improves over the KERMT baseline by 7.6%, 9.9%, and 9.5%, respectively (averaged over significantly improved endpoints). Adding ADME-adjacent molecules to the pretraining corpus further improves transfer, and the contrastive component sharpens chemically meaningful latent neighborhoods.
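The fine-tuning design, one shared representation feeding a separate MLP head per endpoint, can be sketched in a few lines of plain Python. This is an illustrative assumption, not the paper's architecture: the embedding size, hidden width, endpoint names, the use of a masked mean squared error for molecules with missing labels, and every function name below are hypothetical.

```python
import random

def make_mlp(in_dim, hidden_dim, rng):
    """One-hidden-layer MLP with a scalar output, stored as nested lists."""
    return {
        "w1": [[rng.gauss(0, 0.1) for _ in range(in_dim)]
               for _ in range(hidden_dim)],
        "b1": [0.0] * hidden_dim,
        "w2": [rng.gauss(0, 0.1) for _ in range(hidden_dim)],
        "b2": 0.0,
    }

def mlp_forward(params, x):
    """ReLU hidden layer followed by a linear scalar output."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(params["w1"], params["b1"])]
    return sum(w * h for w, h in zip(params["w2"], hidden)) + params["b2"]

def multitask_predict(heads, embedding):
    """Apply every task-specific head to the same shared embedding."""
    return {task: mlp_forward(p, embedding) for task, p in heads.items()}

def masked_mse(predictions, labels):
    """Mean squared error over tasks with a label; None marks a missing one."""
    errors = [(predictions[t] - y) ** 2
              for t, y in labels.items() if y is not None]
    return sum(errors) / len(errors) if errors else 0.0

rng = random.Random(0)
tasks = ("solubility", "clearance", "permeability")  # hypothetical endpoints
heads = {task: make_mlp(4, 8, rng) for task in tasks}

# Stand-in for the shared GNN readout of one molecule (made-up values).
embedding = [0.2, -0.1, 0.5, 0.3]
predictions = multitask_predict(heads, embedding)

# This molecule has no measured clearance, so that head receives no error signal.
labels = {"solubility": 1.0, "clearance": None, "permeability": 0.4}
print("masked loss:", masked_mse(predictions, labels))
```

Keeping the heads separate means each endpoint gets its own nonlinear mapping from the shared embedding, so a poorly correlated endpoint does not have to share output-layer parameters with the others.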
Problem

Research questions and friction points this paper is trying to address.

ADME prediction
multi-task learning
data-limited
noisy endpoints
interdependent properties
Innovation

Methods, ideas, or system contributions that make the work stand out.

contrastive pretraining
probabilistic latent-variable model
multi-task GNN
chemistry-specific self-supervision
ADME property prediction