🤖 AI Summary
This study addresses the limited clinical translatability of conventional drug-induced liver injury (DILI) prediction models, which typically rely on opaque binary classification and offer no mechanistic interpretability. The work reframes DILI prediction as an interpretable mechanism hypothesis generation task. It introduces DILER, a benchmark dataset annotated with hepatotoxicity mechanism hypotheses, and presents HADES, an agent-based system that integrates molecular-level predictions, metabolite decomposition, structural similarity, and toxicity pathway evidence to produce transparent, auditable reasoning. On the DILER Test Set, HADES achieves a ROC-AUC of 0.68 (0.59 on the Post-2021 Set), compared with 0.63 and 0.50 for DILI-Predictor, and sets a baseline for mechanism hypothesis generation with a Hypothesis Alignment Fuzzy Jaccard Index of 0.16.
📝 Abstract
Drug-induced liver injury (DILI) remains a leading cause of late-stage clinical trial attrition. However, existing computational predictors primarily rely on binary classification, a framing that limits generalization and yields no mechanistic insight to guide translational decisions. We argue that DILI prediction is better posed as an explainable hypothesis-generation problem.
To support this shift, we introduce the DILER Benchmark, a dataset that extends beyond binary labels by augmenting a curated set of molecules with mechanistic hepatotoxicity hypotheses derived from biomedical literature. We further present HADES, an agentic system designed to generate transparent and auditable reasoning traces. By combining molecular-level predictions, metabolite decomposition, structural understanding, and toxicity pathway evidence, HADES mechanistically assesses DILI risk.
Evaluated on the DILER Benchmark, HADES outperforms existing models in binary classification, achieving a ROC-AUC of 0.68 on the Test Set and 0.59 on the challenging Post-2021 Set, compared with 0.63 and 0.50 for DILI-Predictor, respectively. More importantly, we establish a baseline for mechanistic hypothesis generation, where HADES achieves a Hypothesis Alignment Fuzzy Jaccard Index of 0.16. This result underscores the inherent complexity of the task while highlighting the need for advanced explainable approaches in predictive toxicology.
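The abstract does not define the Hypothesis Alignment Fuzzy Jaccard Index, so the following is only an illustrative sketch of how a fuzzy (soft) Jaccard score between predicted and reference mechanism hypotheses might be computed. Everything in it is an assumption rather than the authors' method: the function names `similarity` and `fuzzy_jaccard`, the use of character-level string similarity from Python's `difflib`, the greedy one-to-one matching, and the `threshold` of 0.6. The paper may well rely on a different similarity measure, such as embedding-based or LLM-judged alignment.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (assumed stand-in metric)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_jaccard(predicted, reference, threshold=0.6):
    """Soft Jaccard between two collections of hypothesis strings.

    Pairs are matched greedily, best similarity first, one-to-one.
    Each accepted pair adds its similarity (not 1) to the intersection.
    The threshold value is a hypothetical choice, not taken from the paper.
    """
    predicted, reference = list(predicted), list(reference)
    if not predicted and not reference:
        return 1.0  # two empty sets are treated as identical

    # Score every predicted/reference pair, highest similarity first.
    pairs = sorted(
        (
            (similarity(p, r), i, j)
            for i, p in enumerate(predicted)
            for j, r in enumerate(reference)
        ),
        reverse=True,
    )

    used_pred, used_ref = set(), set()
    soft_intersection = 0.0
    for score, i, j in pairs:
        if score < threshold:
            break  # remaining pairs are too dissimilar to count as matches
        if i in used_pred or j in used_ref:
            continue  # each hypothesis can be matched at most once
        used_pred.add(i)
        used_ref.add(j)
        soft_intersection += score

    soft_union = len(predicted) + len(reference) - soft_intersection
    return soft_intersection / soft_union


if __name__ == "__main__":
    # Hypothetical hypothesis strings, for illustration only.
    pred = ["mitochondrial dysfunction", "bile acid transport inhibition"]
    ref = ["Mitochondrial dysfunction"]
    print(fuzzy_jaccard(pred, ref))
```

Under this sketch, a score near 0.16 would mean that only a small share of the combined predicted and reference hypotheses align, which is in keeping with the abstract's framing of the task as inherently difficult.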