🤖 AI Summary
Predicting adverse drug events (ADEs) for monotherapies in clinical trials remains insufficiently accurate. Method: We introduce CT-ADE, a multi-label ADE prediction benchmark dataset tailored to monotherapy, comprising 2,497 drugs and 168,984 drug–ADE pairs. It systematically integrates patient-level and treatment-contextual features, with ADE labels standardized across multiple levels of the MedDRA hierarchical ontology. Our approach combines structured clinical trial data extraction, multi-label classification modeling, and LLM-based zero-/few-shot evaluation, complemented by ablation studies. Contribution/Results: Incorporating patient and contextual features improves F1 scores by 21–38% over models relying solely on chemical structure, a larger gain than LLM domain specialization or scaling provides, which confirms the predictive value of these features. CT-ADE is publicly available and provides a foundation for AI-driven drug safety assessment and proactive ADE risk prediction.
📝 Abstract
Adverse drug events (ADEs) significantly impact clinical research, causing many clinical trial failures. ADE prediction is key for developing safer medications and enhancing patient outcomes. To support this effort, we introduce CT-ADE, a dataset for multilabel predictive modeling of ADEs in monopharmacy treatments. CT-ADE integrates data from 2,497 unique drugs, encompassing 168,984 drug-ADE pairs extracted from clinical trials, annotated with patient and contextual information, and comprehensive ADE concepts standardized across multiple levels of the MedDRA ontology. Preliminary analyses with large language models (LLMs) achieved F1-scores up to 55.90%. Models using patient and contextual information showed F1-score improvements of 21%-38% over models using only chemical structure data. Our results highlight the importance of target population and treatment regimens in the predictive modeling of ADEs, offering greater performance gains than LLM domain specialization and scaling. CT-ADE provides an essential tool for researchers aiming to leverage artificial intelligence and machine learning to enhance patient safety and minimize the impact of ADEs on pharmaceutical research and development. The dataset is publicly accessible at https://github.com/ds4dh/CT-ADE.
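To make the reported metric concrete, the following is a minimal sketch of how an F1-score can be computed for multilabel ADE prediction, where each drug-trial instance carries a set of ADE labels. The abstract does not state which averaging scheme was used, so micro-averaging here is an assumption, and the function name `micro_f1` and the example labels are purely illustrative rather than taken from CT-ADE.

```python
from typing import List, Set


def micro_f1(true_labels: List[Set[str]], pred_labels: List[Set[str]]) -> float:
    """Micro-averaged F1 over multilabel instances (assumed averaging scheme)."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, pred_labels):
        tp += len(truth & pred)   # labels predicted and actually observed
        fp += len(pred - truth)   # labels predicted but not observed
        fn += len(truth - pred)   # labels observed but not predicted
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical example: two instances with illustrative ADE labels.
truth = [{"nausea", "headache"}, {"rash"}]
pred = [{"nausea"}, {"rash", "fatigue"}]
score = micro_f1(truth, pred)  # tp=2, fp=1, fn=1
```

With these toy inputs there are two true positives, one false positive, and one false negative, so the score is 4/6, or about 0.667.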