AI Summary
This study addresses pharmacovigilance in oncology by introducing the novel task of "multi-patient adverse drug event (ADE) grouping summarization for anticancer drugs," aiming to enhance pharmacoepidemiological decision-making and patient-centered understanding.
Method: We propose a hybrid framework integrating large language model (LLM)-based information extraction with T5's abstractive summarization capability. To our knowledge, this is the first application of Direct Preference Optimization (DPO) within an encoder-decoder architecture for medical summarization, augmented by synthetic data generation and multi-label structured annotation.
Contribution/Results: We construct MCADRS, the first high-quality, fully annotated, multi-label ADE dataset for cancer therapeutics. Extensive experiments demonstrate that our method consistently outperforms all baselines in both automated metrics (e.g., ROUGE, BERTScore) and human evaluation, significantly improving clinical relevance, completeness, and readability of summaries. The code and MCADRS dataset are publicly released.
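The Method line mentions applying DPO to an encoder-decoder model. As a rough illustration only, the sketch below shows the standard DPO objective for a single preference pair. Each input is the summed log-probability of a complete summary, computed under either the model being trained or a frozen reference copy. The function names and the default `beta=0.1` are assumptions chosen for illustration, not details taken from GASCADE.

```python
import math


def sequence_logprob(token_logprobs, mask):
    """Sum per-token log-probabilities of a decoder output, skipping padding.

    In an encoder-decoder model such as T5 these values would be
    log p(y_t | y_<t, x), read from the decoder's log-softmax over the
    summary tokens y given the encoded input x (assumed setup).
    """
    return sum(lp for lp, keep in zip(token_logprobs, mask) if keep)


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one pair: -log sigmoid(beta * margin).

    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected),
    where each argument is a sequence log-probability. beta=0.1 is an
    assumed default, not a value reported for GASCADE.
    """
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    # -log sigmoid(x) equals softplus(-x)
    return softplus(-beta * margin)
```

When the policy and reference assign identical scores, the loss equals log 2. It falls below log 2 as the policy shifts probability toward the preferred summary relative to the reference.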
Abstract
In cancer treatment, summarizing adverse drug events (ADEs) reported by patients taking prescribed drugs is crucial for strengthening pharmacovigilance practices and improving drug-related decision-making. Although the volume and complexity of pharmacovigilance data have grown, existing research in this field has focused predominantly on general diseases rather than on cancer specifically. This work introduces the task of grouped summarization of adverse drug events reported by multiple patients using the same drug for cancer treatment. To address the scarcity of resources for cancer pharmacovigilance, we present the MultiLabeled Cancer Adverse Drug Reaction and Summarization (MCADRS) dataset. This dataset contains pharmacovigilance posts detailing patient concerns about drug efficacy and adverse effects, along with extracted labels for drug names, adverse drug events, severity, and adversity of reactions, as well as ADE summaries for each drug. We also propose the Grouping and Abstractive Summarization of Cancer Adverse Drug events (GASCADE) framework, a novel pipeline that combines the information extraction capabilities of Large Language Models (LLMs) with the summarization capabilities of the encoder-decoder T5 model. Our work is the first to apply alignment techniques, including advanced algorithms such as Direct Preference Optimization, to encoder-decoder models using synthetic datasets for summarization tasks. Extensive experiments show that GASCADE achieves superior performance across various metrics, validated through both automated assessments and human evaluations. This multitask approach supports drug-related decision-making and fosters a deeper understanding of patient concerns, paving the way for advances in personalized and responsive cancer care. The code and dataset used in this work are publicly available.
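The grouping step described in the abstract collects ADEs extracted from many patients' posts per drug before T5 summarizes them. It can be sketched roughly as follows. This is a minimal sketch under assumptions: the `ExtractedADE` record, its fields, the drug-name normalization, and the serialized input layout are all hypothetical. They do not reproduce the MCADRS schema or the GASCADE input format. Only the `summarize:` prefix follows T5's usual text-to-text convention.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ExtractedADE:
    # Hypothetical record: one LLM extraction result from one patient post.
    post_id: str
    drug: str
    ade: str
    severity: str


def group_by_drug(records):
    """Group extracted ADE records from many posts by normalized drug name."""
    groups = defaultdict(list)
    for r in records:
        # Assumed normalization: trim whitespace and lowercase the name.
        groups[r.drug.strip().lower()].append(r)
    return dict(groups)


def build_t5_input(drug, records):
    """Serialize one drug's grouped ADEs into a single T5 source string.

    The field layout after the 'summarize:' prefix is a hypothetical choice.
    """
    items = "; ".join(f"{r.ade} ({r.severity})" for r in records)
    return f"summarize: drug: {drug} | reports: {len(records)} | ADEs: {items}"


if __name__ == "__main__":
    # Toy, made-up records used only to exercise the functions.
    posts = [
        ExtractedADE("p1", "Drug-A", "nausea", "mild"),
        ExtractedADE("p2", "drug-a ", "fatigue", "moderate"),
        ExtractedADE("p3", "Drug-B", "rash", "severe"),
    ]
    for name, recs in group_by_drug(posts).items():
        print(build_t5_input(name, recs))
```

Grouping before summarization means each T5 input represents one drug across all reporting patients rather than a single post. This matches the multi-patient framing of the task, though the real pipeline may order, filter, or weight reports differently.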