Detection of Adverse Drug Events in Dutch clinical free text documents using Transformer Models: benchmark study

📅 2025-07-25
🤖 AI Summary
A reliable benchmark for adverse drug event (ADE) detection in Dutch clinical free text is lacking. Method: The study sets a benchmark for ADE detection in Dutch clinical documents, evaluating both an end-to-end setup (relation classification on predicted entities) and a two-step setup (relation classification on gold standard entities). It assesses a Bi-LSTM and four Transformer encoder models (BERTje, RobBERT, MedRoBERTa.nl, and NuNER) using fit-for-purpose performance measures, internal validation, and external document-level validation. Contribution/Results: MedRoBERTa.nl achieves a macro-averaged F1 score of 0.63 with gold standard entities and 0.62 with predicted entities on internal testing; in external document-level validation, it attains an ADE recall of 0.67 to 0.74, meaning 67% to 74% of discharge letters with ADEs were detected. The work offers a clinically meaningful approach for evaluating language models for ADE detection in Dutch clinical free text.

📝 Abstract
In this study, we set a benchmark for adverse drug event (ADE) detection in Dutch clinical free text documents using several transformer models, clinical scenarios and fit-for-purpose performance measures. We trained a Bidirectional Long Short-Term Memory (Bi-LSTM) model and four transformer-based Dutch and/or multilingual encoder models (BERTje, RobBERT, MedRoBERTa.nl, and NuNER) for the tasks of named entity recognition (NER) and relation classification (RC) using 102 richly annotated Dutch ICU clinical progress notes. Anonymized free text clinical progress notes of patients admitted to the intensive care unit (ICU) of one academic hospital and discharge letters of patients admitted to Internal Medicine wards of two non-academic hospitals were reused. We evaluated our ADE RC models internally using gold standard entities (two-step task) and predicted entities (end-to-end task). In addition, all models were externally validated on detecting ADEs at the document level. We report both micro- and macro-averaged F1 scores, given the imbalance of ADEs in the datasets. Although differences between the models on the ADE RC task were small, MedRoBERTa.nl was the best performing model, with a macro-averaged F1 score of 0.63 using gold standard entities and 0.62 using predicted entities. The MedRoBERTa.nl models also performed best in our external validation and achieved a recall between 0.67 and 0.74 using predicted entities, meaning that between 67% and 74% of discharge letters with ADEs were detected. Our benchmark study presents a robust and clinically meaningful approach for evaluating language models for ADE detection in clinical free text documents. Our study highlights the need to use performance measures that fit both the task of ADE detection in clinical free text documents and the envisioned future clinical use.
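The abstract reports both micro- and macro-averaged F1 because ADEs are rare relative to other labels. The following is a minimal sketch of why the two averages diverge under class imbalance. The label names, the toy data, and the choice to include the negative class in the averages are assumptions made for illustration; the abstract does not specify the authors' exact evaluation protocol.

```python
from collections import Counter


def f1_scores(gold, pred):
    """Return (micro_f1, macro_f1) for two equal-length lists of labels."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed

    def f1(true_pos, false_pos, false_neg):
        denom = 2 * true_pos + false_pos + false_neg
        return 2 * true_pos / denom if denom else 0.0

    # Macro: average the per-label F1 scores, so every label weighs equally.
    per_label = [f1(tp[l], fp[l], fn[l]) for l in labels]
    macro = sum(per_label) / len(per_label)
    # Micro: pool the counts over all labels, so frequent labels dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro


# Hypothetical toy data: 8 non-ADE relations and 2 ADE relations.
gold = ["no_relation"] * 8 + ["ADE"] * 2
pred = ["no_relation"] * 8 + ["no_relation", "ADE"]  # one ADE is missed
micro, macro = f1_scores(gold, pred)
print(f"micro-F1 = {micro:.2f}, macro-F1 = {macro:.2f}")
```

In this toy case a single missed ADE barely moves the micro average but pulls the macro average down noticeably, which is why the macro score is the more informative one for a rare class.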
Problem

Research questions and friction points this paper is trying to address.

Benchmarking ADE detection in Dutch clinical texts using transformers.
Evaluating NER and RC models for ADE identification in ICU notes.
Assessing model performance with clinical measures and external validation.
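The external validation mentioned above is scored at the document level: the reported recall is the share of discharge letters containing an ADE that the model detects. A minimal sketch of that computation follows. The rule that a letter counts as detected when at least one predicted relation is labeled "ADE", the function name, and the toy data are all assumptions for illustration, not the authors' documented procedure.

```python
def document_level_recall(gold_has_ade, predicted_relations):
    """Share of ADE-positive documents with at least one predicted ADE.

    gold_has_ade: dict mapping document id -> bool (gold standard)
    predicted_relations: dict mapping document id -> list of predicted labels
    """
    positives = [doc for doc, has_ade in gold_has_ade.items() if has_ade]
    if not positives:
        return 0.0
    # Assumed aggregation rule: one predicted ADE relation flags the document.
    detected = sum(
        1 for doc in positives if "ADE" in predicted_relations.get(doc, [])
    )
    return detected / len(positives)


# Hypothetical toy data: three letters with an ADE, one without.
gold = {"letter_1": True, "letter_2": True, "letter_3": True, "letter_4": False}
pred = {
    "letter_1": ["ADE"],
    "letter_2": [],                      # ADE missed entirely
    "letter_3": ["no_relation", "ADE"],
    "letter_4": ["ADE"],                 # false alarm, does not affect recall
}
recall = document_level_recall(gold, pred)
print(f"document-level recall = {recall:.2f}")
```

Recall ignores false alarms such as `letter_4`, so in practice it would be read alongside a measure of how many flagged letters truly contain an ADE.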
Innovation

Methods, ideas, or system contributions that make the work stand out.

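Trained a Bi-LSTM and four transformer encoder models (BERTje, RobBERT, MedRoBERTa.nl, NuNER) for NER and RC on 102 annotated Dutch ICU progress notes.
Benchmarked ADE detection with internal validation (two-step and end-to-end) and external document-level validation on discharge letters.
Found MedRoBERTa.nl to perform best, with a macro-averaged F1 score of 0.63 (gold standard entities) and 0.62 (predicted entities).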
Rachel M. Murphy
Amsterdam UMC location University of Amsterdam, Department of Medical Informatics; Amsterdam Public Health, Digital Health; Amsterdam Public Health, Quality of Care
Nishant Mishra
Amsterdam UMC location University of Amsterdam, Department of Medical Informatics; Amsterdam Public Health, Methodology
Nicolette F. de Keizer
Amsterdam UMC location University of Amsterdam, Department of Medical Informatics; Amsterdam Public Health, Digital Health; Amsterdam Public Health, Quality of Care
Dave A. Dongelmans
Amsterdam UMC location University of Amsterdam, Department of Intensive Care Medicine; Amsterdam Public Health, Quality of Care
Kitty J. Jager
Amsterdam UMC location University of Amsterdam, Department of Medical Informatics; Amsterdam Public Health, Quality of Care; Amsterdam Public Health, Aging & Later Life
Ameen Abu-Hanna
Professor of Medical Informatics
Medical Informatics, Artificial Intelligence, Computer Science, Medicine, e-Health
Joanna E. Klopotowska
Amsterdam UMC location University of Amsterdam, Department of Medical Informatics; Amsterdam Public Health, Digital Health; Amsterdam Public Health, Quality of Care
Iacer Calixto
Assistant Professor, AUMC, University of Amsterdam
natural language processing, machine learning, multi-modal learning