🤖 AI Summary
This work addresses adverse event (AE) identification in discharge summaries of elderly patients, a clinically critical but under-resourced task. We introduce the first fine-grained, manually annotated corpus specifically designed for this population, covering 14 AE types and contextual attributes (negation, diagnosis type, and in-hospital occurrence), with support for discontinuous and overlapping entity annotations. Methodologically, we evaluate models at three annotation granularities: fine-grained entities, coarse-grained AE types, and coarse-grained AE types with negation. Models are implemented with FlairNLP, including a transformer-based BERT-cased model. Experiments show strong document-level coarse-grained AE detection (F1 = 0.943) but markedly lower fine-grained entity-level performance (F1 = 0.675), revealing persistent challenges in recognising rare AEs and modelling complex clinical language. This work fills a gap in geriatric NLP resources and provides a benchmark for fine-grained clinical event extraction.
📝 Abstract
In this work, we present a manually annotated corpus for Adverse Event (AE) extraction from discharge summaries of elderly patients, a population often underrepresented in clinical NLP resources. The dataset covers 14 clinically significant AEs, such as falls, delirium, and intracranial haemorrhage, along with contextual attributes such as negation, diagnosis type, and in-hospital occurrence. Uniquely, the annotation schema supports both discontinuous and overlapping entities, addressing challenges rarely tackled in prior work. We evaluate multiple models using FlairNLP across three annotation granularities: fine-grained, coarse-grained, and coarse-grained with negation. While transformer-based models (e.g., BERT-cased) achieve strong performance on document-level coarse-grained extraction (F1 = 0.943), performance drops notably on fine-grained entity-level tasks (e.g., F1 = 0.675), particularly for rare events and complex attributes. These results demonstrate that, despite high scores at the document level, significant challenges remain in detecting underrepresented AEs and capturing nuanced clinical language. Developed within a Trusted Research Environment (TRE), the dataset is available upon request via DataLoch and serves as a robust benchmark for evaluating AE extraction methods and supporting future cross-dataset generalisation.
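The gap between the document-level and entity-level scores reported above is easier to see with a toy calculation. The sketch below is illustrative only and is not the paper's evaluation code: the `Span` tuple layout, the label names, and the example offsets are all assumptions, and it uses exact-match scoring on contiguous spans, so it does not cover the discontinuous or overlapping entities the corpus supports.

```python
from typing import Set, Tuple

# Hypothetical flat span representation: (doc_id, start, end, label).
Span = Tuple[str, int, int, str]


def f1(gold: Set, pred: Set) -> float:
    """Micro-averaged F1 over two sets of hashable items (exact match)."""
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def to_document_level(spans: Set[Span]) -> Set[Tuple[str, str]]:
    """Collapse spans to (doc_id, label): was this AE type found in the document?"""
    return {(doc_id, label) for doc_id, _, _, label in spans}


# Invented example data, not taken from the corpus.
gold = {
    ("doc1", 10, 14, "fall"),
    ("doc1", 40, 48, "delirium"),
    ("doc2", 5, 29, "intracranial_haemorrhage"),
}
pred = {
    ("doc1", 10, 14, "fall"),                      # exact match
    ("doc1", 38, 48, "delirium"),                  # right label, wrong boundary
    ("doc2", 5, 29, "intracranial_haemorrhage"),   # exact match
}

entity_f1 = f1(gold, pred)
document_f1 = f1(to_document_level(gold), to_document_level(pred))

print(f"entity-level F1:   {entity_f1:.3f}")
print(f"document-level F1: {document_f1:.3f}")
```

A single boundary error costs the entity-level score a true positive while leaving the document-level score untouched, which is one reason document-level figures can look much stronger than span-level ones.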