🤖 AI Summary
This study addresses the scarcity of annotated data for automatic Case Report Form (CRF) filling, which limits both the training and the testing of systems for this task. The authors present a new dataset of clinical notes from an Italian emergency department, annotated against a pre-defined CRF of 134 items, and define the CRF-filling task along with a metric for its evaluation. In pilot experiments, they use an open-source state-of-the-art large language model (LLM) in a zero-shot setting to fill the CRF directly from unstructured clinical notes. Results indicate that CRF-filling from real Italian clinical notes can be approached zero-shot, but that LLM output is affected by biases, such as a cautious tendency to answer “unknown”. The work thus contributes an annotated Italian clinical resource and points to biases in LLM-based CRF-filling that need to be corrected.
📝 Abstract
Case Report Forms (CRFs) collect data about patients and are at the core of well-established practices for conducting research in clinical settings. With the recent progress of language technologies, there is increasing interest in automatic CRF-filling from clinical notes, mostly based on Large Language Models (LLMs). However, annotated CRF data is generally scarce, both for training and for testing LLMs, which limits progress on this task. As a step towards providing such data, we present a new dataset of clinical notes from an Italian Emergency Department, annotated with respect to a pre-defined CRF containing 134 items to be filled. We provide an analysis of the data, define the CRF-filling task and a metric for its evaluation, and report on pilot experiments in which we use an open-source state-of-the-art LLM to execute the task automatically. Results of the case study show that (i) CRF-filling from real clinical notes in Italian can be approached in a zero-shot setting; (ii) LLMs' results are affected by biases (e.g., a cautious behaviour favours "unknown" answers), which need to be corrected.
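The abstract does not spell out the evaluation metric, so the following is only a hypothetical sketch of how a filled CRF could be compared with its gold annotation and how a tendency towards "unknown" answers might be made visible. The item names, the `score_crf` function, and the per-item accuracy used here are assumptions for illustration, not the authors' metric or data.

```python
UNKNOWN = "unknown"  # assumed label for items the note gives no information about


def score_crf(gold: dict[str, str], pred: dict[str, str]) -> dict[str, float]:
    """Compare a predicted CRF with a gold CRF, item by item.

    Items missing from `pred` are treated as "unknown" (an assumption).
    """
    if not gold:
        raise ValueError("gold CRF must contain at least one item")

    items = list(gold)
    predicted = {item: pred.get(item, UNKNOWN) for item in items}

    correct = sum(predicted[i] == gold[i] for i in items)
    gold_unknown = sum(gold[i] == UNKNOWN for i in items)
    pred_unknown = sum(predicted[i] == UNKNOWN for i in items)
    # "False unknowns": the gold CRF has a value, but the model answered "unknown".
    false_unknown = sum(
        gold[i] != UNKNOWN and predicted[i] == UNKNOWN for i in items
    )
    answerable = len(items) - gold_unknown

    return {
        "accuracy": correct / len(items),
        "gold_unknown_rate": gold_unknown / len(items),
        "pred_unknown_rate": pred_unknown / len(items),
        "false_unknown_rate": false_unknown / answerable if answerable else 0.0,
    }


# Hypothetical four-item CRF (the dataset described above has 134 items).
gold = {"fever": "yes", "chest_pain": "no", "dyspnea": "unknown", "trauma": "yes"}
pred = {"fever": "yes", "chest_pain": "unknown", "dyspnea": "unknown", "trauma": "unknown"}

scores = score_crf(gold, pred)
print(scores)
```

In this toy case the predicted "unknown" rate (0.75) is well above the gold rate (0.25), which is the kind of gap a cautious model would produce; a single accuracy figure on its own would not reveal it.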