Toward Automatic Filling of Case Report Forms: A Case Study on Data from an Italian Emergency Department

📅 2026-02-26
🤖 AI Summary
This study addresses the inefficiency of manually completing clinical Case Report Forms (CRFs) and the scarcity of annotated data for automatic CRF-filling systems. The authors present a new dataset of clinical notes from an Italian emergency department, annotated against a pre-defined CRF with 134 items, and define the CRF-filling task together with a metric for evaluating it. In pilot experiments, they use an open-source state-of-the-art large language model (LLM) in a zero-shot setting to extract structured CRF information directly from unstructured clinical narratives. The results indicate that zero-shot CRF-filling from real Italian clinical notes is feasible, but that the model's output is affected by biases, for example a cautious tendency to answer "unknown", which need to be corrected. The work contributes annotated Italian clinical text to an area where such data is scarce and highlights a bias in LLM-based clinical information extraction that future work will need to mitigate.

📝 Abstract
Case Report Forms (CRFs) collect data about patients and are at the core of well-established practices to conduct research in clinical settings. With the recent progress of language technologies, there is an increasing interest in automatic CRF-filling from clinical notes, mostly based on the use of Large Language Models (LLMs). However, there is a general scarcity of annotated CRF data, both for training and testing LLMs, which limits the progress on this task. As a step in the direction of providing such data, we present a new dataset of clinical notes from an Italian Emergency Department annotated with respect to a pre-defined CRF containing 134 items to be filled. We provide an analysis of the data, define the CRF-filling task and a metric for its evaluation, and report on pilot experiments where we use an open-source state-of-the-art LLM to automatically execute the task. Results of the case study show that (i) CRF-filling from real clinical notes in Italian can be approached in a zero-shot setting; (ii) LLMs' results are affected by biases (e.g., a cautious behaviour favours "unknown" answers), which need to be corrected.
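To make the "unknown" bias mentioned in the abstract concrete, the sketch below compares predicted CRF answers with gold annotations item by item. It is only an illustration: the scoring shown here (plain accuracy plus "unknown" rates) is an assumption and not the metric defined in the paper, and the item names and answer values are hypothetical.

```python
UNKNOWN = "unknown"  # assumed label for items the note does not answer


def score_crf(gold: dict[str, str], pred: dict[str, str]) -> dict[str, float]:
    """Compare predicted CRF answers with gold annotations, item by item.

    Items missing from `pred` are treated as "unknown" (an assumption
    made for this sketch).
    """
    items = list(gold)
    if not items:
        raise ValueError("gold annotations must contain at least one item")

    predicted = {item: pred.get(item, UNKNOWN) for item in items}
    n = len(items)

    correct = sum(predicted[i] == gold[i] for i in items)
    gold_unknown = sum(gold[i] == UNKNOWN for i in items)
    pred_unknown = sum(predicted[i] == UNKNOWN for i in items)
    # Items the note does answer, but where the model abstained.
    over_cautious = sum(
        gold[i] != UNKNOWN and predicted[i] == UNKNOWN for i in items
    )

    return {
        "accuracy": correct / n,
        "gold_unknown_rate": gold_unknown / n,
        "pred_unknown_rate": pred_unknown / n,
        "over_cautious_rate": over_cautious / n,
    }


# Hypothetical four-item form, far smaller than the 134-item CRF.
gold = {"fever": "yes", "chest_pain": "no", "smoker": "unknown", "trauma": "yes"}
pred = {"fever": "yes", "chest_pain": "unknown", "smoker": "unknown", "trauma": "unknown"}
scores = score_crf(gold, pred)
print(scores)
```

A predicted "unknown" rate well above the gold "unknown" rate, as in this toy example, is one simple signal of the cautious behaviour the abstract describes.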
Problem

Research questions and friction points this paper is trying to address.

Case Report Forms
clinical notes
automatic data extraction
annotated dataset
Large Language Models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Case Report Form (CRF) filling
Large Language Models (LLMs)
clinical notes
zero-shot learning
annotated dataset
Authors
Gabriela Anna Kaczmarek
Fondazione Bruno Kessler, Povo, Trento, Italy
Pietro Ferrazzi
Fondazione Bruno Kessler - University of Padova
Natural Language Processing
Lorenzo Porta
Emergency Medicine Unit, Fatebenefratelli Hospital, Milan, Italy
Vicky Rubini
University of Milan, Milan, Italy
Bernardo Magnini
Researcher, Fondazione Bruno Kessler - FBK, Trento, Italy
Artificial Intelligence, Computational Linguistics