Automatic Extraction of Structured Information from Brain MRI Reports Using an Open-Weight Large Language Model

📅 2026-06-05
🤖 AI Summary
This study addresses the automatic extraction of structured clinical information from Dutch-language neuroradiology reports, a key bottleneck for large-scale brain MRI research. It evaluates the open-weight large language model LLaMA 3.1 on this task and compares few-shot example selection strategies, including one based on structural similarity. Using tailored prompts, the approach extracts 30 variables from 947 reports and compares the original Dutch reports with their English translations, which yield comparable results. In the zero-shot setting, the model achieves high accuracy on visual rating scales (e.g., 94% for the Fazekas score) and on detection of microbleed mentions (93%). Few-shot prompting substantially improves extraction of numeric variables such as microbleed counts (from 80% to 92% accuracy), though location-specific variables remain challenging.
📝 Abstract
Objectives: Automatic data extraction from free-text radiology reports enables large-scale research, but few studies have assessed the performance of large language models (LLMs) on Dutch neuroradiology reports.

Methods: We analyzed 947 brain MRI reports from a tertiary memory clinic (2016-2021), authored by consultant neuroradiologists. Trained medical students annotated thirty variables; 100 reports were double-annotated to assess inter-rater reliability. We evaluated the performance of the open-weight LLM LLaMA 3.1 using different languages (Dutch vs. English translation) and few-shot prompting with different example selection strategies. Performance was evaluated using balanced accuracy for categorical variables, accuracy and mean absolute error for counts, and text similarity for free-text variables. Metrics were computed across 10 random splits of the 947 reports.

Results: LLaMA 3.1 demonstrated high zero-shot performance for visual rating scores (mean [95% CI]): Medial Temporal Atrophy: 90% [77-100%] on the left and 96% [94-99%] on the right, Global Cortical Atrophy: 87% [83-91%], and Fazekas: 94% [93-96%]. Microbleed mentions were detected with 93% accuracy [92-95%] and infarct mentions with 82% [80-84%]. Text similarity for lesion location reached 0.95 [0.95-0.96]. Performance was lower for numerical variables: 80% [78-82%] for the number of microbleeds and 66% [63-68%] for the number of infarcts. English translation yielded comparable results. Few-shot prompting improved performance for numerical variables, achieving 92% [90-93%] for microbleeds and 81% [77-85%] for infarcts using structural similarity-based selection.

Conclusion: LLaMA 3.1 shows strong potential for extracting data from Dutch neuroradiology reports. Few-shot prompting enhances performance for numerical variables, whereas challenges remain for location-specific variables.
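The evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It shows balanced accuracy for a categorical variable, mean absolute error for a count variable, and a mean with a 95% interval across splits. The function names, the toy labels, and the normal-approximation interval are assumptions made for illustration; the abstract does not state how the confidence intervals were computed, and this is not the authors' evaluation code.

```python
from statistics import mean, stdev


def balanced_accuracy(y_true, y_pred):
    """Average of per-class recall, so rare classes weigh as much as common ones."""
    recalls = []
    for label in sorted(set(y_true)):
        # Positions where the reference annotation has this label.
        idx = [i for i, t in enumerate(y_true) if t == label]
        hits = sum(1 for i in idx if y_pred[i] == label)
        recalls.append(hits / len(idx))
    return mean(recalls)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between annotated and extracted counts."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def mean_with_interval(per_split_scores, z=1.96):
    """Mean and a normal-approximation 95% interval across splits (an assumption)."""
    m = mean(per_split_scores)
    half_width = z * stdev(per_split_scores) / len(per_split_scores) ** 0.5
    return m, m - half_width, m + half_width


# Toy example (hypothetical labels): "microbleeds mentioned" yes/no.
annotated = [0, 0, 0, 1]
extracted = [0, 0, 1, 1]
print(balanced_accuracy(annotated, extracted))

# Toy example (hypothetical counts): number of microbleeds per report.
print(mean_absolute_error([0, 2, 5], [0, 3, 3]))

# Toy per-split scores, standing in for the 10 random splits.
print(mean_with_interval([0.90, 0.92, 0.94]))
```

Balanced accuracy is a sensible choice when one class dominates, as is likely for findings that most reports do not mention: in the toy example plain accuracy is 3/4, while balanced accuracy averages the recall of the two classes (2/3 and 1).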
Problem

Research questions and friction points this paper is trying to address.

structured information extraction
brain MRI reports
neuroradiology
large language models
Dutch clinical text
Innovation

Methods, ideas, or system contributions that make the work stand out.

large language model
structured information extraction
neuroradiology reports
few-shot prompting
open-weight model
Kaouther Mouheb
Department of Radiology & Nuclear Medicine, Erasmus MC, Rotterdam, the Netherlands
Amos Pomp
Department of Epidemiology, Erasmus MC, Rotterdam, the Netherlands
Antoine Manenti
Department of Radiology & Nuclear Medicine, Erasmus MC, Rotterdam, the Netherlands; Department of Electrical and Electronics Engineering, ENSEEIHT, Toulouse, France
Romy de Haan
Alzheimer Centre Erasmus MC, Erasmus MC, Rotterdam, the Netherlands
Farog Faghir
Department of Epidemiology, Erasmus MC, Rotterdam, the Netherlands
Joy Martens
Department of Epidemiology, Erasmus MC, Rotterdam, the Netherlands
Harro Seelaar
Alzheimer Centre Erasmus MC, Erasmus MC, Rotterdam, the Netherlands; Department of Neurology, Erasmus MC, Rotterdam, the Netherlands
Francesco Mattace-Raso
Alzheimer Centre Erasmus MC, Erasmus MC, Rotterdam, the Netherlands; Department of Internal Medicine, Erasmus MC, Rotterdam, the Netherlands
Meike W. Vernooij
Department of Radiology & Nuclear Medicine, Erasmus MC, Rotterdam, the Netherlands; Department of Epidemiology, Erasmus MC, Rotterdam, the Netherlands; Alzheimer Centre Erasmus MC, Erasmus MC, Rotterdam, the Netherlands
Frank J. Wolters
Department of Radiology & Nuclear Medicine, Erasmus MC, Rotterdam, the Netherlands; Department of Epidemiology, Erasmus MC, Rotterdam, the Netherlands; Alzheimer Centre Erasmus MC, Erasmus MC, Rotterdam, the Netherlands
Stefan Klein
Biomedical Imaging Group Rotterdam, Erasmus MC, the Netherlands
medical image analysis, machine learning
Esther E. Bron
Department of Radiology & Nuclear Medicine, Erasmus MC, Rotterdam, the Netherlands