RadAnnotate: Large Language Models for Efficient and Reliable Radiology Report Annotation

📅 2026-03-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high cost and low efficiency of manual annotation in radiology reports, particularly for uncertain findings. The authors propose RadAnnotate, a novel framework that integrates retrieval-augmented generation (RAG) for synthetic report creation with entity-level confidence threshold learning to enable efficient automatic annotation of anatomical structures and observations. The system intelligently identifies low-confidence samples requiring expert review, substantially reducing annotation burden. In low-resource settings, the approach improves the F1 score for uncertain observations from 0.61 to 0.70. RadAnnotate automatically processes 55%–90% of reports with entity matching scores ranging from 0.86 to 0.92, and models trained on its synthetic data achieve performance within only 1–2 F1 points of those trained on gold-standard annotations.

📝 Abstract
Radiology report annotation is essential for clinical NLP, yet manual labeling is slow and costly. We present RadAnnotate, an LLM-based framework that combines retrieval-augmented synthetic report generation with confidence-based selective automation to reduce expert labeling effort in RadGraph. We focus on RadGraph-style entity labeling (graph nodes) and leave relation extraction (edges) to future work. First, we train entity-specific classifiers on gold-standard reports and characterize their strengths and failure modes across anatomy and observation categories, finding uncertain observations the hardest to learn. Second, we generate RAG-guided synthetic reports and show that models trained only on synthetic data remain within 1-2 F1 points of gold-trained models, and that synthetic augmentation is especially helpful for uncertain observations in low-resource settings, improving F1 from 0.61 to 0.70. Finally, by learning entity-specific confidence thresholds, RadAnnotate automatically annotates 55-90% of reports at an entity match score of 0.86-0.92 while routing low-confidence cases for expert review.
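The selective-automation step described above can be sketched as a simple routing rule: each predicted entity is compared against a per-entity-type confidence threshold, and any report containing a below-threshold prediction is sent for expert review. The following is a minimal illustrative sketch, not the paper's implementation; the function names, threshold values, and example predictions are all assumptions.

```python
# Hedged sketch of entity-level confidence-threshold routing.
# Thresholds and data below are illustrative, not the paper's values.

def route_entities(predictions, thresholds):
    """Split predicted entities into auto-accepted vs. expert-review sets.

    predictions: list of (entity_type, label, confidence) tuples
    thresholds:  dict mapping entity_type -> learned confidence cutoff
    """
    auto, review = [], []
    for entity_type, label, conf in predictions:
        # Unknown entity types default to a cutoff of 1.0 (always reviewed).
        cutoff = thresholds.get(entity_type, 1.0)
        (auto if conf >= cutoff else review).append((entity_type, label, conf))
    return auto, review

# Hypothetical per-entity thresholds: stricter for harder categories.
thresholds = {
    "anatomy": 0.80,
    "observation": 0.85,
    "observation_uncertain": 0.95,  # hardest category in the paper
}

preds = [
    ("anatomy", "left lung", 0.93),
    ("observation", "opacity", 0.88),
    ("observation_uncertain", "possible effusion", 0.62),
]

auto, review = route_entities(preds, thresholds)
# The low-confidence uncertain observation is routed for expert review.
```

In practice the thresholds would be learned per entity type on a held-out set, trading off automation rate (the paper reports 55-90% of reports handled automatically) against entity match quality.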
Problem

Research questions and friction points this paper is trying to address.

Radiology report annotation
Clinical NLP
Manual labeling
Entity labeling
Expert effort
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models
Retrieval-Augmented Generation
Synthetic Data Augmentation
Confidence-Based Automation
Radiology Report Annotation
Saisha Pradeep Shetty
Department of Computer Science, University of California, Davis, CA, USA
Roger Eric Goldman
Department of Radiology, University of California, Davis, CA, USA
Vladimir Filkov
Professor of Computer Science, UC Davis
AI/ML · Data Science · AI in Health · Software Engineering