🤖 AI Summary
This work addresses the high cost and low efficiency of manual annotation in radiology reports, particularly for uncertain findings. The authors propose RadAnnotate, a framework that integrates retrieval-augmented generation (RAG) for synthetic report creation with entity-level confidence-threshold learning to enable efficient automatic annotation of anatomical structures and observations. The system flags low-confidence samples for expert review, substantially reducing annotation burden. In low-resource settings, the approach improves the F1 score for uncertain observations from 0.61 to 0.70. RadAnnotate automatically processes 55%–90% of reports with entity matching scores ranging from 0.86 to 0.92, and models trained on its synthetic data achieve performance within 1–2 F1 points of those trained on gold-standard annotations.
📝 Abstract
Radiology report annotation is essential for clinical NLP, yet manual labeling is slow and costly. We present RadAnnotate, an LLM-based framework that combines retrieval-augmented synthetic report generation with confidence-based selective automation to reduce expert labeling effort under the RadGraph schema. We study RadGraph-style entity labeling (graph nodes) and leave relation extraction (edges) to future work. First, we train entity-specific classifiers on gold-standard reports and characterize their strengths and failure modes across anatomy and observation categories, finding uncertain observations the hardest to learn. Second, we generate RAG-guided synthetic reports and show that synthetic-only models remain within 1–2 F1 points of gold-trained models, and that synthetic augmentation is especially helpful for uncertain observations in a low-resource setting, improving F1 from 0.61 to 0.70. Finally, by learning entity-specific confidence thresholds, RadAnnotate automatically annotates 55%–90% of reports at 0.86–0.92 entity match score while routing low-confidence cases for expert review.
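The selective-automation step can be made concrete with a minimal sketch: for each entity type, pick the lowest validation-set confidence threshold whose auto-accepted predictions meet a target precision, then route test-time predictions above that threshold to automatic annotation and the rest to expert review. Function names (`learn_threshold`, `route`) and the target-precision criterion are illustrative assumptions, not details from the paper.

```python
def learn_threshold(val_scores, val_correct, target_precision=0.9):
    """Lowest threshold whose auto-accepted subset meets target precision.

    val_scores: confidence scores on validation predictions
    val_correct: whether each validation prediction was correct
    """
    for t in sorted(set(val_scores)):  # lowest first -> maximal automation
        accepted = [ok for s, ok in zip(val_scores, val_correct) if s >= t]
        if accepted and sum(accepted) / len(accepted) >= target_precision:
            return t
    return float("inf")  # no threshold suffices: review everything


def route(predictions, thresholds):
    """Split (entity_type, score, label) predictions into auto vs. review queues."""
    auto, review = [], []
    for entity_type, score, label in predictions:
        if score >= thresholds.get(entity_type, float("inf")):
            auto.append((entity_type, label))
        else:
            review.append((entity_type, label))
    return auto, review
```

Learning one threshold per entity type (rather than a single global cutoff) matches the observation that difficulty varies across categories, e.g. uncertain observations need a stricter cutoff than well-learned anatomy entities.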