A Multi-Pass Large Language Model Framework for Precise and Efficient Radiology Report Error Detection

📅 2025-06-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) exhibit low positive predictive value (PPV) in radiology report error detection, leading to high manual review costs. Method: We propose a novel three-stage LLM framework (Extractor → Detector → False-Positive Validator) that, for the first time, explicitly incorporates false-positive verification into the detection pipeline. Developed on the MIMIC-III dataset and externally validated on CheXpert and Open-i, the framework is rigorously evaluated using cluster bootstrap resampling and statistical hypothesis testing. Results: Our approach significantly improves PPV from 6.3% to 15.9%, reduces operational cost per 1,000 reports to $5.58 (a 42.6% decrease), and cuts manual review volume by 54.2%. External validation confirms robust generalizability. The core contribution is a transparent, multi-stage architecture that effectively balances sensitivity and precision, markedly enhancing clinical deployability.
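The summary above describes the three-pass architecture in prose. A minimal runnable sketch of that flow, with trivial rule-based stand-ins for the actual LLM calls (the splitting and laterality rules below are illustrative assumptions, not the paper's prompts):

```python
def extract(report: str) -> list[str]:
    """Pass 1 (Extractor): isolate candidate sentences for focused review.
    In the paper this is an LLM call; here, naive splitting on periods."""
    return [s.strip() for s in report.split(".") if s.strip()]

def detect(sentences: list[str]) -> list[str]:
    """Pass 2 (Detector): flag possible errors with high sensitivity
    (and correspondingly low PPV). Stub rule: any laterality mention
    makes a sentence a candidate."""
    return [s for s in sentences
            if "left" in s.lower() or "right" in s.lower()]

def verify(candidates: list[str]) -> list[str]:
    """Pass 3 (False-Positive Validator): re-examine each candidate with a
    stricter check and keep only survivors, trading raw recall for PPV.
    Stub rule: keep sentences mentioning both sides (a laterality clash)."""
    return [s for s in candidates
            if "left" in s.lower() and "right" in s.lower()]

report = ("No acute findings in the right lung. "
          "Left-sided effusion noted in the right hemithorax. "
          "Heart size normal.")
candidates = detect(extract(report))
confirmed = verify(candidates)
print(len(candidates), len(confirmed))  # detector flags 2, verifier keeps 1
```

The design point is the architecture, not the stub rules: the detector is tuned for sensitivity, and the dedicated validation pass discards false positives before anything reaches a human reviewer.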

📝 Abstract
Background: The positive predictive value (PPV) of large language model (LLM)-based proofreading for radiology reports is limited due to low error prevalence. Purpose: To assess whether a three-pass LLM framework enhances PPV and reduces operational costs compared with baseline approaches. Materials and Methods: A retrospective analysis was performed on 1,000 consecutive radiology reports (250 each: radiography, ultrasonography, CT, MRI) from the MIMIC-III database. Two external datasets (CheXpert and Open-i) served as validation sets. Three LLM frameworks were tested: (1) single-prompt detector; (2) extractor plus detector; and (3) extractor, detector, and false-positive verifier. Precision was measured by PPV and absolute true positive rate (aTPR). Efficiency was calculated from model inference charges and reviewer remuneration. Statistical significance was tested using cluster bootstrap, exact McNemar tests, and Holm-Bonferroni correction. Results: Framework PPV increased from 0.063 (95% CI, 0.036-0.101, Framework 1) to 0.079 (0.049-0.118, Framework 2), and significantly to 0.159 (0.090-0.252, Framework 3; P<.001 vs. baselines). aTPR remained stable (0.012-0.014; P>=.84). Operational costs per 1,000 reports dropped to USD 5.58 (Framework 3) from USD 9.72 (Framework 1) and USD 6.85 (Framework 2), reflecting reductions of 42.6% and 18.5%, respectively. Human-reviewed reports decreased from 192 to 88. External validation supported Framework 3's superior PPV (CheXpert 0.133, Open-i 0.105) and stable aTPR (0.007). Conclusion: A three-pass LLM framework significantly enhanced PPV and reduced operational costs while maintaining detection performance, providing an effective strategy for AI-assisted radiology report quality assurance.
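The abstract's confidence intervals come from cluster bootstrap resampling, which resamples whole reports (clusters) so that multiple flagged findings within one report stay together rather than being treated as independent. A sketch of that procedure under assumed data layout and helper names (not the paper's code):

```python
import random

def ppv(flags: list[tuple[bool, bool]]) -> float:
    """PPV over (flagged, true_error) pairs: TP / (TP + FP)."""
    tp = sum(1 for flagged, true in flags if flagged and true)
    fp = sum(1 for flagged, true in flags if flagged and not true)
    return tp / (tp + fp) if tp + fp else 0.0

def cluster_bootstrap_ci(reports, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for PPV. Each element of `reports` is one
    report's list of (flagged, true_error) pairs; resampling at the report
    level preserves within-report correlation of flags."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        sample = [rng.choice(reports) for _ in reports]   # resample clusters
        flat = [pair for rep in sample for pair in rep]   # pool their flags
        stats.append(ppv(flat))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Synthetic demo: 50 single-flag reports with an underlying PPV of 0.5.
reports = [[(True, True)], [(True, False)]] * 25
lo, hi = cluster_bootstrap_ci(reports)
print(f"95% CI for PPV: ({lo:.2f}, {hi:.2f})")
```

The paper additionally applies exact McNemar tests and Holm-Bonferroni correction for the pairwise framework comparisons; those are standard procedures and omitted here.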
Problem

Research questions and friction points this paper is trying to address.

Enhancing PPV in radiology report error detection
Reducing operational costs in LLM-based proofreading
Maintaining detection performance with a three-pass LLM framework
Innovation

Methods, ideas, or system contributions that make the work stand out.

Three-pass LLM framework enhances precision
Extractor, detector, verifier reduce operational costs
Stable detection performance with improved PPV
Songsoo Kim
Yonsei University College of Medicine
Radiology · Medical AI

Seungtae Lee
Department of Radiology, Yonsei University College of Medicine, Seoul, Republic of Korea

See Young Lee
Department of Internal Medicine, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul, Republic of Korea

Joonho Kim
Department of Neurology, Yonsei University College of Medicine, Seoul, Republic of Korea

Keechan Kan
Department of Surgery, Samsung Medical Center, Seoul, Republic of Korea

Dukyong Yoon
Department of Biomedical Systems Informatics, Yonsei University College of Medicine
Medical informatics · Bio-signal data · Artificial intelligence in medicine