Semantic Consistency-Based Uncertainty Quantification for Factuality in Radiology Report Generation

📅 2024-12-05

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Factuality errors caused by model hallucination remain a critical bottleneck in radiology report generation (RRG). Method: We propose a non-intrusive, model-agnostic uncertainty quantification framework driven by semantic consistency. It requires no modification of the underlying model and no access to its internal state (such as output token logits), and it provides both report-level and sentence-level uncertainty estimates. Contribution/Results: Experiments with the Radialog model on MIMIC-CXR show that abstaining from the 20% most uncertain reports improves factuality scores by 10%, and that sentence-level uncertainty flags the lowest-precision sentence in each report with an 82.9% success rate. Because it works as a plug-and-play module for state-of-the-art RRG models, the framework supports more clinically trustworthy report generation.
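The summary does not say how semantic consistency becomes a number. A common black-box pattern, assumed here rather than taken from the paper, is to sample several reports for the same image and treat disagreement among them as uncertainty: one minus the mean pairwise similarity. Token-set Jaccard overlap is a hypothetical stand-in for whatever clinically aware similarity the authors actually use, and `token_set`, `jaccard` and `report_uncertainty` are illustrative names only.

```python
from itertools import combinations


def token_set(text: str) -> set[str]:
    """Lower-cased word tokens with trailing punctuation stripped."""
    tokens = (tok.strip(".,;:") for tok in text.lower().split())
    return {tok for tok in tokens if tok}


def jaccard(a: str, b: str) -> float:
    """Hypothetical stand-in similarity: Jaccard overlap of token sets, in [0, 1]."""
    sa, sb = token_set(a), token_set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def report_uncertainty(samples: list[str], sim=jaccard) -> float:
    """Assumed report-level score: 1 - mean pairwise similarity of reports
    sampled from the same model for the same image."""
    if len(samples) < 2:
        raise ValueError("need at least two sampled reports")
    pairs = list(combinations(samples, 2))
    mean_sim = sum(sim(a, b) for a, b in pairs) / len(pairs)
    return 1.0 - mean_sim


consistent = ["No acute cardiopulmonary process."] * 3
divergent = [
    "No acute cardiopulmonary process.",
    "Small left pleural effusion.",
    "Right lower lobe pneumonia.",
]
print(report_uncertainty(consistent))  # 0.0
print(report_uncertainty(divergent))   # 1.0
```

Because the score only needs generated text, it stays compatible with the "no access to token logits" property; swapping `sim` for a stronger clinical similarity changes nothing else in the sketch.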

📝 Abstract

Radiology report generation (RRG) has shown great potential in assisting radiologists by automating the labor-intensive task of report writing. While recent advancements have improved the quality and coherence of generated reports, ensuring their factual correctness remains a critical challenge. Although generative medical Vision Large Language Models (VLLMs) have been proposed to address this issue, these models are prone to hallucinations and can produce inaccurate diagnostic information. To address these concerns, we introduce a novel Semantic Consistency-Based Uncertainty Quantification framework that provides both report-level and sentence-level uncertainties. Unlike existing approaches, our method does not require modifications to the underlying model or access to its inner state, such as output token logits, thus serving as a plug-and-play module that can be seamlessly integrated with state-of-the-art models. Extensive experiments demonstrate the efficacy of our method in detecting hallucinations and enhancing the factual accuracy of automatically generated radiology reports. By abstaining from high-uncertainty reports, our approach improves factuality scores by 10%, achieved by rejecting 20% of reports using the Radialog model on the MIMIC-CXR dataset. Furthermore, sentence-level uncertainty flags the lowest-precision sentence in each report with an 82.9% success rate. Our implementation is open-source and available at https://github.com/BU-DEPEND-Lab/SCUQ-RRG.
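The 82.9% figure measures how often sentence-level uncertainty points at the least accurate sentence. One plausible reading of that metric, assumed here and not confirmed by the abstract, is the share of reports in which the most uncertain sentence also has the report's lowest per-sentence factual precision. The function names and the toy scores below are hypothetical.

```python
def flagged_sentence(uncertainties: list[float]) -> int:
    """Index of the most uncertain sentence (the first one on ties)."""
    return max(range(len(uncertainties)), key=uncertainties.__getitem__)


def flag_success_rate(reports: list[tuple[list[float], list[float]]]) -> float:
    """Share of reports whose flagged sentence has the report's lowest precision.

    Each report is (sentence_uncertainties, sentence_precisions), aligned by index.
    A hit is counted when the flagged sentence ties for the minimum precision.
    """
    if not reports:
        raise ValueError("no reports to evaluate")
    hits = 0
    for uncertainties, precisions in reports:
        if not uncertainties or len(uncertainties) != len(precisions):
            raise ValueError("score lists must be non-empty and aligned")
        flagged = flagged_sentence(uncertainties)
        hits += precisions[flagged] == min(precisions)
    return hits / len(reports)


# Toy scores, invented for illustration only.
reports = [
    ([0.1, 0.7, 0.2], [1.0, 0.3, 0.9]),  # flags sentence 1, the least precise: hit
    ([0.5, 0.4], [0.8, 0.2]),            # flags sentence 0, but sentence 1 is least precise: miss
]
print(flag_success_rate(reports))  # 0.5
```

Counting ties for the minimum as hits is a design choice of this sketch; a stricter variant would require a unique lowest-precision sentence.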

Problem

Research questions and friction points this paper is trying to address.

Ensuring factual correctness in radiology report generation

Detecting hallucinations in generative medical Vision Large Language Models

Quantifying uncertainty at report and sentence levels
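Sentence-level uncertainty can be sketched in the same black-box spirit: a sentence is trustworthy if other reports sampled for the same image say something similar. The scoring rule below (one minus the average, over samples, of the sentence's best match in each sample), the naive sentence splitter, and the Jaccard similarity are all assumptions for illustration; none of these names come from the paper or its repository.

```python
import re


def tokens(text: str) -> set[str]:
    """Lower-cased alphabetic tokens (numbers and punctuation are dropped)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Hypothetical stand-in similarity between two sentences, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0


def split_sentences(report: str) -> list[str]:
    """Naive sentence splitter on '.', '!' and '?'."""
    return [s.strip() for s in re.split(r"[.!?]+", report) if s.strip()]


def sentence_uncertainties(report: str, samples: list[str]) -> list[tuple[str, float]]:
    """For each sentence of `report`: 1 - mean over sampled reports of its best
    match against any sentence in that sample. Unsupported sentences score near 1."""
    if not samples:
        raise ValueError("need at least one sampled report")
    scored = []
    for sent in split_sentences(report):
        support = [
            max((jaccard(sent, other) for other in split_sentences(sample)), default=0.0)
            for sample in samples
        ]
        scored.append((sent, 1.0 - sum(support) / len(support)))
    return scored


report = "No pleural effusion. Mild cardiomegaly."
samples = [
    "No pleural effusion. Heart size is normal.",
    "No pleural effusion. Lungs are clear.",
]
for sentence, u in sentence_uncertainties(report, samples):
    print(f"{u:.2f}  {sentence}")
# 0.00  No pleural effusion
# 1.00  Mild cardiomegaly
```

Taking the highest-scoring sentence gives a per-report flag; here "Mild cardiomegaly" is flagged because no sampled report supports it.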

Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic Consistency-Based Uncertainty Quantification framework

Plug-and-play module for existing models

Improves factuality scores by 10% by abstaining from the 20% most uncertain reports (Radialog on MIMIC-CXR)
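Abstention follows a standard selective-prediction pattern: rank reports by uncertainty, reject the most uncertain fraction, and score only what remains. The sketch assumes per-report uncertainty and factuality scores already exist; `abstain` is a hypothetical helper and the toy numbers are invented, so they illustrate the mechanism rather than reproduce the reported 10% gain.

```python
import math


def abstain(uncertainties: list[float], factuality: list[float],
            reject_fraction: float = 0.2) -> tuple[float, list[int]]:
    """Drop the `reject_fraction` most uncertain reports; return the mean
    factuality of the retained reports and their (sorted) indices."""
    n = len(uncertainties)
    if n == 0 or n != len(factuality):
        raise ValueError("score lists must be non-empty and aligned")
    if not 0.0 <= reject_fraction < 1.0:
        raise ValueError("reject_fraction must be in [0, 1)")
    n_reject = min(math.floor(n * reject_fraction), n - 1)  # always keep at least one
    by_uncertainty = sorted(range(n), key=uncertainties.__getitem__)  # most certain first
    kept = sorted(by_uncertainty[: n - n_reject])
    return sum(factuality[i] for i in kept) / len(kept), kept


# Toy scores, invented for illustration only.
unc = [0.1, 0.9, 0.2, 0.3, 0.8]
fact = [0.9, 0.2, 0.8, 0.7, 0.4]
baseline = sum(fact) / len(fact)
retained_mean, kept = abstain(unc, fact, reject_fraction=0.2)
print(kept)  # [0, 2, 3, 4]
print(f"{baseline:.2f} -> {retained_mean:.2f}")  # 0.60 -> 0.70
```

Sweeping `reject_fraction` from 0 upward traces a factuality-versus-coverage curve; the abstract reports one point on such a curve, at 20% rejection.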
