Evaluating Automated Radiology Report Quality Through Fine-Grained Phrasal Grounding of Clinical Findings

📅 2024-12-02

🏛️ IEEE International Symposium on Biomedical Imaging

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the misalignment between clinical findings—such as location, laterality, and severity—and corresponding anatomical regions in chest X-ray images during generative AI report evaluation. To resolve this, we propose the first anatomy-grounded, multimodal report quality assessment method. Our approach integrates clinical named entity recognition, fine-grained relation extraction, cross-modal phrase-to-image grounding, and multi-source consistency scoring to achieve phrase-level localization of textual descriptions onto anatomical regions in chest radiographs and joint validation. Compared to conventional text-only metrics (e.g., BLEU, BERTScore), our method demonstrates significantly higher correlation with radiologist expert ratings on a standard ground-truth dataset (p < 0.01). It overcomes key limitations of traditional evaluation paradigms by enabling interpretable, anatomy-aware, and empirically verifiable assessment—establishing a novel benchmark for clinically trustworthy AI-assisted diagnostic reporting.

Technology Category

Application Category

📝 Abstract

Several evaluation metrics have been developed recently to automatically assess the quality of generative AI reports for chest radiographs based only on textual information using lexical, semantic, or clinical named entity recognition methods. In this paper, we develop a new method of report quality evaluation by first extracting fine-grained finding patterns capturing the location, laterality, and severity of a large number of clinical findings. We then performed phrasal grounding to localize their associated anatomical regions on chest radiograph images. The textual and visual measures are then combined to rate the quality of the generated reports. We present results that compare this evaluation metric with other textual metrics on gold standard datasets.

Problem

Research questions and friction points this paper is trying to address.

Automated evaluation of radiology report quality

Fine-grained clinical finding pattern extraction

Combining textual and visual measures for accuracy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Extracts fine-grained clinical finding patterns

Performs phrasal grounding for anatomical localization

Combines textual and visual measures for evaluation

🔎 Similar Papers

No similar papers found.

Authors to Follow