Cross-Modal Clinical Knowledge Integration for Mammography Report Generation

📅 2026-05-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing approaches to automatic mammography report generation often overlook the structured clinical reasoning that radiologists follow, which can yield reports that lack accuracy and consistency. This work proposes MammoRG, a framework that explicitly follows the BI-RADS guideline and trains in two stages to mirror the clinical reporting workflow. The first stage integrates prior clinical knowledge from a patient's four-view mammograms under classification-based supervision; the second applies terminology-aware supervised fine-tuning that treats standardized mammography terms as atomic semantic units. For fine-grained evaluation of clinical efficacy, the authors develop MammoRGTool, a dedicated report parsing tool that extracts structured clinical information from free-text reports. Experiments show that MammoRG consistently outperforms existing methods on the internal, external, and VinDr-Mammo datasets, surpassing the second-best model by up to 3.27% in diagnosis-related BI-RADS F1.
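
To make the idea of treating clinical terms as atomic semantic units concrete, here is a minimal sketch that merges known multi-word terms into single tokens before any further processing. It is an assumption about what such a step could look like, not the paper's implementation: the `TERMS` list is an illustrative subset of BI-RADS lexicon terms, and `merge_terms` is a hypothetical helper name.

```python
# Illustrative subset of BI-RADS lexicon terms (assumption: the paper's
# actual term list and tokenization scheme are not given in this summary).
TERMS = [
    "architectural distortion",
    "fine pleomorphic",
    "heterogeneously dense",
    "spiculated",
]


def merge_terms(text, terms=TERMS):
    """Split on whitespace, merging known multi-word terms into one token."""
    words = text.lower().split()
    # Try longer terms first so the longest match wins.
    term_words = sorted((t.split() for t in terms), key=len, reverse=True)
    tokens, i = [], 0
    while i < len(words):
        for tw in term_words:
            if words[i:i + len(tw)] == tw:
                tokens.append("_".join(tw))  # one atomic token per term
                i += len(tw)
                break
        else:
            # No term starts at this position; keep the plain word.
            tokens.append(words[i])
            i += 1
    return tokens
```

With this sketch, a phrase such as "heterogeneously dense" becomes the single token `heterogeneously_dense`, so a downstream model could not split the term across unrelated sub-tokens. A real system would more likely register such terms with the language model's own tokenizer.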

📝 Abstract

Breast cancer is a major global health concern, and mammography screening plays a central role in early detection. The large volume of screening examinations creates a substantial workload for radiologists, making accurate and consistent report generation a critical clinical challenge. Existing automated mammography report generation methods primarily focus on direct visual-to-text mapping, while overlooking the structured clinical reasoning process followed by radiologists in real-world practice. To address this limitation, we propose MammoRG, a mammography report generation framework that explicitly simulates the clinical reporting workflow by following the BI-RADS guideline and incorporating prior clinical knowledge to produce diagnostic reports. Specifically, MammoRG adopts a two-stage training framework. In the first stage, the model learns to integrate clinically relevant prior knowledge from a patient's four-view mammograms through classification-based supervision. In the second stage, a terminology-aware supervised fine-tuning strategy is introduced to model mammography-specific clinical terms as atomic semantic units, enabling the generation of high-quality reports with improved clinical consistency. To facilitate clinical efficacy evaluation of generated reports, we further develop MammoRGTool, a dedicated mammography report parsing tool that extracts structured clinical information from free-text reports. Extensive experiments demonstrate that MammoRG consistently outperforms existing methods across multiple clinical efficacy metrics, particularly in diagnosis-related BI-RADS F1, where it surpasses the second-best model by 2.73%, 2.04%, 1.90%, and 3.27% on the internal, external 1, external 2, and VinDr-Mammo datasets, respectively.
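
The evaluation described above, parsing free-text reports into structured fields and then scoring BI-RADS agreement with F1, can be sketched roughly as follows. This is not MammoRGTool: the regular expression, the function names `extract_birads` and `macro_f1`, and the choice of macro averaging are all assumptions made for illustration, since the abstract does not specify how the tool parses reports or how F1 is averaged.

```python
import re

# Hypothetical pattern for an assessment such as "BI-RADS category 4",
# "BI-RADS: 2" or "BIRADS 0". Real reports vary far more than this.
BIRADS_RE = re.compile(r"BI-?RADS\s*(?:category\s*)?:?\s*([0-6])", re.IGNORECASE)


def extract_birads(report):
    """Return the first BI-RADS category (0-6) found in a report, or None."""
    match = BIRADS_RE.search(report)
    return int(match.group(1)) if match else None


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    labels = sorted(set(y_true))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy reference and generated reports (made up for this example).
references = ["BI-RADS category 4", "BI-RADS: 2", "BI-RADS 1"]
generated = ["Assessment: BI-RADS 4", "BI-RADS category 1", "birads 1"]

y_true = [extract_birads(r) for r in references]
y_pred = [extract_birads(g) for g in generated]
score = macro_f1(y_true, y_pred)
```

A generated report with no parsable assessment yields `None`, which this sketch counts as a miss for the reference class rather than dropping the sample.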

Problem

Research questions and friction points this paper is trying to address.

mammography report generation

clinical reasoning

cross-modal integration

BI-RADS guideline

clinical consistency

Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-modal integration

BI-RADS-guided reasoning

terminology-aware fine-tuning