EIR: Enhanced Image Representations for Medical Report Generation

📅 2025-12-28
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
To address information asymmetry arising from distributional misalignment between visual features and clinical metadata, as well as domain shift when applying natural-image pre-trained models to medical imaging, this paper proposes a cross-modal collaborative modeling framework. First, a medical-domain-specific vision model (e.g., RadImageNet) is employed to extract discriminative X-ray image representations. Second, a cross-modal Transformer architecture is designed to deeply fuse image features with multi-source structured metadata (e.g., age, sex, prior diagnoses). Third, an end-to-end report generation module ensures semantic alignment between visual–metadata inputs and radiology text outputs. Evaluated on MIMIC-CXR and Open-I, the method achieves significant improvements in BLEU-4, CIDEr, and METEOR scores. Blinded evaluations by board-certified radiologists confirm superior clinical relevance and diagnostic accuracy over state-of-the-art approaches. This work is the first to integrate medical pre-trained visual encoders with metadata-aware cross-modal fusion for automated radiology report generation, effectively mitigating both information asymmetry and domain gap challenges.
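The cross-modal fusion described above can be sketched as cross-attention in which image patch features act as queries over metadata token embeddings, so each visual feature is enriched by the metadata most relevant to it. This is a minimal single-head NumPy sketch, not the paper's implementation; the dimensions, random weights, and residual fusion are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(img_feats, meta_feats, W_q, W_k, W_v):
    """Single-head cross-attention: image patches (queries) attend to
    metadata tokens (keys/values); the result is added back to the
    visual features as a residual. Shapes: img (P, d), meta (M, d)."""
    Q = img_feats @ W_q                    # (P, d)
    K = meta_feats @ W_k                   # (M, d)
    V = meta_feats @ W_v                   # (M, d)
    d = Q.shape[-1]
    attn = softmax(Q @ K.T / np.sqrt(d))   # (P, M) attention weights
    return img_feats + attn @ V            # metadata-enriched features

# Illustrative sizes: a 7x7 patch grid and 4 metadata tokens (e.g. age,
# sex, view position, prior diagnosis), all hypothetical.
rng = np.random.default_rng(0)
P, M, d = 49, 4, 64
img = rng.normal(size=(P, d))
meta = rng.normal(size=(M, d))
W_q, W_k, W_v = (rng.normal(size=(d, d)) * d ** -0.5 for _ in range(3))
fused = cross_modal_attention(img, meta, W_q, W_k, W_v)
print(fused.shape)  # (49, 64)
```

Because the metadata enters only through attention-weighted values rather than raw addition, the two modalities do not need matching feature distributions, which is the asymmetry the summary highlights.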

๐Ÿ“ Abstract
Generating medical reports from chest X-ray images is a critical and time-consuming task for radiologists, especially in emergencies. To alleviate the stress on radiologists and reduce the risk of misdiagnosis, numerous research efforts have been dedicated to automatic medical report generation in recent years. Most recent studies have developed methods that represent images by utilizing various medical metadata, such as the clinical document history of the current patient and the medical graphs constructed from retrieved reports of other similar patients. However, all existing methods integrate additional metadata representations with visual representations through a simple "Add and LayerNorm" operation, which suffers from the information asymmetry problem due to the distinct distributions between them. In addition, chest X-ray images are usually represented using pre-trained models based on natural domain images, which exhibit an obvious domain gap between general and medical domain images. To this end, we propose a novel approach called Enhanced Image Representations (EIR) for generating accurate chest X-ray reports. We utilize cross-modal transformers to fuse metadata representations with image representations, thereby effectively addressing the information asymmetry problem between them, and we leverage medical domain pre-trained models to encode medical images, effectively bridging the domain gap for image representation. Experimental results on the widely used MIMIC and Open-I datasets demonstrate the effectiveness of our proposed method.
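The "Add and LayerNorm" baseline the abstract criticizes can be illustrated in a few lines: when the metadata representation lives on a much wider scale than the visual one, element-wise addition lets it dominate, and LayerNorm cannot undo that. This is a hedged toy demonstration with synthetic features, not the paper's experiment; the scales and shapes are assumptions chosen to make the asymmetry visible.

```python
import numpy as np

def add_and_layernorm(img_feats, meta_feats, eps=1e-5):
    """Baseline fusion: element-wise addition followed by LayerNorm
    over the feature dimension."""
    x = img_feats + meta_feats
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(1)
img = rng.normal(scale=1.0, size=(49, 64))    # visual features, unit scale
meta = rng.normal(scale=10.0, size=(49, 64))  # metadata on a 10x wider scale
fused = add_and_layernorm(img, meta)

# The fused output correlates far more strongly with the metadata than
# with the image, i.e. the visual signal is largely drowned out.
c_meta = abs(np.corrcoef(fused.ravel(), meta.ravel())[0, 1])
c_img = abs(np.corrcoef(fused.ravel(), img.ravel())[0, 1])
print(c_meta > c_img)  # True
```

This distributional mismatch is exactly the "information asymmetry" that motivates replacing simple addition with cross-modal attention.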
Problem

Research questions and friction points this paper is trying to address.

Fuses metadata with images to address information asymmetry
Bridges domain gap using medical pre-trained models for images
Generates accurate chest X-ray reports to aid radiologists
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-modal transformers fuse metadata with image representations
Medical domain pre-trained models encode chest X-ray images
Enhanced Image Representations address information asymmetry and domain gap
Qiang Sun
Institute of Advanced Technology, University of Science and Technology of China, Hefei 230027, Anhui, China
Zongcheng Ji
PAII Inc., California 94087, USA
Yinlong Xiao
Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
Peng Chang
PAII Inc.
Jun Yu
Department of Automation and the Institute of Advanced Technology, University of Science and Technology of China, Hefei 230027, Anhui, China