RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance

📅 2023-11-30
🏛️ arXiv.org
📈 Citations: 24
Influential: 4
🤖 AI Summary
This study addresses the need for clinically reliable vision-language dialogue systems for chest X-ray interpretation, supporting radiologists in efficiently generating and iteratively refining diagnostic reports. Methodologically, the authors (1) introduce an image-grounded, semi-automatic paradigm for constructing radiology-specific instruction data; (2) propose a multimodal architecture that fuses ViT image features and structured pathology findings with a large language model; and (3) adapt the model to the radiology domain with LoRA-based parameter-efficient fine-tuning combined with instruction tuning. Contributions include: (i) releasing the first thoroughly evaluated, publicly available medical vision-language model (VLM) for radiology report generation and dialogue; (ii) achieving state-of-the-art clinical correctness in automated report generation; (iii) demonstrating strong performance on interactive tasks, including report correction and medical question answering; and (iv) open-sourcing all code and data to foster reproducible research and clinical translation.
📝 Abstract
Conversational AI tools that can generate and discuss clinically correct radiology reports for a given medical image have the potential to transform radiology. Such a human-in-the-loop radiology assistant could facilitate a collaborative diagnostic process, thus saving time and improving the quality of reports. Towards this goal, we introduce RaDialog, the first thoroughly evaluated and publicly available large vision-language model for radiology report generation and interactive dialog. RaDialog effectively integrates visual image features and structured pathology findings with a large language model (LLM) while simultaneously adapting it to a specialized domain using parameter-efficient fine-tuning. To keep the conversational abilities of the underlying LLM, we propose a comprehensive, semi-automatically labeled, image-grounded instruction dataset for chest X-ray radiology tasks. By training with this dataset, our method achieves state-of-the-art clinical correctness in report generation and shows impressive abilities in interactive tasks such as correcting reports and answering questions, serving as a foundational step toward clinical dialog systems. Our code is available on GitHub: https://github.com/ChantalMP/RaDialog.
Problem

Research questions and friction points this paper is trying to address.

Generating clinically accurate radiology reports from medical images
Enabling interactive dialog for collaborative radiology diagnostics
Adapting vision-language models for specialized medical domain tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates visual image features and structured pathology findings with an LLM
Adapts the LLM to the radiology domain via parameter-efficient fine-tuning (LoRA)
Semi-automatically labeled, image-grounded instruction dataset that preserves the LLM's conversational abilities
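The parameter-efficient fine-tuning the paper relies on (LoRA) can be illustrated with a minimal numpy sketch: a frozen pretrained weight matrix is augmented with a trainable low-rank update, so only a small fraction of parameters is tuned. The dimensions, rank, and scaling below are illustrative assumptions, not RaDialog's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer dimensions and LoRA hyperparameters (rank r, scale alpha).
d_out, d_in, r, alpha = 64, 64, 8, 16

W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight (not trained)
A = rng.normal(size=(r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                # trainable up-projection, zero-initialized

def lora_forward(x):
    """Frozen path plus the scaled low-rank adapter update."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(d_in,))
# With B zero-initialized, the adapted layer initially matches the frozen layer.
assert np.allclose(lora_forward(x), W @ x)

# Only A and B are trained: r*(d_in + d_out) parameters instead of d_in*d_out.
print(A.size + B.size, "trainable vs", W.size, "frozen")
```

Zero-initializing `B` is the standard LoRA trick: fine-tuning starts from the pretrained model's exact behavior, which is what lets RaDialog adapt the LLM to radiology without erasing its general conversational abilities.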