CAG-VLM: Fine-Tuning of a Large-Scale Model to Recognize Angiographic Images for Next-Generation Diagnostic Systems

📅 2025-05-08

🤖 AI Summary

To reduce the heavy interpretation burden that coronary angiography (CAG) places on expert cardiologists, this study builds a physician-curated bilingual (Japanese/English) CAG image-report parallel corpus through a two-stage pipeline. In the first stage, a ConvNeXt-Base CNN is trained for key-frame detection and left/right laterality classification, reaching an F1-score of 0.96 on laterality. In the second stage, the CNN extracts key frames from independent exams, and each key frame is paired with its pre-procedure report and an expert-validated diagnostic and treatment summary. Three open-source vision-language models (PaliGemma2, Gemma3, and a ConceptCLIP-enhanced Gemma3) are then fine-tuned with LoRA and evaluated with VLScore and cardiologist review. PaliGemma2 with LoRA attains the highest VLScore, while Gemma3 with LoRA receives the top clinician rating (mean 7.20/10) and is designated CAG-VLM. The results suggest that specialized, fine-tuned VLMs can assist cardiologists in drafting clinical reports and treatment recommendations from CAG images.

📝 Abstract

Coronary angiography (CAG) is the gold-standard imaging modality for evaluating coronary artery disease, but its interpretation and subsequent treatment planning rely heavily on expert cardiologists. To enable AI-based decision support, we introduce a two-stage, physician-curated pipeline and a bilingual (Japanese/English) CAG image-report dataset. First, we sample 14,686 frames from 539 exams and annotate them for key-frame detection and left/right laterality; a ConvNeXt-Base CNN trained on this data achieves 0.96 F1 on laterality classification, even on low-contrast frames. Second, we apply the CNN to 243 independent exams, extract 1,114 key frames, and pair each with its pre-procedure report and expert-validated diagnostic and treatment summary, yielding a parallel corpus. We then fine-tune three open-source VLMs (PaliGemma2, Gemma3, and ConceptCLIP-enhanced Gemma3) via LoRA and evaluate them using VLScore and cardiologist review. Although PaliGemma2 w/LoRA attains the highest VLScore, Gemma3 w/LoRA achieves the top clinician rating (mean 7.20/10); we designate this best-performing model as CAG-VLM. These results demonstrate that specialized, fine-tuned VLMs can effectively assist cardiologists in generating clinical reports and treatment recommendations from CAG images.
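The abstract reports an F1 of 0.96 for left/right laterality classification without stating whether this is a per-class or averaged score. As a minimal sketch of how such a figure could be computed, the code below calculates per-class and macro-averaged F1 on toy labels. The function names and the example data are illustrative assumptions, not taken from the paper.

```python
def f1_for_class(y_true, y_pred, positive):
    """F1 for one class, treating `positive` as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero/undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels=("left", "right")):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


# Toy frame-level labels (hypothetical, not from the paper's dataset).
y_true = ["left", "left", "right", "right", "left", "right"]
y_pred = ["left", "right", "right", "right", "left", "right"]

left_f1 = f1_for_class(y_true, y_pred, "left")
right_f1 = f1_for_class(y_true, y_pred, "right")
overall = macro_f1(y_true, y_pred)
```

On the toy data, one left-coronary frame is misread as right, which lowers recall for "left" and precision for "right"; macro averaging weights both classes equally regardless of how many frames each has.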

Problem

Research questions and friction points this paper is trying to address.

Develop AI to interpret coronary angiography images for diagnostics

Create bilingual dataset for training vision-language models

Fine-tune VLMs to assist cardiologists in clinical decision-making

Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage physician-curated pipeline for CAG analysis

Open-source VLMs fine-tuned with LoRA to generate reports and treatment recommendations from CAG key frames

Bilingual dataset enabling AI-based diagnostic support
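For readers unfamiliar with LoRA, the idea is to keep a pretrained weight matrix W frozen and learn only a low-rank update, so the layer computes W x + (alpha / r) * B (A x). The sketch below shows this in plain Python on a tiny matrix. It is a generic illustration, not the authors' configuration: the rank, alpha, initialization scale, and the class name `LoRALinear` are all assumptions, and a real setup would use a deep-learning framework.

```python
import random


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class LoRALinear:
    """Frozen linear layer W plus a trainable low-rank update B @ A."""

    def __init__(self, W, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # pretrained weights, kept frozen
        # A starts with small random values, B with zeros, so the
        # update B @ A is zero before any training step.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.W, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        """Number of entries in A and B (the only trained values)."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


W = [[1, 2, 3], [4, 5, 6]]            # hypothetical pretrained weights
layer = LoRALinear(W, rank=1, alpha=2)
out = layer.forward([1, 0, -1])        # equals W x while B is still zero
```

Because B starts at zero, the adapted layer initially reproduces the pretrained output exactly; training then moves only A and B. The parameter saving is negligible at this toy size but grows with layer width, since A and B together hold r * (d_in + d_out) values against d_in * d_out for W.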
