CAG-VLM: Fine-Tuning of a Large-Scale Model to Recognize Angiographic Images for Next-Generation Diagnostic Systems

📅 2025-05-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the heavy clinical interpretation burden associated with coronary angiography (CAG) images, this study constructs the first physician-annotated bilingual CAG image–report parallel corpus and proposes a clinically trustworthy two-stage cascaded multimodal diagnostic framework. Methodologically, it integrates a ConvNeXt-Base visual encoder with PaliGemma2/Gemma3 large language models, incorporates ConceptCLIP for semantic alignment, and employs LoRA for efficient fine-tuning. It further introduces VLScore—a novel quantitative evaluation metric—and an expert blind-review protocol. Experiments demonstrate that CAG-VLM achieves a top-rated 7.20/10 in blind evaluation by cardiologists, attains an F1-score of 0.96 for left/right coronary artery discrimination, and significantly outperforms general-purpose vision-language models. Moreover, it consistently generates clinically compliant diagnostic conclusions and therapeutic recommendations, delivering reliable AI-assisted decision support for interventional catheterization laboratories.

📝 Abstract
Coronary angiography (CAG) is the gold-standard imaging modality for evaluating coronary artery disease, but its interpretation and subsequent treatment planning rely heavily on expert cardiologists. To enable AI-based decision support, we introduce a two-stage, physician-curated pipeline and a bilingual (Japanese/English) CAG image-report dataset. First, we sample 14,686 frames from 539 exams and annotate them for key-frame detection and left/right laterality; a ConvNeXt-Base CNN trained on this data achieves 0.96 F1 on laterality classification, even on low-contrast frames. Second, we apply the CNN to 243 independent exams, extract 1,114 key frames, and pair each with its pre-procedure report and expert-validated diagnostic and treatment summary, yielding a parallel corpus. We then fine-tune three open-source VLMs (PaliGemma2, Gemma3, and ConceptCLIP-enhanced Gemma3) via LoRA and evaluate them using VLScore and cardiologist review. Although PaliGemma2 w/LoRA attains the highest VLScore, Gemma3 w/LoRA achieves the top clinician rating (mean 7.20/10); we designate this best-performing model as CAG-VLM. These results demonstrate that specialized, fine-tuned VLMs can effectively assist cardiologists in generating clinical reports and treatment recommendations from CAG images.
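The abstract's second stage fine-tunes the VLMs via LoRA, which freezes the pretrained weights and trains only a low-rank update. A minimal numpy sketch of that idea is below; it is not the paper's implementation, and all dimensions, names, and the scaling convention (alpha / r) are illustrative assumptions.

```python
import numpy as np

# Hedged sketch of the LoRA update (not the paper's code): the frozen
# weight W is adapted as W + (alpha / r) * B @ A, and only the small
# matrices A and B are trained during fine-tuning.
def lora_forward(x, W, A, B, alpha=16.0):
    r = B.shape[1]  # rank of the low-rank update
    return x @ (W + (alpha / r) * (B @ A)).T

rng = np.random.default_rng(0)
d_in = d_out = 8   # illustrative layer size
r = 2              # illustrative LoRA rank

W = rng.standard_normal((d_out, d_in))  # frozen pretrained weight
A = rng.standard_normal((r, d_in))      # "down" projection, Gaussian init
B = np.zeros((d_out, r))                # "up" projection, zero init

x = rng.standard_normal((1, d_in))
base_out = x @ W.T
adapted_out = lora_forward(x, W, A, B)
```

Because B starts at zero, the adapted layer initially reproduces the frozen base model exactly, while the trainable parameter count (A and B together) is far smaller than that of W itself.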
Problem

Research questions and friction points this paper is trying to address.

Develop AI to interpret coronary angiography images for diagnostics
Create bilingual dataset for training vision-language models
Fine-tune VLMs to assist cardiologists in clinical decision-making
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage physician-curated pipeline for CAG analysis
Fine-tuned VLMs with LoRA for image-report matching
Bilingual dataset enabling AI-based diagnostic support
Authors
Yuto Nakamura (The University of Tokyo; The University of Tokyo Hospital)
S. Kodera (The University of Tokyo Hospital)
Haruki Settai (The University of Tokyo; The University of Tokyo Hospital)
H. Shinohara (The University of Tokyo Hospital)
Masatsugu Tamura (The University of Tokyo Hospital)
Tomohiro Noguchi (The University of Tokyo Hospital)
Tatsuki Furusawa (The University of Tokyo Hospital)
Ryo Takizawa (The University of Tokyo; The University of Tokyo Hospital)
Tempei Kabayama (University of Tokyo; Machine Learning, Dynamical Systems)
Norihiko Takeda (The University of Tokyo Hospital)