🤖 AI Summary
To reduce the heavy interpretation burden of coronary angiography (CAG), this study builds the first physician-annotated bilingual (Japanese/English) CAG image-report parallel corpus and proposes a two-stage cascaded multimodal diagnostic framework. In the first stage, a ConvNeXt-Base classifier detects key frames and discriminates left from right coronary arteries, reaching an F1-score of 0.96 on laterality. In the second stage, PaliGemma2, Gemma3, and a ConceptCLIP-enhanced Gemma3 are fine-tuned with LoRA on the extracted key frames paired with their reports, then evaluated with the VLScore metric and an expert blind-review protocol. The LoRA-tuned Gemma3, designated CAG-VLM, receives the top mean rating of 7.20/10 from cardiologists and outperforms general-purpose vision-language models. It generates diagnostic conclusions and therapeutic recommendations from CAG images, offering AI-assisted decision support for interventional catheterization laboratories.
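For readers unfamiliar with the F1-score quoted for left/right discrimination, the following is a minimal sketch of how a per-class F1 can be computed from predicted and true laterality labels. The labels below are made-up toy values, not data from the study, and the function name `f1_for_class` is hypothetical. The source does not say whether its 0.96 figure is a per-class, macro, or micro average, so this only illustrates the basic definition.

```python
def f1_for_class(y_true, y_pred, positive):
    """Per-class F1: harmonic mean of precision and recall for one label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels for illustration only (not from the paper).
y_true = ["left", "left", "right", "right"]
y_pred = ["left", "right", "right", "right"]

f1_left = f1_for_class(y_true, y_pred, "left")
f1_right = f1_for_class(y_true, y_pred, "right")
macro_f1 = (f1_left + f1_right) / 2  # unweighted mean over the two classes
```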
📝 Abstract
Coronary angiography (CAG) is the gold-standard imaging modality for evaluating coronary artery disease, but its interpretation and subsequent treatment planning rely heavily on expert cardiologists. To enable AI-based decision support, we introduce a two-stage, physician-curated pipeline and a bilingual (Japanese/English) CAG image-report dataset. First, we sample 14,686 frames from 539 exams and annotate them for key-frame detection and left/right laterality; a ConvNeXt-Base CNN trained on this data achieves 0.96 F1 on laterality classification, even on low-contrast frames. Second, we apply the CNN to 243 independent exams, extract 1,114 key frames, and pair each with its pre-procedure report and expert-validated diagnostic and treatment summary, yielding a parallel corpus. We then fine-tune three open-source VLMs (PaliGemma2, Gemma3, and ConceptCLIP-enhanced Gemma3) via LoRA and evaluate them using VLScore and cardiologist review. Although PaliGemma2 w/LoRA attains the highest VLScore, Gemma3 w/LoRA achieves the top clinician rating (mean 7.20/10); we designate this best-performing model as CAG-VLM. These results demonstrate that specialized, fine-tuned VLMs can effectively assist cardiologists in generating clinical reports and treatment recommendations from CAG images.
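The LoRA fine-tuning mentioned above can be illustrated with a minimal, dependency-free sketch of the general technique: a frozen weight matrix W is augmented with a trainable low-rank product B·A scaled by alpha/rank. This is the standard LoRA formulation, not the authors' training code. The function names, matrix sizes, and the values of `alpha` and `rank` are assumptions chosen for illustration, since the abstract does not report them.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(W, A, B, x, alpha):
    """Compute W x + (alpha / rank) * B (A x).

    W: frozen weights, shape (d_out, d_in)
    A: trainable, shape (rank, d_in)
    B: trainable, shape (d_out, rank)
    """
    rank = len(A)
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


def lora_param_counts(d_out, d_in, rank):
    """Trainable parameters: full fine-tuning vs. the LoRA adapter alone."""
    return d_out * d_in, rank * (d_out + d_in)


# Hypothetical sizes: a 1024 x 1024 projection with a rank-8 adapter.
full_params, adapter_params = lora_param_counts(1024, 1024, 8)

# With B initialised to zero, the adapted layer reproduces the frozen layer.
W = [[1, 0], [0, 1]]
A = [[1, 1]]
B_zero = [[0], [0]]
y_initial = lora_forward(W, A, B_zero, [2, 3], alpha=2)
```

Because only A and B are trained, the number of trainable parameters per adapted matrix grows with rank × (d_out + d_in) rather than d_out × d_in, which is why the approach suits fine-tuning large VLMs on a comparatively small corpus such as the one described here.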