A Vision-language Framework for Comparative Reasoning in Radiology

📅 2026-06-04

🤖 AI Summary

This study addresses a limitation of current medical imaging AI systems. They perform well at single-image interpretation but lack the cross-temporal and cross-case comparative reasoning that radiological practice depends on. The authors formulate radiological comparison as an entity-aware cross-image reasoning task and introduce MedReCo-DB, a large-scale comparative imaging dataset built from routine image-report pairs. They propose the MedReCo visual encoder for controllable retrieval of clinically analogous cases and the MedReCo-VLM vision-language model for generative interpretation of temporal changes. Using only routine clinical image-text data, the work shows that entity-aware comparative reasoning can be learned at scale and improves performance on clinically relevant tasks. MedReCo achieves the highest Recall@1 in all 12 internal retrieval settings and improves external retrieval by 6.0 percentage points on average. MedReCo-VLM improves longitudinal follow-up accuracy by 14.5–46.5 percentage points on chest X-rays and by 13.0–27.9 percentage points on CT scans.
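Recall@1, the headline retrieval metric here, counts a query as a hit only when the single top-ranked reference case is relevant. The reported gains are absolute differences in percentage points, not relative improvements. The sketch below illustrates both calculations under stated assumptions: cosine similarity over precomputed embeddings and a caller-supplied relevance rule stand in for MedReCo's actual scoring and relevance criteria, which the summary does not specify. `recall_at_1` and `mean_pp_gain` are hypothetical helper names.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recall_at_1(
    queries: Sequence[Vector],
    gallery: Sequence[Vector],
    is_relevant: Callable[[int, int], bool],
) -> float:
    """Fraction of queries whose top-1 gallery match is relevant.

    Assumption: ranking by cosine similarity over precomputed embeddings;
    MedReCo's actual scoring function may differ.
    """
    hits = 0
    for qi, q in enumerate(queries):
        # Index of the gallery item most similar to this query.
        top = max(range(len(gallery)), key=lambda gi: cosine(q, gallery[gi]))
        if is_relevant(qi, top):
            hits += 1
    return hits / len(queries)


def mean_pp_gain(new: Sequence[float], baseline: Sequence[float]) -> float:
    """Mean absolute gain in percentage points; inputs are fractions in [0, 1]."""
    return 100 * sum(n - b for n, b in zip(new, baseline)) / len(new)
```

A mean gain of 6.0 percentage points is an average absolute difference: a hypothetical move from 0.50 to 0.56 Recall@1 on one benchmark counts as +6.0 points, which is not the same as a 6% relative increase.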

📝 Abstract

Medical imaging artificial intelligence has achieved strong performance in isolated image interpretation, but remains poorly aligned with radiological practice, where diagnosis and follow-up rely on comparison across prior studies and analogous reference cases. Here we formulate radiological comparison as an entity-aware cross-image reasoning problem and introduce a framework that supports both reference-case retrieval and temporal comparative interpretation. We construct MedReCo-DB, a large-scale comparative imaging resource derived from routine image-report pairs, comprising more than 690,000 images from over 160,000 patients across eight institutions, four countries and seven imaging modalities. Reports are decomposed into anatomical structures, abnormal findings and pathological conditions to provide supervision for entity-conditioned retrieval and comparative visual question answering. Using this resource, we develop MedReCo, an entity-aware visual encoder for controllable retrieval of clinically analogous cases, and MedReCo-VLM, a vision-language extension for generative interpretation of interval change. Across internal, external and cross-center evaluations, MedReCo achieved the highest Recall@1 in all 12 internal retrieval settings and improved external retrieval by a mean of 6.0 percentage points. In clinically confusable differential groups, it consistently outperformed the strongest baselines. MedReCo-VLM achieved the best performance across all comparative generation evaluations and improved longitudinal follow-up accuracy by 14.5–46.5 percentage points on chest radiographs and 13.0–27.9 percentage points on CT. These findings suggest that entity-aware comparative reasoning can be learned from routine clinical data at scale and may provide a more clinically aligned foundation for medical imaging AI.
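The abstract describes decomposing each report into anatomical structures, abnormal findings and pathological conditions, then using these entities to condition retrieval. A minimal sketch of that idea follows. It assumes a hypothetical `ReportEntities` record and a simple rule in which a reference case matches a query under a given entity only when both reports mention that entity. The real MedReCo-DB schema, entity extraction pipeline and encoder conditioning are not described here, so `entity_conditioned_candidates` is an illustration of entity-conditioned filtering, not the authors' method.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ReportEntities:
    """Hypothetical per-study entity record; not the actual MedReCo-DB schema."""
    study_id: str
    anatomy: frozenset = field(default_factory=frozenset)     # anatomical structures
    findings: frozenset = field(default_factory=frozenset)    # abnormal findings
    conditions: frozenset = field(default_factory=frozenset)  # pathological conditions

    def all_entities(self) -> frozenset:
        return self.anatomy | self.findings | self.conditions


def entity_conditioned_candidates(
    query: ReportEntities, gallery: list[ReportEntities], entity: str
) -> list[str]:
    """Return IDs of other gallery studies that share `entity` with the query.

    Assumed rule: if the query report does not mention the entity,
    no reference case is considered relevant under that condition.
    """
    if entity not in query.all_entities():
        return []
    return [
        g.study_id
        for g in gallery
        if g.study_id != query.study_id and entity in g.all_entities()
    ]
```

Changing only the conditioning entity changes which cases count as analogous. With a query mentioning "right lower lobe" and "consolidation", a gallery case sharing only the anatomy is retrieved under the first entity, and one sharing only the finding under the second. This is one simple way to read "controllable" retrieval.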

Problem

Research questions and friction points this paper is trying to address.

comparative reasoning

radiology

medical imaging AI

cross-image analysis

clinical alignment

Innovation

Methods, ideas, or system contributions that make the work stand out.

entity-aware reasoning

comparative medical imaging

vision-language model