Teaching AI Stepwise Diagnostic Reasoning with Report-Guided Chain-of-Thought Learning

📅 2025-09-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited clinical reasoning capability and poor interpretability of general-purpose vision-language models (VLMs) in radiological diagnosis. We propose a weakly supervised paradigm that requires no annotated image-lesion pairs and relies only on free-text radiology reports. Our method automatically parses unstructured reports into structured, stepwise Chain-of-Thought (CoT) reasoning paths, then combines contrastive image-report alignment with reinforcement fine-tuning guided by multi-granularity clinical rewards. To our knowledge, this is the first framework to distill stepwise diagnostic supervision signals, aligned with radiologists’ cognitive reasoning, from raw text reports alone. On MIMIC-CXR, the method yields absolute gains of +0.24 in zero-shot disease classification AUC, +0.23 in lesion localization mIoU, and +0.22 in report generation BLEU, outperforming state-of-the-art methods. The approach establishes a novel, interpretable, and scalable paradigm for training medical VLMs.
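The summary does not say how reports are turned into stepwise reasoning paths. As a rough illustration only, the sketch below assumes a report with `FINDINGS:` and `IMPRESSION:` headers and treats each findings sentence as an observation step and the impression as the concluding step. The section names, the sentence splitter, and the helpers `split_sections` and `report_to_cot` are hypothetical; the paper's actual parser is likely more sophisticated (for example, LLM-based).

```python
import re

# Hypothetical: section headers such as "FINDINGS:" at the start of a line.
HEADER = re.compile(r"^([A-Z ]+):", re.MULTILINE)


def split_sections(report: str) -> dict[str, str]:
    """Split a free-text report into {SECTION_NAME: text} using line-initial headers."""
    matches = list(HEADER.finditer(report))
    sections: dict[str, str] = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report)
        sections[m.group(1).strip()] = report[m.end():end].strip()
    return sections


def report_to_cot(report: str) -> list[str]:
    """Hypothetical CoT path: one observation per findings sentence, then the impression."""
    sections = split_sections(report)
    steps: list[str] = []
    # Naive sentence split on whitespace that follows a period.
    for sentence in re.split(r"(?<=\.)\s+", sections.get("FINDINGS", "")):
        if sentence:
            steps.append(f"Step {len(steps) + 1} (observation): {sentence}")
    impression = sections.get("IMPRESSION", "")
    if impression:
        steps.append(f"Step {len(steps) + 1} (conclusion): {impression}")
    return steps
```

For a report such as `"FINDINGS: Heart size is normal. There is a left lower lobe opacity.\nIMPRESSION: Findings suggest left lower lobe pneumonia."`, this yields two observation steps followed by one conclusion step, mirroring the "observe, then conclude" order a radiologist follows.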

📝 Abstract
This study presents DiagCoT, a multi-stage framework that applies supervised fine-tuning to general-purpose vision-language models (VLMs) to emulate radiologists' stepwise diagnostic reasoning using only free-text reports. DiagCoT combines contrastive image-report tuning for domain alignment, chain-of-thought supervision to capture inferential logic, and reinforcement tuning with clinical reward signals to enhance factual accuracy and fluency. On the MIMIC-CXR benchmark, DiagCoT improved zero-shot disease classification AUC from 0.52 to 0.76 (absolute gain of 0.24), pathology grounding mIoU from 0.08 to 0.31 (absolute gain of 0.23), and report generation BLEU from 0.11 to 0.33 (absolute gain of 0.22). It outperformed state-of-the-art models including LLaVA-Med and CXR-LLAVA on long-tailed diseases and external datasets. By converting unstructured clinical narratives into structured supervision, DiagCoT offers a scalable approach for developing interpretable and diagnostically competent AI systems for radiology.
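The abstract names "contrastive image-report tuning" without giving the loss. A common choice for this kind of alignment is the symmetric InfoNCE objective popularized by CLIP; the sketch below assumes that objective, which may differ from DiagCoT's actual loss. Matching image and report embeddings at the same batch index are positives and all other pairs are negatives. The function name `clip_style_loss` and the default temperature of 0.07 are illustrative assumptions, and plain Python lists stand in for model embeddings.

```python
import math


def _normalize(vec):
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _log_prob(row, target):
    """Log-softmax of `row` at index `target`, using the max trick for stability."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return row[target] - log_z


def clip_style_loss(image_embs, report_embs, temperature=0.07):
    """Symmetric InfoNCE: image i and report i form the only positive pair in row/column i."""
    imgs = [_normalize(v) for v in image_embs]
    reps = [_normalize(v) for v in report_embs]
    n = len(imgs)
    # logits[i][j] = cosine(image_i, report_j) / temperature
    logits = [
        [sum(a * b for a, b in zip(imgs[i], reps[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Image-to-report direction: each row is a classification over the n reports.
    loss_i2r = -sum(_log_prob(logits[i], i) for i in range(n)) / n
    # Report-to-image direction: each column is a classification over the n images.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_r2i = -sum(_log_prob(columns[j], j) for j in range(n)) / n
    return (loss_i2r + loss_r2i) / 2
```

With two orthogonal image embeddings, pairing each with its own report gives a loss near zero, while swapping the reports gives a large loss. Minimizing this loss pulls each image toward the text of its own report, which is the "domain alignment" role the abstract assigns to this stage.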
Problem

Teaching AI diagnostic reasoning using radiology reports
Improving disease classification and pathology localization accuracy
Enhancing report generation quality for clinical applications
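Pathology localization is reported as mIoU. The listing does not say whether grounding uses boxes or masks, or how predictions are matched to findings. The sketch below assumes axis-aligned boxes `(x1, y1, x2, y2)` with exactly one predicted box per ground-truth finding. The helpers `box_iou` and `mean_iou` are illustrative, not the paper's evaluation code.

```python
def box_iou(a, b):
    """Intersection over union of two boxes (x1, y1, x2, y2), assuming x1 < x2 and y1 < y2."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so non-overlapping boxes have zero intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_iou(pred_boxes, gt_boxes):
    """Average IoU over pre-matched (prediction, ground truth) pairs."""
    if len(pred_boxes) != len(gt_boxes):
        raise ValueError("expected one predicted box per ground-truth box")
    if not gt_boxes:
        return 0.0
    return sum(box_iou(p, g) for p, g in zip(pred_boxes, gt_boxes)) / len(gt_boxes)
```

For example, two 2×2 boxes offset by one unit on each axis overlap in a 1×1 square, so their IoU is 1 / (4 + 4 − 1) = 1/7. Under this definition, an mIoU of 0.31 means predicted regions overlap their targets by roughly a third of their combined area on average.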
Innovation

Chain-of-thought supervised fine-tuning of vision-language models to capture stepwise inferential logic
Contrastive image-report tuning for domain alignment
Reinforcement tuning with clinical reward signals
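The clinical reward is described only as targeting "factual accuracy and fluency." One plausible shape, assumed here purely for illustration, is a weighted sum of a label-level F1 over findings mentioned in the generated and reference reports (factual accuracy) and a clipped unigram precision, the BLEU-1 building block (surface fluency and overlap). The finding vocabulary, the 0.7 weight, and every function name below are hypothetical. The keyword matcher also ignores negation, so "no effusion" would count as a positive mention; a real system would use a proper clinical labeler.

```python
from collections import Counter

# Hypothetical finding vocabulary; substring matching ignores negation.
FINDINGS = ("cardiomegaly", "effusion", "pneumonia", "pneumothorax", "edema")


def extract_labels(text):
    """Return the set of vocabulary findings mentioned anywhere in the text."""
    lowered = text.lower()
    return {f for f in FINDINGS if f in lowered}


def label_f1(generated, reference):
    """F1 between the finding sets of the generated and reference reports."""
    pred, gold = extract_labels(generated), extract_labels(reference)
    if not pred and not gold:
        return 1.0  # both reports mention no findings: treat as agreement
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def unigram_precision(generated, reference):
    """Clipped unigram precision (BLEU-1 without the brevity penalty)."""
    gen_tokens = generated.lower().split()
    if not gen_tokens:
        return 0.0
    ref_counts = Counter(reference.lower().split())
    clipped = sum(min(c, ref_counts[w]) for w, c in Counter(gen_tokens).items())
    return clipped / len(gen_tokens)


def clinical_reward(generated, reference, factual_weight=0.7):
    """Hypothetical scalar reward for RL fine-tuning: weighted factual + fluency terms."""
    return (factual_weight * label_f1(generated, reference)
            + (1 - factual_weight) * unigram_precision(generated, reference))
```

Weighting the label term more heavily is a design choice meant to make the policy prefer a clinically correct report over one that merely copies reference wording. A policy-gradient method would then use this scalar as the reward for each sampled report.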
Yihong Luo
The Hong Kong University of Science and Technology
Generative Models, Diffusion Models, Energy-Based Models, Graph Neural Network
Wenwu He
Fujian University of Technology, Fuzhou, China; Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fuzhou, China
Zhuo-Xu Cui
Associate Professor, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
MRI, Inverse Problems, Deep Learning, Generative Models
Dong Liang
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; Key Laboratory of Biomedical Imaging Science and System, Chinese Academy of Sciences, Shenzhen, China