🤖 AI Summary
This work addresses the limited clinical reasoning capability and poor interpretability of general-purpose vision-language models (VLMs) in radiological diagnosis. We propose a weakly supervised paradigm that requires no annotated image-lesion pairs, leveraging only free-text radiology reports. Our method automatically parses unstructured reports into structured, stepwise Chain-of-Thought (CoT) reasoning paths, then integrates contrastive image-report alignment with multi-granularity clinical reward-guided reinforcement fine-tuning. To our knowledge, this is the first framework to distill stepwise diagnostic supervision signals—aligned with radiologists’ cognitive reasoning—from raw text reports alone. Zero-shot evaluation on MIMIC-CXR demonstrates substantial improvements: +0.24 in disease classification AUC, +0.23 in lesion localization mIoU, and +0.22 in report generation BLEU score, outperforming state-of-the-art methods. The approach establishes a novel, interpretable, and scalable paradigm for training medical VLMs.
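The report-parsing step described above could, in its simplest form, look like the following toy sketch. Everything here is a hypothetical illustration: the function `report_to_cot`, the section-header heuristic, and the sentence-level "observation" steps are assumptions for exposition, not the paper's actual parser.

```python
import re

def report_to_cot(report: str) -> list[str]:
    """Toy illustration: turn a free-text radiology report into coarse,
    stepwise reasoning (observations -> conclusion). Hypothetical, not
    the paper's actual parsing pipeline."""
    sections = {}
    for name in ("FINDINGS", "IMPRESSION"):
        # Capture text after a section header up to the next header or end.
        m = re.search(rf"{name}:\s*(.+?)(?=\n[A-Z]+:|$)", report, re.S)
        if m:
            sections[name] = m.group(1).strip()
    steps = []
    if "FINDINGS" in sections:
        # Treat each finding sentence as one observation step.
        for i, sent in enumerate(re.split(r"(?<=\.)\s+", sections["FINDINGS"]), 1):
            steps.append(f"Step {i} (observation): {sent.strip()}")
    if "IMPRESSION" in sections:
        steps.append(f"Conclusion: {sections['IMPRESSION']}")
    return steps

report = ("FINDINGS: Heart size is normal. There is a right lower lobe opacity.\n"
          "IMPRESSION: Findings concerning for pneumonia.")
for step in report_to_cot(report):
    print(step)
```

Real reports are far messier (negations, hedging, missing headers), so the actual framework would need a much more robust parser; this only shows the shape of the text-to-CoT transformation.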
📝 Abstract
This study presents DiagCoT, a multi-stage fine-tuning framework that adapts general-purpose vision-language models (VLMs) to emulate radiologists' stepwise diagnostic reasoning using only free-text reports. DiagCoT combines contrastive image-report tuning for domain alignment, chain-of-thought supervision to capture inferential logic, and reinforcement fine-tuning with clinical reward signals to enhance factual accuracy and fluency. On the MIMIC-CXR benchmark, DiagCoT improved zero-shot disease classification AUC from 0.52 to 0.76 (absolute gain of 0.24), pathology grounding mIoU from 0.08 to 0.31 (absolute gain of 0.23), and report generation BLEU from 0.11 to 0.33 (absolute gain of 0.22). It outperformed state-of-the-art models including LLaVA-Med and CXR-LLAVA on long-tailed diseases and external datasets. By converting unstructured clinical narratives into structured supervision, DiagCoT offers a scalable approach for developing interpretable and diagnostically competent AI systems for radiology.
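Contrastive image-report tuning of this kind is typically implemented as a CLIP-style symmetric InfoNCE objective over paired image and report embeddings. The sketch below assumes that formulation; the function name `info_nce` and the temperature value are illustrative, and the paper's exact loss may differ.

```python
import numpy as np

def info_nce(img_emb: np.ndarray, txt_emb: np.ndarray, temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss over a batch of paired image/report embeddings.
    Matched pairs sit on the diagonal of the similarity matrix; the loss
    pulls them together and pushes mismatched pairs apart."""
    # L2-normalize so the dot product is cosine similarity.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature

    def xent_diag(l: np.ndarray) -> float:
        # Cross-entropy with the diagonal (matched pair) as the target class.
        l = l - l.max(axis=1, keepdims=True)          # numerical stability
        p = np.exp(l) / np.exp(l).sum(axis=1, keepdims=True)
        return float(-np.log(np.diag(p)).mean())

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (xent_diag(logits) + xent_diag(logits.T))

# Perfectly aligned embeddings -> near-zero loss.
eye = np.eye(4)
print(info_nce(eye, eye))
```

With identical, well-separated embeddings the loss approaches zero, while mismatched or random pairings yield a loss near `log(batch_size)`; this gap is what drives the domain alignment stage.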