CCD: Mitigating Hallucinations in Radiology MLLMs via Clinical Contrastive Decoding

📅 2025-09-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multimodal large language models (MLLMs) frequently generate clinically implausible content, termed “clinical hallucinations,” in radiology report generation, undermining diagnostic reliability. To address this, we propose a **training-free, retrieval-free clinical contrastive decoding framework**, the first to introduce a dual-stage token-level logits optimization mechanism. Without altering the original MLLM’s parameters, our method dynamically integrates structured clinical knowledge from task-specific expert models (e.g., the RadGraph parser) to calibrate generation in real time. By performing fine-grained semantic alignment between expert-derived signals and model-generated tokens, it suppresses hallucinations at the token level. Evaluation across three benchmark datasets, including MIMIC-CXR, demonstrates substantial improvements: RadGraph-F1 increases by up to 17% when the method is applied to state-of-the-art RRG models, enhancing both the clinical accuracy and the semantic coherence of generated reports.
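The token-level contrast described above can be sketched in generic form: blend the base model's logits toward an expert-informed pass, then sample from the recalibrated distribution. This is a minimal sketch of the general contrastive-decoding pattern, not CCD's actual dual-stage mechanism; the function name, `alpha`, and the toy vocabulary are illustrative assumptions.

```python
import math

def contrastive_decode_step(base_logits, expert_logits, alpha=0.5):
    """One token-level contrastive step (generic sketch).

    Interpolates the base model's logits toward an expert-informed
    pass and applies a softmax; `alpha` controls how strongly the
    expert signal corrects the base distribution.
    """
    adjusted = [b + alpha * (e - b) for b, e in zip(base_logits, expert_logits)]
    # Numerically stable softmax over the adjusted logits.
    m = max(adjusted)
    exps = [math.exp(a - m) for a in adjusted]
    total = sum(exps)
    return [x / total for x in exps]

# Toy 4-token vocabulary: the base model favours token 0 (a likely
# hallucination), while the expert-informed pass favours token 2.
base = [3.0, 1.0, 0.5, 0.2]
expert = [0.5, 1.0, 3.5, 0.2]
probs = contrastive_decode_step(base, expert, alpha=0.8)
next_token = probs.index(max(probs))  # greedy pick after calibration
```

With a strong expert weight, the calibrated distribution shifts the greedy choice from the base model's preferred token to the expert-supported one.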

📝 Abstract
Multimodal large language models (MLLMs) have recently achieved remarkable progress in radiology by integrating visual perception with natural language understanding. However, they often generate clinically unsupported descriptions, known as medical hallucinations, which pose serious risks in medical applications that demand accuracy and image-grounded outputs. Through empirical analysis, we find that prompt-induced hallucinations remain prevalent in radiology MLLMs, largely due to over-sensitivity to clinical sections. To address this, we introduce Clinical Contrastive Decoding (CCD), a training-free and retrieval-free inference framework that integrates structured clinical signals from task-specific radiology expert models. CCD introduces a dual-stage contrastive mechanism to refine token-level logits during generation, thereby enhancing clinical fidelity without modifying the base MLLM. Experiments on three datasets and multiple models demonstrate that CCD consistently improves overall performance on radiology report generation (RRG). On the MIMIC-CXR dataset, it yields up to a 17% improvement in RadGraph-F1 when applied to state-of-the-art RRG models. Our approach provides a lightweight and generalisable solution for mitigating medical hallucinations, effectively bridging expert models and MLLMs in radiology.
Problem

Research questions and friction points this paper is trying to address.

Reducing hallucinations in radiology MLLMs through contrastive decoding
Addressing clinically unsupported descriptions in medical imaging AI
Improving clinical fidelity without modifying base multimodal models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free inference framework with clinical contrastive decoding
Dual-stage contrastive mechanism refines token-level logits
Integrates structured clinical signals from radiology expert models
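One way the last point could work in practice is to map findings extracted by an expert model onto per-token logit biases before decoding. The function below is a hypothetical illustration, assuming structured findings from a RadGraph-style parser; the word-level mapping and `boost` value are assumptions, not the paper's actual integration scheme.

```python
def expert_token_bias(vocab, findings, boost=2.0):
    """Turn expert-extracted findings into additive logit biases.

    `findings` stands in for structured output from a task-specific
    expert model (e.g. a RadGraph-style parser); the word-level
    mapping and `boost` value are illustrative assumptions.
    """
    # Collect every word appearing in any extracted finding.
    finding_tokens = {tok for f in findings for tok in f.lower().split()}
    # Boost vocabulary tokens the expert supports; leave others untouched.
    return [boost if tok in finding_tokens else 0.0 for tok in vocab]

# Toy vocabulary and one expert-confirmed finding.
vocab = ["pleural", "effusion", "normal", "cardiomegaly"]
bias = expert_token_bias(vocab, ["pleural effusion"])
```

The resulting bias vector would then be added to the model's logits at each decoding step, nudging generation toward expert-supported clinical terms.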