🤖 AI Summary
Current chest X-ray (CXR) interpretation methods suffer from three key limitations: weak clinical interpretability, insufficient multimodal (vision–text) evidence fusion, and inconsistent tool outputs lacking dynamic validation mechanisms. To address these limitations, we propose a clinical-prior-guided multi-agent framework that emulates radiologists’ diagnostic workflow, enabling vision-grounded collaborative reasoning. Specialized agents perform image analysis, report generation, consistency verification, and external knowledge retrieval; cross-modal evidence integration is achieved via multimodal fusion and vision–language alignment. We further introduce a dynamic conflict-resolution mechanism and retrieval-augmented contextual verification to enhance robustness and reliability. Experiments demonstrate substantial improvements in diagnostic accuracy (+4.2%) and report consistency (+18.7%), yielding structured reports that are more transparent, clinically aligned, and compliant with established guidelines.
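To make the workflow concrete, here is a minimal, hypothetical Python sketch of the agent pipeline described above: specialized agents pass a shared case state through analysis, report drafting, and consistency checking. All class and field names (`ImageAnalysisAgent`, `CaseState`, etc.) are illustrative assumptions, not the paper's actual API; real agents would call vision tools and language models rather than the stubs shown here.

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    """A candidate radiological finding with a coarse image region."""
    label: str          # e.g. "cardiomegaly"
    confidence: float   # tool/agent confidence in [0, 1]
    region: str         # grounded image region, e.g. "cardiac silhouette"

@dataclass
class CaseState:
    """Shared state the agents read from and write to."""
    image_id: str
    findings: list[Finding] = field(default_factory=list)
    report: str = ""
    verified: bool = False

class ImageAnalysisAgent:
    def run(self, state: CaseState) -> None:
        # Stub: a real agent would invoke vision tools (classifier, detector).
        state.findings.append(Finding("cardiomegaly", 0.86, "cardiac silhouette"))

class ReportGenerationAgent:
    def run(self, state: CaseState) -> None:
        # Draft a structured report from the accumulated findings.
        lines = [f"- {f.label} (region: {f.region}, conf: {f.confidence:.2f})"
                 for f in state.findings]
        state.report = "Findings:\n" + "\n".join(lines)

class ConsistencyVerificationAgent:
    def run(self, state: CaseState) -> None:
        # Stub check: every reported finding must be grounded in a region.
        state.verified = all(f.region for f in state.findings)

def run_pipeline(image_id: str) -> CaseState:
    """Run the agents in a radiologist-style order: look, write, verify."""
    state = CaseState(image_id=image_id)
    for agent in (ImageAnalysisAgent(), ReportGenerationAgent(),
                  ConsistencyVerificationAgent()):
        agent.run(state)
    return state

if __name__ == "__main__":
    result = run_pipeline("cxr_0001")
    print(result.report)
    print("verified:", result.verified)
```

The sequential loop stands in for whatever orchestration the framework actually uses; the point is that each agent owns one stage of the diagnostic workflow and writes its evidence back to a shared state that later agents can verify.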
📝 Abstract
Agentic systems offer a potential path to solving complex clinical tasks through collaboration among specialized agents, augmented by tool use and external knowledge bases. Nevertheless, for chest X-ray (CXR) interpretation, prevailing methods remain limited: (i) reasoning is frequently neither clinically interpretable nor aligned with guidelines, reflecting mere aggregation of tool outputs; (ii) multimodal evidence is insufficiently fused, yielding text-only rationales that are not visually grounded; and (iii) systems rarely detect or resolve cross-tool inconsistencies and lack principled verification mechanisms. To bridge these gaps, we present RadAgents, a multi-agent framework for CXR interpretation that couples clinical priors with task-aware multimodal reasoning. In addition, we integrate grounding and multimodal retrieval augmentation to verify and resolve context conflicts, resulting in outputs that are more reliable, transparent, and consistent with clinical practice.
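The conflict-detection and retrieval-augmented verification step in (iii) can be illustrated with a small, hypothetical sketch. The names and the resolution policy below (prefer the side supported by retrieved evidence, otherwise the higher-confidence tool) are assumptions for illustration, not RadAgents' actual mechanism; `retrieve_evidence` stands in for multimodal retrieval over guidelines and prior cases.

```python
from dataclasses import dataclass

@dataclass
class ToolOutput:
    tool: str         # which tool/agent produced this call
    label: str        # finding name, e.g. "pleural effusion"
    present: bool     # tool's verdict on the finding
    confidence: float

def retrieve_evidence(label: str) -> list[str]:
    """Stub retriever: a real system would query an external knowledge base."""
    corpus = {
        "pleural effusion": [
            "Blunting of the costophrenic angle supports pleural effusion.",
        ],
    }
    return corpus.get(label, [])

def resolve_conflict(a: ToolOutput, b: ToolOutput) -> ToolOutput:
    """If two tools disagree on a finding, prefer the side backed by retrieved
    evidence; fall back to the higher-confidence tool when retrieval is empty."""
    if a.present == b.present:
        return a if a.confidence >= b.confidence else b
    if retrieve_evidence(a.label):
        # Retrieved support for the finding: side with the positive call.
        return a if a.present else b
    return a if a.confidence >= b.confidence else b

if __name__ == "__main__":
    out1 = ToolOutput("classifier", "pleural effusion", True, 0.62)
    out2 = ToolOutput("report_llm", "pleural effusion", False, 0.71)
    winner = resolve_conflict(out1, out2)
    print(f"resolved: {winner.label} present={winner.present} via {winner.tool}")
```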