🤖 AI Summary
Ultrasound-based sarcopenia diagnosis faces three key challenges: subtle imaging features, scarce annotated data, and a lack of clinical context in most models. To address these, the authors propose MedVQA-TREE, an interpretable diagnostic framework that integrates multimodal reasoning with knowledge enhancement. First, a hierarchical visual understanding module combines anatomical classification, anatomy-aware region segmentation, and graph-structured spatial reasoning. Second, a gated feature fusion mechanism selectively integrates imaging features with representations of the textual clinical query. Third, UMLS-guided multi-hop, multi-query retrieval draws on both PubMed and a sarcopenia-specific knowledge base to inject external clinical knowledge. Evaluated on two public MedVQA datasets (VQA-RAD and PathVQA) and a custom sarcopenia ultrasound dataset, the method achieves up to 99% diagnostic accuracy and outperforms previous state-of-the-art methods by over 10%, while its knowledge-grounded decision pathways are intended to improve interpretability and clinical adaptability.
📝 Abstract
Accurate sarcopenia diagnosis via ultrasound remains challenging due to subtle imaging cues, limited labeled data, and the absence of clinical context in most models. We propose MedVQA-TREE, a multimodal framework that integrates a hierarchical image interpretation module, a gated feature-level fusion mechanism, and a novel multi-hop, multi-query retrieval strategy. The vision module includes anatomical classification, region segmentation, and graph-based spatial reasoning to capture coarse, mid-level, and fine-grained structures. A gated fusion mechanism selectively integrates visual features with textual queries, while clinical knowledge is retrieved through a UMLS-guided pipeline accessing PubMed and a sarcopenia-specific external knowledge base. MedVQA-TREE was trained and evaluated on two public MedVQA datasets (VQA-RAD and PathVQA) and a custom sarcopenia ultrasound dataset. The model achieved up to 99% diagnostic accuracy and outperformed previous state-of-the-art methods by over 10%. These results underscore the benefit of combining structured visual understanding with guided knowledge retrieval for effective AI-assisted diagnosis in sarcopenia.
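The abstract does not specify how the gated fusion is computed, so the following is only a minimal sketch of one common formulation, not the authors' implementation. It assumes a learned sigmoid gate over the concatenated visual and text features, with the fused vector formed as an element-wise convex combination of the two. The names `gated_fusion`, `weights`, and `bias`, the feature dimension, and the random values are all hypothetical and chosen for illustration.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fusion(visual, text, weights, bias):
    """Fuse two equal-length feature vectors with an element-wise gate.

    Assumed form (not taken from the paper):
        g = sigmoid(W [v; t] + b)
        fused = g * v + (1 - g) * t
    """
    joint = visual + text  # list concatenation: [v; t]
    fused, gate = [], []
    for i in range(len(visual)):
        # One gate value per feature dimension, computed from both modalities.
        z = sum(w * x for w, x in zip(weights[i], joint)) + bias[i]
        g = sigmoid(z)
        gate.append(g)
        fused.append(g * visual[i] + (1.0 - g) * text[i])
    return fused, gate


# Toy inputs standing in for image and question embeddings (hypothetical).
random.seed(0)
dim = 4
visual = [random.gauss(0, 1) for _ in range(dim)]
text = [random.gauss(0, 1) for _ in range(dim)]
weights = [[random.gauss(0, 0.1) for _ in range(2 * dim)] for _ in range(dim)]
bias = [0.0] * dim

fused, gate = gated_fusion(visual, text, weights, bias)
```

Under this assumed form, a gate value near 1 keeps the visual feature and a value near 0 keeps the text feature, which is one way a model could "selectively" weight the two modalities per dimension.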