MedVQA-TREE: A Multimodal Reasoning and Retrieval Framework for Sarcopenia Prediction

📅 2025-08-26

🤖 AI Summary

Ultrasound-based sarcopenia diagnosis faces three key challenges: subtle imaging features, scarce annotated data, and a lack of clinical context in most models. To address these, we propose MedVQA-TREE, an interpretable diagnostic framework that integrates multimodal reasoning with knowledge enhancement. First, a hierarchical visual understanding module combines anatomical classification, anatomy-aware region segmentation, and graph-structured spatial reasoning. Second, a gated feature fusion mechanism selectively integrates imaging features with textual query representations. Third, UMLS-guided multi-hop, multi-query retrieval accesses both PubMed and a sarcopenia-specific knowledge base to inject external clinical knowledge. Evaluated on two public MedVQA datasets (VQA-RAD and PathVQA) and a custom sarcopenia ultrasound dataset, the method reaches up to 99% diagnostic accuracy and outperforms previous state-of-the-art methods by over 10%, while its knowledge-grounded decision pathways are intended to improve interpretability and clinical adaptability.

📝 Abstract

Accurate sarcopenia diagnosis via ultrasound remains challenging due to subtle imaging cues, limited labeled data, and the absence of clinical context in most models. We propose MedVQA-TREE, a multimodal framework that integrates a hierarchical image interpretation module, a gated feature-level fusion mechanism, and a novel multi-hop, multi-query retrieval strategy. The vision module includes anatomical classification, region segmentation, and graph-based spatial reasoning to capture coarse, mid-level, and fine-grained structures. A gated fusion mechanism selectively integrates visual features with textual queries, while clinical knowledge is retrieved through a UMLS-guided pipeline accessing PubMed and a sarcopenia-specific external knowledge base. MedVQA-TREE was trained and evaluated on two public MedVQA datasets (VQA-RAD and PathVQA) and a custom sarcopenia ultrasound dataset. The model achieved up to 99% diagnostic accuracy and outperformed previous state-of-the-art methods by over 10%. These results underscore the benefit of combining structured visual understanding with guided knowledge retrieval for effective AI-assisted diagnosis in sarcopenia.
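
The abstract does not spell out the gating equations, so the following is only a minimal sketch of one common form of gated feature fusion, under stated assumptions: a per-dimension sigmoid gate computed from the concatenated visual and text vectors, which then blends the two. The function name `gated_fusion` and the parameters `weights` and `bias` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(
    visual: Sequence[float],
    text: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Blend visual and text features with a learned per-dimension gate.

    Assumed form (not confirmed by the paper):
        g     = sigmoid(W [visual; text] + b)
        fused = g * visual + (1 - g) * text
    `weights` has one row per output dimension, each of length 2 * d.
    """
    if len(visual) != len(text):
        raise ValueError("visual and text features must share a dimension")
    concat = list(visual) + list(text)
    fused = []
    for i, (v, t) in enumerate(zip(visual, text)):
        # Gate logit for dimension i depends on both modalities.
        logit = sum(w * x for w, x in zip(weights[i], concat)) + bias[i]
        g = sigmoid(logit)
        # g near 1 keeps the visual feature, g near 0 keeps the text feature.
        fused.append(g * v + (1.0 - g) * t)
    return fused


if __name__ == "__main__":
    d = 2
    zero_w = [[0.0] * (2 * d) for _ in range(d)]
    # With zero weights and bias the gate is 0.5, so the result is the mean.
    print(gated_fusion([1.0, 2.0], [3.0, 4.0], zero_w, [0.0, 0.0]))
```

In a trained model the gate parameters would be learned jointly with the rest of the network; here they are fixed by hand purely to show how the gate shifts weight between the two modalities.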

Problem

Research questions and friction points this paper is trying to address.

Subtle imaging cues make accurate ultrasound-based sarcopenia diagnosis difficult

Labeled sarcopenia ultrasound data are limited

Most existing models lack clinical context

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical image interpretation module

Gated feature-level fusion mechanism

UMLS-guided multi-hop, multi-query retrieval strategy
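
To make the retrieval idea more concrete, here is a toy sketch of multi-hop, multi-query retrieval. It does not call UMLS or PubMed: `RELATED_TERMS` is a hypothetical hand-written stand-in for a UMLS-style concept map, `CORPUS` is a made-up in-memory document set, and scoring is plain token overlap. The paper's actual pipeline is not described at this level of detail, so every name below is an assumption.

```python
from typing import Dict, List, Set

# Hypothetical stand-in for a UMLS-style concept map: term -> related terms.
RELATED_TERMS: Dict[str, List[str]] = {
    "sarcopenia": ["muscle mass", "muscle strength"],
    "muscle mass": ["muscle thickness"],
}

# Made-up toy corpus standing in for PubMed and a domain knowledge base.
CORPUS: Dict[str, str] = {
    "doc1": "sarcopenia is a loss of muscle mass and muscle strength",
    "doc2": "ultrasound measures muscle thickness and echo intensity",
    "doc3": "grip strength testing assesses muscle strength",
    "doc4": "unrelated text about cardiac imaging",
}


def score(query: str, text: str) -> int:
    """Count distinct query tokens that also appear in the document."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(query: str, k: int) -> List[str]:
    """Return up to k document ids with a non-zero overlap score."""
    ranked = sorted(CORPUS, key=lambda d: (-score(query, CORPUS[d]), d))
    return [d for d in ranked[:k] if score(query, CORPUS[d]) > 0]


def multi_hop_retrieve(seed_query: str, hops: int = 2, k: int = 2) -> List[str]:
    """Run several queries per hop; expand queries via related terms."""
    queries = [seed_query]
    seen_queries: Set[str] = set(queries)
    collected: List[str] = []
    for _ in range(hops):
        next_queries: List[str] = []
        for q in queries:
            # Multi-query: every active query contributes its own hits.
            for doc_id in retrieve(q, k):
                if doc_id not in collected:
                    collected.append(doc_id)
            # Multi-hop: related concepts become the next hop's queries.
            for term in RELATED_TERMS.get(q, []):
                if term not in seen_queries:
                    seen_queries.add(term)
                    next_queries.append(term)
        if not next_queries:
            break
        queries = next_queries
    return collected


if __name__ == "__main__":
    print(multi_hop_retrieve("sarcopenia", hops=2, k=2))
```

In this toy setup a single hop finds only the document that mentions the seed term, while the second hop reaches documents about muscle thickness and muscle strength that never mention sarcopenia by name, which is the kind of gap concept-guided expansion is meant to close.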

🔎 Similar Papers

EMERGE: Enhancing Multimodal Electronic Health Records Predictive Modeling with Retrieval-Augmented Generation

2024-05-27 · International Conference on Information and Knowledge Management · Citations: 4
