AI Summary
Distal myopathies exhibit heterogeneous clinical phenotypes and pose significant challenges for radiological diagnosis. To address this, we propose a multimodal attention-aware fusion architecture comprising dual-stream deep networks that extract multiscale features, integrated through an attention-gating mechanism to jointly improve classification accuracy and model interpretability. The method generates clinically meaningful saliency maps, evaluated via coherence scoring against reference masks, incremental deletion analysis, and application-grounded assessment by seven expert radiologists. Our model achieves high classification accuracy on both the BUSI benchmark and a newly curated distal myopathy dataset. While the saliency maps show preliminary clinical relevance, their anatomical specificity remains limited. These findings motivate richer, context-aware interpretability design and human-in-the-loop feedback as steps toward trustworthy, empirically verifiable AI-assisted imaging diagnosis of hereditary myopathies.
Abstract
Distal myopathy represents a genetically heterogeneous group of skeletal muscle disorders with broad clinical manifestations, posing diagnostic challenges in radiology. To address this, we propose a novel multimodal attention-aware fusion architecture that combines features extracted by two distinct deep learning models, one capturing global contextual information and the other focusing on local details, so that complementary aspects of the input are represented. Uniquely, our approach integrates these features through an attention gate mechanism, enhancing both predictive performance and interpretability. Our method achieves high classification accuracy on the BUSI benchmark and a proprietary distal myopathy dataset, while also generating clinically relevant saliency maps that support transparent decision-making in medical diagnosis. We rigorously evaluated interpretability through (1) functionally grounded metrics (coherence scoring against reference masks and incremental deletion analysis) and (2) application-grounded validation with seven expert radiologists. While our fusion strategy improves predictive performance relative to single-stream baselines and alternative fusion schemes, both quantitative and qualitative evaluations reveal persistent gaps in the anatomical specificity and clinical usefulness of the resulting explanations. These findings highlight the need for richer, context-aware interpretability methods and human-in-the-loop feedback to meet clinicians' expectations in real-world diagnostic settings.
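To make the fusion idea concrete, the sketch below shows one common form of attention-gated fusion of two feature streams: a sigmoid gate, computed from a learned projection of the concatenated features, blends the global-context and local-detail vectors per channel. This is a minimal illustrative sketch in NumPy; the function names, shapes, and the exact gating form are assumptions for exposition, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_gate_fuse(global_feat, local_feat, W, b):
    """Fuse two complementary feature vectors with a learned soft gate.

    global_feat, local_feat : (d,) features from the two streams
    W : (d, 2d) projection producing one gate value per channel
    b : (d,) bias

    The gate lies in (0, 1), so each fused channel is a convex
    combination of the corresponding global and local channels.
    (Hypothetical gating form, not the authors' exact architecture.)
    """
    concat = np.concatenate([global_feat, local_feat])  # (2d,)
    gate = sigmoid(W @ concat + b)                      # (d,) per-channel weights
    return gate * global_feat + (1.0 - gate) * local_feat

d = 8
g = rng.standard_normal(d)          # global-context stream output
l = rng.standard_normal(d)          # local-detail stream output
W = rng.standard_normal((d, 2 * d)) * 0.1
b = np.zeros(d)

fused = attention_gate_fuse(g, l, W, b)  # (d,) fused representation
```

Because the gate is per-channel rather than a single scalar, the model can lean on global context for some feature dimensions and local detail for others, which is the intuition behind letting an attention gate, rather than simple concatenation or averaging, perform the fusion.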