🤖 AI Summary
To address the clinical challenge of determining appropriate specialist referrals for home-based patients with chronic wounds, this study proposes the Deep Multimodal Wound Assessment Tool (DM-WAT), enabling visiting nurses to make evidence-informed referral decisions using smartphone-captured wound images and clinical text notes. Methodologically, DM-WAT introduces a novel vision–language intermediate fusion architecture integrating DeiT-Base-Distilled Vision Transformer and DeBERTa-base models; incorporates multimodal data augmentation and transfer learning to mitigate the small sample size and class imbalance; and combines Score-CAM and Captum for cross-modal interpretability. Experimental evaluation demonstrates that DM-WAT achieves 77% ± 3% accuracy and a 70% ± 2% F1 score, significantly outperforming baseline methods, while generating clinically intelligible, attention-based decision rationales. This work advances intelligent, evidence-based management of chronic wounds in home care settings.
📝 Abstract
Chronic wounds affect 8.5 million Americans, particularly the elderly and patients with diabetes. These wounds can take up to nine months to heal, making regular care essential to ensure healing and prevent severe outcomes like limb amputations. Many patients receive care at home from visiting nurses with varying levels of wound expertise, leading to inconsistent care. Problematic, non-healing wounds should be referred to wound specialists, but referral decisions in non-clinical settings are often erroneous, delayed, or unnecessary. This paper introduces the Deep Multimodal Wound Assessment Tool (DM-WAT), a machine learning framework designed to assist visiting nurses in deciding whether to refer chronic wound patients. DM-WAT analyzes smartphone-captured wound images and clinical notes from Electronic Health Records (EHRs). It uses DeiT-Base-Distilled, a Vision Transformer (ViT), to extract visual features from images and DeBERTa-base to extract text features from clinical notes. DM-WAT combines visual and text features using an intermediate fusion approach. To address challenges posed by a small and imbalanced dataset, it integrates image and text augmentation with transfer learning to achieve high performance. In evaluations, DM-WAT achieved an accuracy of 77% (std 3%) and an F1 score of 70% (std 2%), outperforming prior approaches. Score-CAM and Captum interpretation algorithms provide insights into the specific parts of the image and text inputs that influence recommendations, enhancing interpretability and trust.
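The intermediate-fusion idea described above, concatenating the visual features from the image encoder with the text features from the language encoder before a shared classifier head, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature vectors stand in for encoder outputs (both DeiT-Base-Distilled and DeBERTa-base produce 768-dimensional embeddings), and the classifier weights, the 3-way referral output, and all dimensions beyond the standard embedding size are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the encoder outputs. In the real system these would be the
# DeiT-Base-Distilled image embedding and the DeBERTa-base text embedding
# (both 768-dimensional in the standard base configurations).
img_feat = rng.standard_normal(768)   # placeholder for ViT visual features
txt_feat = rng.standard_normal(768)   # placeholder for text features

# Intermediate fusion: join the modality features into one vector that a
# shared classification head consumes, so the classifier can weigh evidence
# from both modalities jointly.
fused = np.concatenate([img_feat, txt_feat])   # shape (1536,)

# Hypothetical linear referral head (weights would be learned end-to-end;
# the 3-class output here is illustrative, not from the paper).
W = rng.standard_normal((3, fused.size)) * 0.01
b = np.zeros(3)
logits = W @ fused + b

# Softmax over the candidate referral decisions.
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(fused.shape, probs)
```

The design choice this illustrates is that fusion happens at the feature level rather than on raw inputs (early fusion) or on per-modality predictions (late fusion), which lets the classifier exploit interactions between image and text evidence.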