🤖 AI Summary
To address low accuracy and poor robustness in skin lesion classification, particularly for lesions with ambiguous boundaries and complex morphological structures, this paper proposes a multi-scale Transformer model. The method integrates local fine-grained features and global contextual information via a cross-scale feature fusion module and enhances the modeling of subtle pathological patterns through an improved self-attention mechanism. Grad-CAM is incorporated to improve decision interpretability. Evaluated on the ISIC 2017 dataset, the model outperforms ResNet50, VGG19, ResNeXt, and a standard ViT in accuracy, AUC, F1-score, and precision. Key contributions include: (1) a lesion-aware multi-scale feature fusion architecture; (2) a lightweight self-attention optimization strategy adapted to medical images; and (3) an end-to-end classification framework that delivers both high performance and clinical interpretability.
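The cross-scale fusion idea in contribution (1) can be sketched in miniature: a coarse feature map carrying global context is upsampled to the resolution of a fine-grained map and the two are concatenated channel-wise at each spatial position. The snippet below is an illustration only; the function names (`upsample_nearest`, `fuse`) and the nearest-neighbour/concatenation design are assumptions, not the paper's exact module.

```python
# Minimal sketch of cross-scale feature fusion (illustration only; the
# nearest-neighbour upsampling and channel concatenation are assumed,
# not taken from the paper's module definition).

def upsample_nearest(grid, factor):
    """Nearest-neighbour upsampling of a 2-D grid of feature vectors."""
    out = []
    for row in grid:
        wide = [vec for vec in row for _ in range(factor)]
        for _ in range(factor):
            out.append(list(wide))
    return out

def fuse(fine, coarse):
    """Concatenate each fine-scale vector with the upsampled coarse
    vector covering the same spatial position."""
    factor = len(fine) // len(coarse)
    up = upsample_nearest(coarse, factor)
    return [[fv + cv for fv, cv in zip(frow, crow)]
            for frow, crow in zip(fine, up)]

# Toy example: a 4x4 fine map with 2-channel features and a 2x2 coarse
# map with 3-channel features fuse into a 4x4 map with 5 channels.
fine = [[[float(i), float(j)] for j in range(4)] for i in range(4)]
coarse = [[[float(2 * i + j)] * 3 for j in range(2)] for i in range(2)]
fused = fuse(fine, coarse)
```

Each fused position thus retains local detail (the fine channels) alongside global context (the coarse channels), which is the intuition behind combining Transformer stages at different scales.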
📝 Abstract
This study introduces an AI-driven skin lesion classification algorithm built on an enhanced Transformer architecture, addressing the challenges of accuracy and robustness in medical image analysis. By integrating a multi-scale feature fusion mechanism and refining the self-attention process, the model extracts both global and local features, improving its ability to detect lesions with ambiguous boundaries and intricate structures. Evaluation on the ISIC 2017 dataset shows that the improved Transformer surpasses established models, including ResNet50, VGG19, ResNeXt, and the Vision Transformer, on key metrics such as accuracy, AUC, F1-score, and precision. Grad-CAM visualizations further highlight the model's interpretability, showing strong alignment between the algorithm's focus areas and actual lesion sites. This research underscores the potential of advanced AI models in medical imaging, paving the way for more accurate and reliable diagnostic tools. Future work will explore scaling this approach to broader medical imaging tasks and integrating multimodal data to strengthen AI-driven diagnostic frameworks for intelligent healthcare.
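The Grad-CAM visualizations mentioned above rest on a simple aggregation step: each activation channel of a chosen layer is weighted by its global-average-pooled gradient, the weighted maps are summed, and a ReLU keeps only regions that increase the class score. The sketch below illustrates that step on toy arrays; `grad_cam` is a hypothetical helper, not the authors' code, and real implementations obtain the gradients from backpropagation rather than taking them as inputs.

```python
# Sketch of Grad-CAM's aggregation step on toy data (`grad_cam` is an
# illustrative helper, not the paper's implementation).

def grad_cam(activations, gradients):
    h, w = len(activations[0]), len(activations[0][0])
    heat = [[0.0] * w for _ in range(h)]
    for act, grad in zip(activations, gradients):
        # alpha_k: importance of channel k = mean of its gradient map
        alpha = sum(sum(row) for row in grad) / (h * w)
        for i in range(h):
            for j in range(w):
                heat[i][j] += alpha * act[i][j]
    # ReLU, then normalise to [0, 1] for overlaying on the input image
    heat = [[max(v, 0.0) for v in row] for row in heat]
    peak = max(max(row) for row in heat) or 1.0
    return [[v / peak for v in row] for row in heat]

# Two toy 2x2 channels: channel 0 gets a positive weight, channel 1 a
# negative one, so only channel 0's active location survives the ReLU.
acts = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
cam = grad_cam(acts, grads)
```

Upsampled to the input resolution and overlaid on the dermoscopic image, such a heat map is what lets clinicians check that the model attends to the lesion itself rather than to artifacts.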