🤖 AI Summary
This study addresses non-invasive, face-based auxiliary diagnosis of Cushing's syndrome (CS), which is characterized by distinctive phenotypic features such as moon facies and a plethoric appearance. We systematically evaluate four classes of pre-trained models for discriminative performance on CS facial recognition: CNNs (ResNet/EfficientNet), Vision Transformers (ViT), Swin Transformers, and the self-supervised vision foundation model DINOv2. Notably, this is the first work to introduce DINOv2 to CS facial analysis. Under transfer learning with frozen parameters, DINOv2 achieves higher overall accuracy and stability than CNNs, and its accuracy is markedly higher on female subjects, which points to a clinically relevant gender bias. ViT attains the highest F1-score (85.74%), while the DINOv2 results illustrate the value of global contextual modeling for complex clinical phenotypes. The results suggest that careful architecture selection and transfer learning strategy can improve the robustness and fairness of AI-assisted diagnosis in endocrinology.
📝 Abstract
Cushing's syndrome is a condition caused by excessive glucocorticoid secretion from the adrenal cortex. It often manifests with moon facies and plethora, which makes facial data valuable for diagnosis. Previous studies have used pre-trained convolutional neural networks (CNNs) to diagnose Cushing's syndrome from frontal facial images. However, CNNs are better at capturing local features, while Cushing's syndrome often presents with global facial features. Transformer-based models such as ViT and Swin, which use self-attention mechanisms, can better capture long-range dependencies and global features. Recently, DINOv2, a foundation model based on Vision Transformers, has gained interest. This study compares the performance of various pre-trained models, including CNNs, Transformer-based models, and DINOv2, in diagnosing Cushing's syndrome. We also analyze gender bias and the impact of parameter freezing on DINOv2. Our results show that Transformer-based models and DINOv2 outperformed CNNs, with ViT achieving the highest F1 score of 85.74%. Both the other pre-trained models and DINOv2 had higher accuracy on female samples. DINOv2 also performed better when its parameters were frozen. In conclusion, Transformer-based models and DINOv2 are effective for Cushing's syndrome classification.
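The two evaluation quantities mentioned above, the F1 score and accuracy broken down by gender, can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names `f1_score` and `accuracy_by_group` are hypothetical, the labels are made-up toy values rather than study data, and it assumes binary labels where 1 denotes Cushing's syndrome.

```python
from collections import defaultdict


def f1_score(y_true, y_pred, positive=1):
    """Binary F1 = 2*TP / (2*TP + FP + FN) for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def accuracy_by_group(y_true, y_pred, groups):
    """Fraction of correct predictions within each subgroup (e.g. gender)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


# Toy, made-up values for illustration only (not results from the study).
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
sex = ["F", "M", "F", "M", "F", "M", "F", "M"]

print(f"F1: {f1_score(y_true, y_pred):.2f}")
print("Accuracy by group:", accuracy_by_group(y_true, y_pred, sex))
```

Reporting accuracy per subgroup alongside a single overall score is what makes a gap between female and male samples visible. An aggregate F1 alone would hide it.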