🤖 AI Summary
Retinal disease detection in fundus images faces challenges including substantial variability in imaging quality, subtle early-stage lesions, and cross-dataset domain shift. Method: We propose a dual-path framework integrating a Vision Transformer (ViT) classifier with a GANomaly-based anomaly detector. To enhance clinical interpretability, we introduce functional localization constraints, and for threshold-free clinical decision support we employ GUESS-based probability calibration. The method further combines geometric/color augmentation, histogram equalization, and multi-dataset joint transfer learning to improve generalizability. Contribution/Results: Across multiple public datasets, the ViT achieves accuracies of 0.789–0.843 and an AUC of 0.91 on the Papila dataset, outperforming convolutional ensemble baselines. The anomaly detector attains an AUC of 0.76, offering reconstruction-based interpretability together with robustness to domain shift.
📝 Abstract
Reliable detection of retinal diseases from fundus images is challenged by variability in imaging quality, subtle early-stage manifestations, and domain shift across datasets. In this study, we systematically evaluated a Vision Transformer (ViT) classifier under multiple augmentation and enhancement strategies across several heterogeneous public datasets, as well as the AEyeDB dataset, a high-quality fundus dataset created in-house and made available to the research community. The ViT demonstrated consistently strong performance, with accuracies ranging from 0.789 to 0.843 across datasets and diseases. Diabetic retinopathy and age-related macular degeneration were detected reliably, whereas glaucoma remained the most frequently misclassified disease. Geometric and color augmentations provided the most stable improvements, while histogram equalization benefited datasets in which disease cues are structurally subtle. Laplacian enhancement consistently reduced performance.
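To illustrate the histogram-equalization enhancement mentioned above, here is a minimal NumPy sketch of classic 8-bit histogram equalization. This is an illustrative reimplementation, not the paper's actual preprocessing pipeline, and the low-contrast synthetic patch stands in for a real fundus image.

```python
import numpy as np

def equalize_histogram(img: np.ndarray) -> np.ndarray:
    """Classic histogram equalization for an 8-bit grayscale image.

    Spreads a narrow intensity distribution across the full 0-255 range,
    which can make low-contrast structures easier to distinguish.
    """
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # first nonzero CDF value
    # Map each intensity through the normalized cumulative distribution.
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    return lut[img]

# Hypothetical usage: a low-contrast synthetic patch (intensities 100-139).
rng = np.random.default_rng(0)
patch = rng.integers(100, 140, size=(64, 64), dtype=np.uint8)
enhanced = equalize_histogram(patch)
```

After equalization the output spans the full intensity range, whereas the input occupied only a narrow band; real pipelines often prefer the adaptive variant (CLAHE), which equalizes local tiles instead of the whole image.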
On the Papila dataset, the ViT with geometric augmentation achieved an AUC of 0.91, outperforming previously reported convolutional ensemble baselines (AUC of 0.87), underscoring the advantages of transformer architectures and multi-dataset training. To complement the classifier, we developed a GANomaly-based anomaly detector, achieving an AUC of 0.76 while providing inherent reconstruction-based explainability and robust generalization to unseen data. Probabilistic calibration using GUESS enabled threshold-independent decision support for future clinical implementation.
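The anomaly detector's scoring principle, flagging images whose reconstruction deviates from what a model trained on normal data can reproduce, can be sketched in a few lines. The sketch below uses a linear (PCA-style) autoencoder in NumPy purely to show the reconstruction-error idea; GANomaly itself is an adversarial encoder-decoder-encoder network, and all names and data here are hypothetical.

```python
import numpy as np

def fit_linear_autoencoder(X: np.ndarray, k: int) -> np.ndarray:
    """Fit a k-component linear 'autoencoder' (PCA) on normal samples."""
    Xc = X - X.mean(axis=0)
    # Top-k right singular vectors span the subspace of normal variation.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:k]

def anomaly_score(x: np.ndarray, components: np.ndarray, mean: np.ndarray) -> float:
    """Reconstruction error: large when x lies off the normal subspace."""
    xc = x - mean
    recon = (xc @ components.T) @ components  # encode, then decode
    return float(np.linalg.norm(xc - recon))

rng = np.random.default_rng(1)
# Toy data: 'normal' samples live in a 2-D subspace of a 10-D feature space.
basis = rng.normal(size=(2, 10))
normal = rng.normal(size=(200, 2)) @ basis
mean = normal.mean(axis=0)
comps = fit_linear_autoencoder(normal, k=2)

healthy = rng.normal(size=2) @ basis           # on the normal subspace
lesion = healthy + 5.0 * rng.normal(size=10)   # off-subspace deviation

score_healthy = anomaly_score(healthy, comps, mean)
score_lesion = anomaly_score(lesion, comps, mean)
```

The healthy sample reconstructs almost perfectly while the perturbed one does not, so thresholding (or, as in the paper, calibrating) the score separates the two. The residual map of a convolutional reconstruction plays the same role per pixel, which is the source of the reconstruction-based explainability mentioned above.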