Spatial-aware Transformer-GRU Framework for Enhanced Glaucoma Diagnosis from 3D OCT Imaging

📅 2024-03-08
🏛️ arXiv.org
📈 Citations: 4
Influential: 1
🤖 AI Summary
To improve spatial modeling of 3D optical coherence tomography (OCT) volumes for glaucoma detection, this paper proposes a spatially aware framework that pairs a Vision Transformer (ViT) with a bidirectional gated recurrent unit (Bi-GRU). The ViT, pre-trained on retinal data, extracts features from each slice, while the Bi-GRU captures spatial dependencies between slices, so the model jointly analyzes within-slice anatomical detail and the structure of the volume as a whole. This addresses the depth-agnostic limitation of conventional 2D models, which treat slices without volumetric context. Evaluated on a large dataset, the method achieves an F1-score of 93.58%, a Matthews Correlation Coefficient (MCC) of 73.54%, and an AUC of 95.24%, outperforming the state-of-the-art methods it is compared against.

📝 Abstract
Glaucoma, a leading cause of irreversible blindness, requires early detection so that accurate and timely intervention can prevent vision loss. In this study, we present a novel deep learning framework that leverages the diagnostic value of 3D Optical Coherence Tomography (OCT) imaging for automated glaucoma detection. The framework integrates a Vision Transformer pre-trained on retinal data for rich slice-wise feature extraction with a bidirectional Gated Recurrent Unit for capturing inter-slice spatial dependencies. This dual-component approach enables comprehensive analysis of local nuances and global structural integrity, both crucial for accurate glaucoma diagnosis. Experimental results on a large dataset demonstrate the superior performance of the proposed method over state-of-the-art ones, achieving an F1-score of 93.58%, a Matthews Correlation Coefficient (MCC) of 73.54%, and an AUC of 95.24%. The framework's ability to leverage the valuable information in 3D OCT data holds significant potential for enhancing clinical decision support systems and improving patient outcomes in glaucoma management.
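
For readers less familiar with the three metrics quoted in the abstract, the sketch below shows one common way to compute them for a binary classifier. It is not the authors' evaluation code; the function names are hypothetical and any inputs passed to them are toy values, not the paper's data. Note that MCC conventionally ranges from -1 to 1, so the reported 73.54% presumably corresponds to 0.7354 on that scale.

```python
import math

def f1_score(tp, fp, fn):
    """F1: harmonic mean of precision and recall, from confusion counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient; uses all four confusion-matrix cells."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as half (O(P*N) pairwise form, fine for small sets)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Unlike F1, MCC also accounts for true negatives, which is one reason the two numbers can differ substantially on imbalanced data.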
Problem

Research questions and friction points this paper is trying to address.

Automated glaucoma detection from 3D OCT imaging
Integrating Vision Transformer and GRU for spatial dependencies
Enhancing clinical decision support for glaucoma management
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision Transformer for slice-wise feature extraction
Bidirectional GRU captures inter-slice spatial dependencies
Dual-component analysis of local and global structures
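
The two components listed above can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `TinySliceEncoder` is a hypothetical stand-in for the retinal pre-trained ViT, and the class names, layer sizes, and the mean-pooling over slices before the classification head are all choices made here for brevity.

```python
import torch
import torch.nn as nn

class TinySliceEncoder(nn.Module):
    """Hypothetical stand-in for the pre-trained ViT: patchify a slice,
    run a small Transformer encoder, mean-pool the patch tokens.
    Positional embeddings are omitted to keep the sketch short."""
    def __init__(self, patch=16, dim=64, depth=2, heads=4):
        super().__init__()
        self.patch_embed = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=2 * dim, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x):                      # x: (N, 1, H, W)
        tokens = self.patch_embed(x)           # (N, dim, H/patch, W/patch)
        tokens = tokens.flatten(2).transpose(1, 2)  # (N, num_patches, dim)
        return self.encoder(tokens).mean(dim=1)     # (N, dim)

class SliceSequenceClassifier(nn.Module):
    """Slice-wise features followed by a bidirectional GRU across slices."""
    def __init__(self, slice_encoder, feat_dim=64, hidden=32, num_classes=2):
        super().__init__()
        self.slice_encoder = slice_encoder
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, volume):                 # volume: (B, S, H, W)
        b, s, h, w = volume.shape
        # Encode every slice independently, then restore the slice axis.
        feats = self.slice_encoder(volume.reshape(b * s, 1, h, w))
        feats = feats.reshape(b, s, -1)        # (B, S, feat_dim)
        # The Bi-GRU reads the slice sequence in both directions.
        out, _ = self.gru(feats)               # (B, S, 2 * hidden)
        # Assumption: average over slices before classifying.
        return self.head(out.mean(dim=1))      # (B, num_classes)
```

With these toy sizes, a batch of two volumes of eight 32x32 slices, `torch.randn(2, 8, 32, 32)`, yields logits of shape `(2, 2)`. In practice the slice encoder would be swapped for the actual pre-trained ViT, with `feat_dim` set to its embedding size.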
Mona Ashtari-Majlan
Department of Computer Science, Multimedia, and Telecommunications, Universitat Oberta de Catalunya, Barcelona, Spain
Mohammad Mahdi Dehshibi
Department of Computer Science and Engineering, Universitat Carlos III de Madrid, Madrid, Spain
David Masip
Department of Computer Science, Multimedia, and Telecommunications, Universitat Oberta de Catalunya, Barcelona, Spain