Balancing Accuracy and Efficiency: CNN Fusion Models for Diabetic Retinopathy Screening

📅 2025-12-25
🤖 AI Summary
To address the challenge of balancing accuracy and efficiency in global diabetic retinopathy (DR) screening, a problem exacerbated by heterogeneous fundus image quality and scarce ophthalmologic expertise, this paper proposes a lightweight cross-device feature-fusion CNN. Features extracted by pretrained ResNet50, EfficientNet-B0, and DenseNet121 backbones are compared individually and in pairwise and three-model fusion variants; in each fusion, backbone features are concatenated and trained jointly on five diverse public datasets, with performance averaged over five independent runs. The proposed Eff+Den model (EfficientNet-B0 + DenseNet121) achieves 82.89% overall accuracy on multi-source, heterogeneous data (F1-scores: 83.60% for normal and 82.60% for pathological cases) with only 1.42 ms inference latency per image. It outperforms the individual models and the three-model fusion baseline, striking a favorable trade-off among high accuracy, strong cross-dataset generalization, and low latency, enabling deployable, resource-efficient DR triage in low-resource settings.

📝 Abstract
Diabetic retinopathy (DR) remains a leading cause of preventable blindness, yet large-scale screening is constrained by limited specialist availability and variable image quality across devices and populations. This work investigates whether feature-level fusion of complementary convolutional neural network (CNN) backbones can deliver accurate and efficient binary DR screening on globally sourced fundus images. Using 11,156 images pooled from five public datasets (APTOS, EyePACS, IDRiD, Messidor, and ODIR), we frame DR detection as a binary classification task and compare three pretrained models (ResNet50, EfficientNet-B0, and DenseNet121) against pairwise and tri-fusion variants. Across five independent runs, fusion consistently outperforms single backbones. The EfficientNet-B0 + DenseNet121 (Eff+Den) fusion model achieves the best overall mean performance (accuracy: 82.89%) with balanced class-wise F1-scores for normal (83.60%) and diabetic (82.60%) cases. While the tri-fusion is competitive, it incurs a substantially higher computational cost. Inference profiling highlights a practical trade-off: EfficientNet-B0 is the fastest (approximately 1.16 ms/image at batch size 1000), whereas the Eff+Den fusion offers a favorable accuracy-latency balance. These findings indicate that lightweight feature fusion can enhance generalization across heterogeneous datasets, supporting scalable binary DR screening workflows where both accuracy and throughput are critical.
Problem

Research questions and friction points this paper is trying to address.

Improves diabetic retinopathy screening accuracy and efficiency via CNN fusion.
Addresses limited specialist availability and variable image quality in screening.
Optimizes trade-off between computational cost and diagnostic performance.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Feature-level fusion of CNN backbones improves accuracy
EfficientNet-DenseNet fusion balances accuracy and computational cost
Lightweight fusion enhances generalization across heterogeneous datasets
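The abstract reports per-image latency profiled at batch size 1000. A simple timing loop of the kind that could produce such a figure (batch size, warmup, and run counts here are illustrative assumptions, not the paper's protocol) can be sketched as:

```python
import time

import torch


def latency_ms_per_image(model: torch.nn.Module,
                         batch_size: int = 1000,
                         image_size: int = 224,
                         warmup: int = 3,
                         runs: int = 5) -> float:
    """Estimate mean inference latency per image, in milliseconds,
    by timing repeated forward passes on a fixed random batch."""
    model.eval()
    x = torch.randn(batch_size, 3, image_size, image_size)
    with torch.no_grad():
        for _ in range(warmup):        # warmup passes excluded from timing
            model(x)
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
        elapsed = time.perf_counter() - start
    return elapsed / (runs * batch_size) * 1000.0


# Usage with a tiny stand-in model (any nn.Module works):
tiny = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 2))
ms = latency_ms_per_image(tiny, batch_size=8, image_size=32, warmup=1, runs=2)
print(f"{ms:.4f} ms/image")
```

Large batch sizes amortize per-call overhead, which is why batched per-image latency (as in the abstract's ~1.16 ms/image figure for EfficientNet-B0) is typically lower than single-image latency.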