🤖 AI Summary
This study addresses the challenges of microscopic diagnosis of soil-transmitted helminth infections in resource-limited settings, where conventional methods are time-consuming, labor-intensive, and prone to human error. To overcome these limitations, the authors fine-tune Microsoft’s Florence vision-language model to localize parasite eggs in microscopic images, as the core of a scalable diagnostic system. In preliminary results, the approach outperforms conventional object detection models such as EfficientDet, achieving a mean Intersection over Union (mIOU) of 0.94. These results indicate that vision-language models have substantial potential for automating parasitic disease diagnosis and offer a promising pathway toward more accurate and efficient diagnostic tools in low-resource environments.
📝 Abstract
Soil-transmitted helminth (STH) infections continue to affect a large proportion of the global population, particularly in tropical and sub-tropical regions, where access to specialized diagnostic expertise is limited. Although manual microscopic examination of parasitic eggs remains the diagnostic gold standard, it is labour-intensive, time-consuming, and prone to human error. This paper fine-tunes a vision-language model (VLM), Microsoft Florence, to localize all parasitic eggs within microscopic images. Preliminary results show that the fine-tuned localization VLM outperforms other object detection methods, such as EfficientDet, achieving an mIOU of 0.94. This finding demonstrates the potential of the proposed VLM to serve as a core component of an automated framework, offering a scalable engineering solution for intelligent parasitological diagnosis.
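For readers unfamiliar with the reported metric, the following is a minimal sketch of how a mean Intersection over Union can be computed for axis-aligned bounding boxes. It is illustrative only: the `(x_min, y_min, x_max, y_max)` box format, the function names `iou` and `mean_iou`, and the assumption that each predicted box is already paired with its ground-truth box are ours, and the abstract does not state the evaluation protocol actually used.

```python
from typing import Sequence, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_iou(preds: Sequence[Box], truths: Sequence[Box]) -> float:
    """Average IoU over box pairs that are assumed to be matched one-to-one."""
    if len(preds) != len(truths):
        raise ValueError("preds and truths must have the same length")
    if not preds:
        return 0.0
    return sum(iou(p, t) for p, t in zip(preds, truths)) / len(preds)
```

Under this definition, a score of 1.0 means every predicted box coincides exactly with its ground-truth box, and a score of 0.0 means no predicted box overlaps its target.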