Exploring visual language models as a powerful tool in the diagnosis of Ewing Sarcoma

📅 2025-01-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Ewing sarcoma (ES), particularly prevalent in adolescents, poses significant diagnostic challenges due to its morphological similarity to other sarcomas, leading to frequent misclassification. To address this, we propose the first application of vision-language supervision (VLS) pretraining to tissue microarray image analysis for ES differential diagnosis. Instead of conventional ImageNet-based supervised initialization, our method leverages aligned image–text pairs to guide feature learning and integrates multi-instance learning (MIL) to accommodate weakly labeled pathological images. Experimental results demonstrate substantial improvements: the model achieves significantly higher accuracy in ES discrimination while reducing parameter count by 42% and inference latency by 38%, thereby balancing diagnostic performance and deployment efficiency. This work establishes a generalizable paradigm for intelligent histopathological diagnosis of rare tumors characterized by limited annotated data and high inter-class ambiguity.

📝 Abstract
Ewing's sarcoma (ES), characterized by a high density of small round blue cells without structural organization, presents a significant health concern, particularly among adolescents aged 10 to 19. Artificial intelligence-based systems for the automated analysis of histopathological images show promise for supporting an accurate diagnosis of ES. In this context, this study explores, for the first time to our knowledge, the feature extraction ability of different pre-training strategies for distinguishing ES from other soft tissue or bone sarcomas with similar morphology in digitized tissue microarrays. Vision-language supervision (VLS) is compared to fully supervised ImageNet pre-training within a multiple instance learning paradigm. Our findings indicate a substantial improvement in diagnostic accuracy with the adaptation of VLS using an in-domain dataset. Notably, these models not only enhance the accuracy of predicted classes but also drastically reduce the number of trainable parameters and computational costs.
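As a rough illustration of the multiple instance learning setup mentioned in the abstract, the sketch below pools patch-level embeddings from one tissue core into a single bag-level prediction. The abstract does not say which aggregation the authors use, so the attention pooling here is an assumption, as are all names (`attention_mil_forward`, `V`, `w`, `W_cls`) and the toy dimensions. Random vectors stand in for features from a frozen pre-trained encoder.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_mil_forward(patches, V, w, W_cls, b_cls):
    """Attention-pooled MIL head (an assumed design, not the paper's).

    patches: n embedding vectors of length d (one per image patch)
    V: h rows of length d, w: length h   -> attention scorer
    W_cls: n_classes rows of length d, b_cls: length n_classes -> classifier
    """
    # One scalar attention score per patch.
    scores = [dot(w, [math.tanh(dot(row, p)) for row in V]) for p in patches]
    attn = softmax(scores)
    # Bag embedding: attention-weighted average of the patch embeddings.
    d = len(patches[0])
    bag = [sum(a * p[j] for a, p in zip(attn, patches)) for j in range(d)]
    logits = [dot(row, bag) + b for row, b in zip(W_cls, b_cls)]
    return softmax(logits), attn


def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
d, h, n_classes, n_patches = 16, 4, 2, 30  # toy sizes, chosen arbitrarily

patches = rand_matrix(n_patches, d)  # stand-in for frozen-encoder features
V = rand_matrix(h, d)
w = [random.gauss(0.0, 0.1) for _ in range(h)]
W_cls = rand_matrix(n_classes, d)
b_cls = [0.0] * n_classes

probs, attn = attention_mil_forward(patches, V, w, W_cls, b_cls)

# Only the small head is trainable when the encoder stays frozen.
n_trainable = h * d + h + n_classes * d + n_classes
```

If the encoder is kept frozen, as assumed here, only the attention scorer and the classifier are trained. That is one plausible reading of the reduction in trainable parameters that the abstract reports.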
Problem

Research questions and friction points this paper is trying to address.

Ewing's Sarcoma
Diagnostic Accuracy
Visual Language Model
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-Language Supervision (VLS)
Ewing's Sarcoma (ES) Diagnosis
Efficient Resource Utilization
Alvaro Pastor-Naranjo
Instituto Universitario de Investigación e Innovación en Tecnología Centrada en el Ser Humano, HUMAN-tech, Universitat Politècnica de València, València, Spain; valgrAI - Valencian Graduate School and Research Network of Artificial Intelligence, València, Spain
Pablo Meseguer Esbri
Instituto Universitario de Investigación e Innovación en Tecnología Centrada en el Ser Humano, HUMAN-tech, Universitat Politècnica de València, València, Spain; valgrAI - Valencian Graduate School and Research Network of Artificial Intelligence, València, Spain
Rocío del Amor
Universidad Politécnica de Valencia
Artificial Intelligence, Computer Vision
J. López-Guerrero
Molecular Biology Laboratory, Valencian Oncology Institute, Valencia, Spain
Samuel Navarro
Pathology Department, Universitat de Valencia, Valencia, Spain; CIBER Cancer (CIBERONC), Madrid, Spain
Katia Scotlandi
Experimental Pathology Laboratory, Rizzoli Orthopedic Institute, Bologna, Italy
Antonio Llombart-Bosch
Pathology Department, Universitat de Valencia, Valencia, Spain
I. Machado
Pathology Department, Universitat de Valencia, Valencia, Spain; CIBER Cancer (CIBERONC), Madrid, Spain; Pathology Department, Valencian Oncology Institute, Valencia, Spain; Experimental Patologika Laboratory, QuironSalud Hospital, Valencia, Spain
Valery Naranjo
Universitat Politècnica de València
image processing, video processing, deep learning, machine learning, histological image processing