From Data to Diagnosis: A Large, Comprehensive Bone Marrow Dataset and AI Methods for Childhood Leukemia Prediction

📅 2025-09-19
🤖 AI Summary
Manual bone marrow morphology analysis for pediatric leukemia diagnosis is labor-intensive and subjective, and most proposed AI solutions rely on private datasets that cover only part of the diagnostic pipeline. To address this, the authors introduce a large, publicly available bone marrow dataset spanning the full diagnostic process: 246 pediatric patients with diagnostic, clinical, and laboratory information, over 40 000 cells with bounding box annotations, and more than 28 000 of these with high-quality labels across 33 cell types. Using this resource, they propose a three-stage deep learning pipeline: (1) Faster R-CNN for cell detection (average precision = 0.96); (2) a 33-class cell classification model (AUC = 0.98, F1-score = 0.61); and (3) a diagnosis prediction module based on predicted cell counts (mean F1-score = 0.90). The dataset and baseline methods are intended to improve the reproducibility of AI-assisted diagnosis of pediatric leukemia and to support further research.

📝 Abstract
Leukemia diagnosis primarily relies on manual microscopic analysis of bone marrow morphology supported by additional laboratory parameters, making it complex and time-consuming. While artificial intelligence (AI) solutions have been proposed, most utilize private datasets and only cover parts of the diagnostic pipeline. Therefore, we present a large, high-quality, publicly available leukemia bone marrow dataset spanning the entire diagnostic process, from cell detection to diagnosis. Using this dataset, we further propose methods for cell detection, cell classification, and diagnosis prediction. The dataset comprises 246 pediatric patients with diagnostic, clinical, and laboratory information, over 40 000 cells with bounding box annotations, and more than 28 000 of these with high-quality class labels, making it the most comprehensive dataset publicly available. Evaluation of the AI models yielded an average precision of 0.96 for cell detection, an area under the curve of 0.98 and an F1-score of 0.61 for the 33-class cell classification, and a mean F1-score of 0.90 for diagnosis prediction using predicted cell counts. While the proposed approaches demonstrate their usefulness for AI-assisted diagnostics, the dataset will foster further research and development in the field, ultimately contributing to more precise diagnoses and improved patient outcomes.
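The gap between the reported area under the curve (0.98) and F1-score (0.61) for the 33-class task is easier to read with a concrete case. The sketch below is only an illustration: the abstract does not state how the F1-score is averaged, so macro averaging (the unweighted mean of per-class F1) is an assumption, and the class names and labels are hypothetical toy data, not taken from the dataset. It shows how a rare class that is never predicted pulls the averaged F1 well below the plain accuracy.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    classes = sorted(set(y_true))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels: 8 cells of a common class, 2 of a rare class.
y_true = ["lymphocyte"] * 8 + ["blast"] * 2
y_pred = ["lymphocyte"] * 10  # the rare class is never predicted

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
```

With many fine-grained and unevenly populated classes, a per-class averaged metric of this kind can stay moderate even when ranking-based metrics are high.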

Problem

Research questions and friction points this paper is trying to address.

Creating a public bone marrow dataset for childhood leukemia diagnosis
Developing AI methods for cell detection, cell classification, and diagnosis prediction
Improving diagnostic accuracy and efficiency through AI-assisted analysis

Innovation

Methods, ideas, or system contributions that make the work stand out.

Public bone marrow dataset for leukemia diagnosis
AI methods for cell detection, cell classification, and diagnosis prediction
Comprehensive diagnostic pipeline from data to prediction
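To make the last step of the pipeline concrete, here is a minimal sketch of how per-cell class predictions could be turned into a fixed-length feature vector for a diagnosis model. The abstract only says that diagnosis prediction uses predicted cell counts; normalizing the counts to fractions, the function name `differential_count`, and the three class names are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter


def differential_count(predicted_labels, class_names):
    """Return the fraction of cells assigned to each class, in class_names order."""
    counts = Counter(predicted_labels)
    total = len(predicted_labels)
    if total == 0:
        # No detected cells: return an all-zero vector of the right length
        return [0.0] * len(class_names)
    return [counts[name] / total for name in class_names]


# Hypothetical subset of cell classes and per-cell predictions for one patient.
class_names = ["blast", "lymphocyte", "erythroblast"]
predicted = ["blast", "blast", "lymphocyte", "erythroblast"]

features = differential_count(predicted, class_names)
```

A vector like `features` has the same length for every patient regardless of how many cells were detected, which is what allows it to be passed to a standard classifier.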
Henning Höfener
Fraunhofer Institute for Digital Medicine MEVIS, Bremen, Germany
Farina Kock
Fraunhofer Institute for Digital Medicine MEVIS, Bremen, Germany
Martina Pontones
Department of Pediatrics and Adolescent Medicine, University Hospital Erlangen, Erlangen, Germany
Tabita Ghete
Department of Pediatrics and Adolescent Medicine, University Hospital Erlangen, Erlangen, Germany; Bavarian Cancer Research Center (BZKF), Erlangen, Germany
David Pfrang
Fraunhofer Institute for Digital Medicine MEVIS, Bremen, Germany
Nicholas Dickel
Medical Informatics, Friedrich-Alexander University of Erlangen-Nürnberg, Erlangen, Germany
Meik Kunz
Medical Informatics, Friedrich-Alexander University of Erlangen-Nürnberg, Erlangen, Germany
Daniela P Schacherer
Fraunhofer Institute for Digital Medicine MEVIS, Bremen, Germany
David A Clunie
PixelMed Publishing LLC, Bangor, PA, USA
Andrey Fedorov
Radiology, Brigham and Women's Hospital, Harvard Medical School
Imaging Data Commons, Medical image computing, Imaging informatics, Open science
Max Westphal
Fraunhofer Institute for Digital Medicine MEVIS, Bremen, Germany
Markus Metzler
Department of Pediatrics and Adolescent Medicine, University Hospital Erlangen, Erlangen, Germany; Bavarian Cancer Research Center (BZKF), Erlangen, Germany; Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany