🤖 AI Summary
To address the need for label-free, real-time, low-cost, and non-destructive single-cell classification and sorting in biomedical applications, this paper proposes an end-to-end FPGA-embedded system for identifying lymphocyte subpopulations (CD4⁺ T, CD8⁺ T, and B cells). It introduces a teacher–student knowledge distillation framework for extreme model compression: the student network retains only 0.02% of the teacher model's parameters. A hardware-software co-optimized inference engine integrates a bright-field image preprocessing pipeline with on-chip deep learning execution on a Xilinx FPGA. The system achieves an end-to-end (detection-to-sorting-trigger) latency of 24.7 μs, 40× faster than the prior state of the art, and an inference latency of 14.5 μs, 12× faster. The teacher model reaches 98% accuracy for CD4⁺/B discrimination and 93% for zero-shot CD8⁺/B classification, i.e., without labeled CD8⁺ training data, and the student model preserves comparable accuracy.
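Assuming the reported speedup factors are simple ratios of latencies, they imply approximate latencies for the previous state-of-the-art algorithm. These are back-of-the-envelope figures derived only from the numbers above, not values reported by the authors, and any rounding in the 12× and 40× factors carries through:

$$
t_{\text{prior, inference}} \approx 12 \times 14.5\,\mu\text{s} = 174\,\mu\text{s},
\qquad
t_{\text{prior, end-to-end}} \approx 40 \times 24.7\,\mu\text{s} = 988\,\mu\text{s}.
$$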
📝 Abstract
Precise cell classification is essential in biomedical diagnostics and therapeutic monitoring, particularly for identifying the diverse cell types involved in various diseases. Traditional cell classification methods such as flow cytometry depend on molecular labeling, which is often costly and time-intensive and can alter cell integrity. To overcome these limitations, we present a label-free machine learning framework for cell classification, designed for real-time sorting applications using bright-field microscopy images. This approach uses a teacher-student model architecture enhanced by knowledge distillation, achieving high efficiency and scalability across different cell types. In a use case of classifying lymphocyte subsets, our framework accurately classifies T4, T8, and B cells using a dataset of 80,000 preprocessed images, and it is accessible via an open-source Python package for easy adaptation. Our teacher model attained 98% accuracy in differentiating T4 cells from B cells and 93% accuracy in zero-shot classification between T8 and B cells. Our student model operates with only 0.02% of the teacher model's parameters, enabling field-programmable gate array (FPGA) deployment. The FPGA-accelerated student model achieves an inference latency of just 14.5 μs and a complete cell detection-to-sorting trigger time of 24.7 μs, improvements of 12× and 40×, respectively, over the previous state-of-the-art real-time cell analysis algorithm, while preserving accuracy comparable to the teacher model. This framework provides a scalable, cost-effective solution for lymphocyte classification, as well as a new state-of-the-art real-time cell sorting implementation for rapid identification of cell subsets using in situ deep learning on off-the-shelf computing hardware.
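The abstract does not state which distillation objective the teacher-student framework uses. As a point of reference only, the following is a minimal sketch of the widely used Hinton-style knowledge distillation loss for a single sample, assuming a softmax classifier (e.g., two classes such as T4 vs. B). The helper names `softmax` and `distillation_loss`, and the default `temperature=4.0` and `alpha=0.5`, are hypothetical illustrations, not the authors' code or settings.

```python
import math
from collections.abc import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    temperature: float = 4.0,  # hypothetical default, not from the paper
    alpha: float = 0.5,        # hypothetical default, not from the paper
) -> float:
    """Hinton-style KD loss for one sample (an assumed objective, for illustration).

    alpha weights the hard-label cross-entropy; (1 - alpha) weights the KL
    divergence between the softened teacher and student distributions, scaled
    by temperature**2 so its gradient magnitude stays comparable across temperatures.
    """
    # Hard-label cross-entropy on the student's ordinary (T = 1) predictions.
    p_student = softmax(student_logits)
    hard_ce = -math.log(p_student[label])

    # Soft-target term: KL(teacher || student), both softened at temperature T.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)

    return alpha * hard_ce + (1.0 - alpha) * temperature ** 2 * kl


# Usage: a student that disagrees with a confident teacher incurs a larger loss.
loss = distillation_loss([0.2, 1.0], [3.0, -2.0], label=0)
```

The temperature softens the teacher's output so the student also learns from the relative scores of the non-target class, which is one common way a very small student can approach the teacher's accuracy.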