Singpath-VL Technical Report

2026-02-10
AI Summary
This work addresses the scarcity of high-quality, large-scale annotated data in cervical cytopathology, which has hindered the application of multimodal foundation models. To overcome this limitation, we propose Singpath-VL, the first vision-language foundation model tailored for cervical cytology. We construct a million-scale image-text dataset through a three-stage synthetic pipeline that integrates multi-model weak annotations, consensus fusion, and expert knowledge injection. Building upon Qwen3-VL-4B, the model undergoes multi-stage fine-tuning and demonstrates superior performance in fine-grained cellular morphology understanding and diagnostic classification tasks. To foster community progress, we will open-source a portion of the synthesized data along with a standardized evaluation benchmark.

Abstract
We present Singpath-VL, a vision-language large model intended to fill the gap left by the absence of an AI assistant for cervical cytology. Recent advances in multi-modal large language models (MLLMs) have significantly advanced computational pathology. However, their application in cytopathology, particularly cervical cytology, remains underexplored, primarily because large-scale, high-quality annotated datasets are scarce. To bridge this gap, we first develop a novel three-stage pipeline to synthesize a million-scale image-description dataset. The pipeline uses multiple general-purpose MLLMs as weak annotators, refines their outputs through consensus fusion and expert knowledge injection, and produces high-fidelity descriptions of cell morphology. Using this dataset, we then fine-tune the Qwen3-VL-4B model via a multi-stage strategy to create a specialized cytopathology MLLM. The resulting model, named Singpath-VL, demonstrates superior performance in fine-grained morphological perception and cell-level diagnostic classification. To advance the field, we will open-source a portion of the synthetic dataset and benchmark.
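The abstract names the three stages of the data pipeline but gives no implementation details. As a rough illustration only, the sketch below shows one way weak annotations from several models could be fused by majority agreement and then enriched with expert-written notes. Everything in it is an assumption made for illustration: the attribute names, the `min_agreement` threshold, the `EXPERT_NOTES` table, and the function names are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical outputs of three weak annotators for a single cell image.
# Attribute names and values are illustrative, not from the paper.
weak_annotations = [
    {"nuclear_size": "enlarged", "chromatin": "coarse", "nc_ratio": "high"},
    {"nuclear_size": "enlarged", "chromatin": "fine", "nc_ratio": "high"},
    {"nuclear_size": "enlarged", "chromatin": "coarse", "nc_ratio": "moderate"},
]

# Hypothetical expert-written notes keyed by (attribute, value).
# The note text is a placeholder, not real clinical guidance.
EXPERT_NOTES = {
    ("nc_ratio", "high"): "placeholder expert note for a high N:C ratio",
}


def fuse_by_consensus(annotations, min_agreement=2):
    """Keep, per attribute, the most common value if enough annotators agree."""
    fused = {}
    attributes = {key for ann in annotations for key in ann}
    for attr in sorted(attributes):
        votes = Counter(ann[attr] for ann in annotations if attr in ann)
        value, count = votes.most_common(1)[0]
        if count >= min_agreement:
            fused[attr] = value  # attributes without consensus are dropped
    return fused


def inject_expert_knowledge(fused, notes):
    """Attach an expert note to each fused attribute that has one."""
    return {
        attr: {"value": value, "note": notes.get((attr, value))}
        for attr, value in fused.items()
    }


def to_description(enriched):
    """Render the enriched attributes as a single text description."""
    parts = []
    for attr, info in enriched.items():
        text = f"{attr}: {info['value']}"
        if info["note"]:
            text += f" ({info['note']})"
        parts.append(text)
    return "; ".join(parts)


fused = fuse_by_consensus(weak_annotations)
enriched = inject_expert_knowledge(fused, EXPERT_NOTES)
description = to_description(enriched)
```

In this toy version, an attribute on which the annotators disagree too much is simply dropped; a real pipeline might instead route such cases to a stronger model or a human reviewer.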
Problem

Research questions and friction points this paper is trying to address.

cervical cytology
multi-modal large language models
computational pathology
annotated datasets
cytopathology
Innovation

Methods, ideas, or system contributions that make the work stand out.

vision-language model
synthetic dataset
consensus fusion
expert knowledge injection
cervical cytology
Zhen Qiu
South China University of Technology
deep learning, computer vision
Kaiwen Xiao
LBP Singpath AI Lab
Zhengwei Lu
LBP Singpath AI Lab
Xiangyu Liu
LBP Singpath AI Lab
Lei Zhao
LBP Singpath AI Lab
Hao Zhang
LBP Singpath AI Lab