Singpath-VL Technical Report

📅 2026-02-10

🤖 AI Summary

This work addresses the scarcity of high-quality, large-scale annotated data in cervical cytopathology, which has hindered the application of multimodal foundation models. To overcome this limitation, we propose Singpath-VL—the first vision-language foundation model tailored for cervical cytology. We construct a million-scale image-text dataset through a three-stage synthetic pipeline that integrates multi-model weak annotations, consensus fusion, and expert knowledge injection. Building upon Qwen3-VL-4B, the model undergoes multi-stage fine-tuning and demonstrates superior performance in fine-grained cellular morphology understanding and diagnostic classification tasks. To foster community progress, we will open-source a portion of the synthesized data along with a standardized evaluation benchmark.

📝 Abstract

We present Singpath-VL, a vision-language large model that addresses the lack of AI assistants for cervical cytology. Recent advances in multi-modal large language models (MLLMs) have significantly advanced computational pathology. However, their application to cytopathology, and to cervical cytology in particular, remains underexplored, primarily because large-scale, high-quality annotated datasets are scarce. To bridge this gap, we first develop a novel three-stage pipeline to synthesize a million-scale image-description dataset. The pipeline uses multiple general-purpose MLLMs as weak annotators, refines their outputs through consensus fusion and expert knowledge injection, and produces high-fidelity descriptions of cell morphology. Using this dataset, we then fine-tune the Qwen3-VL-4B model with a multi-stage strategy to create a specialized cytopathology MLLM. The resulting model, named Singpath-VL, demonstrates superior performance in fine-grained morphological perception and cell-level diagnostic classification. To advance the field, we will open-source a portion of the synthetic dataset and benchmark.
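
The abstract does not say how consensus fusion or expert knowledge injection are implemented. The sketch below is one possible reading, not the authors' method. It assumes each weak annotator returns a dictionary of morphology attributes, fuses them by majority vote, and then applies an expert terminology mapping. The function names, attribute names, `min_agreement` threshold, and rule format are all hypothetical.

```python
from collections import Counter


def fuse_by_consensus(annotations, min_agreement=2):
    """Keep an attribute value only if enough weak annotators agree on it.

    annotations: list of dicts (attribute -> value), one per annotator.
    Hypothetical rule: the top value needs at least `min_agreement` votes
    and must not be tied with another value.
    """
    fused = {}
    attributes = sorted({attr for ann in annotations for attr in ann})
    for attr in attributes:
        votes = Counter(ann[attr] for ann in annotations if attr in ann)
        top = votes.most_common(2)
        value, count = top[0]
        is_tie = len(top) == 2 and top[1][1] == count
        if count >= min_agreement and not is_tie:
            fused[attr] = value
    return fused


def inject_expert_knowledge(fused, expert_rules):
    """Replace fused values with expert-preferred terms where a rule exists.

    expert_rules: dict (attribute -> {raw value -> preferred term}).
    This mapping format is an assumption made for illustration.
    """
    return {
        attr: expert_rules.get(attr, {}).get(value, value)
        for attr, value in fused.items()
    }


# Illustrative data: three weak annotators describing one cell image.
weak_annotations = [
    {"nuclear_size": "enlarged", "chromatin": "coarse", "nc_ratio": "high"},
    {"nuclear_size": "enlarged", "chromatin": "fine", "nc_ratio": "high"},
    {"nuclear_size": "enlarged", "chromatin": "coarse"},
]
expert_rules = {"chromatin": {"coarse": "coarsely granular"}}

fused = fuse_by_consensus(weak_annotations)
description = inject_expert_knowledge(fused, expert_rules)
print(description)
```

Majority voting is only the simplest fusion rule. A real pipeline might instead weight annotators by reliability or ask another model to reconcile free-text descriptions.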

Problem

Research questions and friction points this paper is trying to address.

cervical cytology

multi-modal large language models

computational pathology

annotated datasets

cytopathology

Innovation

Methods, ideas, or system contributions that make the work stand out.

vision-language model

synthetic dataset

consensus fusion