🤖 AI Summary
To address the challenges of missing multi-attribute fine-grained annotations, limited model interpretability, and high manual annotation costs in blood cell microscopic imaging, this paper proposes a dual-model collaborative classification framework: a CNN backbone performs cell-type classification, while a Vision Transformer (ViT) models morphological attributes such as size, shape, and texture, enabling decoupled yet joint prediction of cell type and attributes. The framework is co-trained on the PBC and WBCAtt datasets, establishing the first multi-attribute automatic annotation benchmark specifically for blood cells, and achieves a multi-attribute classification accuracy of 94.62%. This approach significantly improves annotation efficiency and model interpretability while reducing reliance on expert labeling, thereby introducing a novel paradigm for fine-grained semantic understanding in medical imaging.
📝 Abstract
We introduce AttriGen, a novel framework for automated, fine-grained multi-attribute annotation in computer vision, with a particular focus on cell microscopy, where multi-attribute classification remains underrepresented compared to traditional cell-type categorization. Using two complementary datasets, the Peripheral Blood Cell (PBC) dataset containing eight distinct cell types and the WBC Attribute Dataset (WBCAtt) annotating their 11 corresponding morphological attributes, we propose a dual-model architecture that combines a CNN for cell-type classification with a Vision Transformer (ViT) for multi-attribute classification, achieving a new benchmark of 94.62% accuracy. Our experiments demonstrate that AttriGen significantly enhances model interpretability and offers substantial time and cost savings relative to conventional full-scale human annotation. Our framework thus establishes a new paradigm that can be extended to other computer vision classification tasks by effectively automating the expansion of multi-attribute labels.
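To make the dual-model design concrete, the sketch below shows one plausible way to pair a CNN branch (8-way cell-type logits) with a minimal ViT-style branch (11 attribute logits) in PyTorch. All layer sizes, the toy patch embedding, and the class name `AttriGenSketch` are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn


class AttriGenSketch(nn.Module):
    """Illustrative dual-model head: a CNN for cell-type classification and a
    small ViT-style encoder for multi-attribute classification.
    Sizes and structure are assumptions for the sketch, not the paper's model."""

    def __init__(self, num_types=8, num_attrs=11, patch=16, dim=64):
        super().__init__()
        # CNN branch: two conv blocks, global pooling, 8-way cell-type logits
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, num_types),
        )
        # ViT-style branch: patchify via strided conv, transformer encoder,
        # mean-pooled tokens feed an 11-way attribute head
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.attr_head = nn.Linear(dim, num_attrs)

    def forward(self, x):
        type_logits = self.cnn(x)
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        attr_logits = self.attr_head(self.encoder(tokens).mean(dim=1))
        return type_logits, attr_logits


model = AttriGenSketch()
img = torch.randn(2, 3, 64, 64)  # toy resolution for the sketch
type_logits, attr_logits = model(img)
print(type_logits.shape, attr_logits.shape)  # torch.Size([2, 8]) torch.Size([2, 11])
```

In practice the two branches could be trained jointly, with a cross-entropy loss on the cell-type logits and a per-attribute loss on the attribute logits; the sketch only shows the decoupled forward pass.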