🤖 AI Summary
This work addresses the previously unexplored task of semi-supervised multimodal crowd counting by establishing the first unified benchmark, which defines the task formulation and evaluation protocol and partitions the data into labeled and unlabeled subsets at several labeled ratios. The study adapts existing fully supervised multimodal methods and semi-supervised unimodal methods as baselines and evaluates them under this protocol, giving reference results for both multimodal fusion and semi-supervised learning in this setting. The benchmark provides a reproducible evaluation framework and performance references, and the authors state that the code and data splits will be released, laying a foundation for future work on semi-supervised multimodal crowd counting.
📝 Abstract
This paper constructs the first benchmark for semi-supervised multi-modal crowd counting. To lay the foundation for this unexplored task, we first formulate the semi-supervised multi-modal setting and define a standardized protocol that specifies the labeled-unlabeled data partition at different labeled ratios. Next, to establish solid reference points, we tailor a diverse set of representative baselines, including existing fully supervised multi-modal methods and semi-supervised single-modal methods. We then evaluate their performance under the proposed benchmark. Code and the data partition will be released at https://github.com/HenryCilence/Semi-supervised-Multimodal-Crowd-Counting.
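To make the idea of a labeled-unlabeled partition at several labeled ratios concrete, the following is a minimal sketch of how such a split could be generated reproducibly. It is not the authors' released partition or code: the function name `partition_labeled_unlabeled`, the sample IDs, the seed, and the ratios 5%, 10%, and 40% are all hypothetical, and the choice to make smaller labeled sets subsets of larger ones is an assumption rather than something the abstract specifies.

```python
import random


def partition_labeled_unlabeled(sample_ids, labeled_ratio, seed=0):
    """Split sample IDs into (labeled, unlabeled) lists.

    Hypothetical helper for illustration only; the benchmark's actual
    partition is the one distributed by the authors.
    """
    if not 0.0 < labeled_ratio <= 1.0:
        raise ValueError("labeled_ratio must be in (0, 1]")
    # Sort first so the result does not depend on the input order.
    ids = sorted(sample_ids)
    # A fixed seed yields the same shuffled order for every ratio, so the
    # labeled set at a small ratio is a prefix of the one at a larger ratio.
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_labeled = max(1, round(labeled_ratio * len(ids)))
    return ids[:n_labeled], ids[n_labeled:]


if __name__ == "__main__":
    # Hypothetical IDs, one per aligned multi-modal sample.
    all_ids = [f"pair_{i:04d}" for i in range(100)]
    for ratio in (0.05, 0.10, 0.40):  # assumed ratios, not from the paper
        labeled, unlabeled = partition_labeled_unlabeled(all_ids, ratio)
        print(f"ratio={ratio:.2f}: {len(labeled)} labeled, {len(unlabeled)} unlabeled")
```

Splitting by sample ID rather than by individual image keeps every modality of a sample on the same side of the partition, which is the property a multi-modal split needs; whether the benchmark also uses nested labeled sets should be checked against the released partition files.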