🤖 AI Summary
In crowded scene counting, existing methods rely excessively on large-receptive-field backbone networks, which undermines local modeling capability, especially for small, head-sized targets. To address this, we propose a “local-information-first” modeling paradigm. Our approach (1) partitions the input image into non-overlapping local windows via a grid-based mechanism; (2) introduces intra-window contrastive learning to enhance the discriminability of subtle density variations; and (3) integrates a global attention module at the end of the network to jointly model large-sized individuals and the global spatial distribution. Evaluated on the high-density subset of JHU-Crowd++, our method achieves an 8.7% reduction in Mean Absolute Error (MAE), significantly improving local density estimation accuracy while maintaining state-of-the-art overall counting performance.
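The grid-based window partitioning described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions (the function name `partition_windows` and the divisibility of the spatial size by the window size are ours), not the paper's actual implementation:

```python
import numpy as np

def partition_windows(x, win):
    """Split feature maps into non-overlapping grid windows.

    x: array of shape (B, C, H, W); H and W are assumed divisible by `win`.
    Returns an array of shape (B * num_windows, C, win, win), so each
    window can be processed as an independent local sample.
    """
    B, C, H, W = x.shape
    # Carve H and W into (H//win, win) and (W//win, win) grids.
    x = x.reshape(B, C, H // win, win, W // win, win)
    # Bring the grid indices forward: (B, nH, nW, C, win, win).
    x = x.transpose(0, 2, 4, 1, 3, 5)
    # Flatten batch and grid dimensions into one window axis.
    return x.reshape(-1, C, win, win)

# Example: 2 images, 64 channels, 32x32 maps, 8x8 windows.
x = np.random.randn(2, 64, 32, 32)
windows = partition_windows(x, 8)
print(windows.shape)  # (32, 64, 8, 8): 2 images x 16 windows each
```

Because the windows are non-overlapping, each local region is modeled without interference from distant image content, which is the point of the local-information-first design.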
📝 Abstract
The motivation of this paper originates from rethinking an essential characteristic of crowd counting: individuals (human heads) typically occupy a very small portion of the image. This characteristic has never been the focus of existing works, which typically use the same backbones as other visual tasks and pursue a large receptive field. This drives us to propose a new design principle for crowd counting models: emphasizing the model's local modeling capability. Following this principle, we design a crowd counting model named the Local Information Matters Model (LIMM). The main innovation lies in two strategies: a window partitioning design that applies grid windows to the model input, and a window-wise contrastive learning design that enhances the model's ability to distinguish between local density levels. Moreover, a global attention module is applied at the end of the model to handle occasionally occurring large-sized individuals. Extensive experiments on multiple public datasets show that the proposed model significantly improves local modeling capability (e.g., an 8.7% MAE reduction on the JHU-Crowd++ high-density subset) without compromising its ability to count large-sized individuals, and achieves state-of-the-art performance. Code is available at: https://github.com/tianhangpan/LIMM.
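The abstract does not spell out the window-wise contrastive objective; the following is a hedged InfoNCE-style sketch, assuming each window carries a quantized density-level label and that windows sharing a level are treated as positives while all others are negatives. The function name `window_contrastive_loss` and the `density_levels` labeling are illustrative assumptions, not the paper's definitions:

```python
import numpy as np

def window_contrastive_loss(feats, density_levels, temp=0.1):
    """InfoNCE-style loss over window embeddings (illustrative).

    feats: (N, D) L2-normalized window feature vectors.
    density_levels: (N,) integer labels, e.g. a hypothetical
    quantization of each window's ground-truth count.
    Windows with the same level are pulled together; others are
    pushed apart, sharpening local density discrimination.
    """
    sim = feats @ feats.T / temp               # pairwise similarities
    np.fill_diagonal(sim, -np.inf)             # exclude self-pairs
    # Row-wise log-softmax over all other windows.
    logp = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = density_levels[:, None] == density_levels[None, :]
    np.fill_diagonal(pos, False)
    # Average negative log-probability over each window's positives;
    # windows with no positives contribute zero.
    losses = -np.where(pos, logp, 0.0).sum(axis=1) / np.maximum(pos.sum(axis=1), 1)
    return losses.mean()
```

In this sketch a lower loss means windows of the same density level cluster in feature space, which is one plausible way to realize the "distinguish between local density levels" goal stated above.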