🤖 AI Summary
In crowded scene counting, existing methods rely excessively on large-receptive-field backbone networks, which undermines local modeling capability, especially for small, head-sized targets. To address this, we propose a “local-information-first” modeling paradigm. Our approach (1) partitions the input image into non-overlapping local windows via a grid-based mechanism; (2) introduces intra-window contrastive learning to enhance the discriminability of subtle density variations; and (3) integrates a global attention module at the end of the network to jointly model large-sized individuals and the global spatial distribution. Evaluated on the high-density subset of JHU-Crowd++, our method achieves an 8.7% reduction in Mean Absolute Error (MAE), significantly improving local density estimation accuracy while maintaining state-of-the-art overall counting performance.
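The grid-based window partitioning described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions (the function name `partition_windows` and the divisibility of the spatial size by the window size are ours), not the paper's actual implementation:

```python
import numpy as np

def partition_windows(x, win):
    """Split feature maps into non-overlapping grid windows.

    x: array of shape (B, C, H, W); H and W are assumed divisible by `win`.
    Returns an array of shape (B * num_windows, C, win, win), so each
    window can be processed as an independent local sample.
    """
    B, C, H, W = x.shape
    # Carve H and W into (H//win, win) and (W//win, win) grids.
    x = x.reshape(B, C, H // win, win, W // win, win)
    # Bring the grid indices forward: (B, nH, nW, C, win, win).
    x = x.transpose(0, 2, 4, 1, 3, 5)
    # Flatten batch and grid dimensions into one window axis.
    return x.reshape(-1, C, win, win)

# Example: 2 images, 64 channels, 32x32 maps, 8x8 windows.
x = np.random.randn(2, 64, 32, 32)
windows = partition_windows(x, 8)
print(windows.shape)  # (32, 64, 8, 8): 2 images x 16 windows each
```

Because the windows are non-overlapping, each local region is modeled without interference from distant image content, which is the point of the local-information-first design.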
📝 Abstract
The motivation of this paper originates from rethinking an essential characteristic of crowd counting: individuals (human heads) typically occupy a very small portion of the image. This characteristic has never been the focus of existing works, which typically use the same backbones as other visual tasks and pursue a large receptive field. This drives us to propose a new design principle for crowd counting models: emphasizing the model's local modeling capability. Following this principle, we design a crowd counting model named the Local Information Matters Model (LIMM). The main innovation lies in two strategies: a window partitioning design that applies grid windows to the model input, and a window-wise contrastive learning design that enhances the model's ability to distinguish between local density levels. Moreover, a global attention module is applied at the end of the model to handle occasionally occurring large-sized individuals. Extensive experiments on multiple public datasets show that the proposed model significantly improves local modeling capability (e.g., an 8.7% MAE reduction on the JHU-Crowd++ high-density subset) without compromising its ability to count large-sized individuals, and achieves state-of-the-art performance. Code is available at: https://github.com/tianhangpan/LIMM.
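The abstract does not spell out the window-wise contrastive objective; the following is a hedged InfoNCE-style sketch, assuming each window carries a quantized density-level label and that windows sharing a level are treated as positives while all others are negatives. The function name `window_contrastive_loss` and the `density_levels` labeling are illustrative assumptions, not the paper's definitions:

```python
import numpy as np

def window_contrastive_loss(feats, density_levels, temp=0.1):
    """InfoNCE-style loss over window embeddings (illustrative).

    feats: (N, D) L2-normalized window feature vectors.
    density_levels: (N,) integer labels, e.g. a hypothetical
    quantization of each window's ground-truth count.
    Windows with the same level are pulled together; others are
    pushed apart, sharpening local density discrimination.
    """
    sim = feats @ feats.T / temp               # pairwise similarities
    np.fill_diagonal(sim, -np.inf)             # exclude self-pairs
    # Row-wise log-softmax over all other windows.
    logp = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = density_levels[:, None] == density_levels[None, :]
    np.fill_diagonal(pos, False)
    # Average negative log-probability over each window's positives;
    # windows with no positives contribute zero.
    losses = -np.where(pos, logp, 0.0).sum(axis=1) / np.maximum(pos.sum(axis=1), 1)
    return losses.mean()
```

In this sketch a lower loss means windows of the same density level cluster in feature space, which is one plausible way to realize the "distinguish between local density levels" goal stated above.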