Crowd counting plays a critical role in the intelligent video surveillance of public areas. A major challenge in this task is the perspective effect, which causes severe scale variation among human heads. Height reverse perspective transformation (HRPT) alleviates this problem by narrowing the height gap among human heads.
HRPT uses depth maps to compute a rescaling factor for each image row and then transforms the image accordingly. It enlarges small heads in distant regions to make them more noticeable and shrinks large heads in nearby regions to reduce redundant information. Convolutional neural networks can then be applied to the transformed images for crowd counting. Previous crowd-counting methods mainly address scale variation by designing specialized networks, such as multi-scale or perspective-aware networks, which cannot easily be reused by other methods. In contrast, HRPT addresses scale variation through image transformation. It can serve as a preprocessing step and be adopted by other crowd-counting methods without changing their original architectures.
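The row-wise rescaling described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the HRPT authors' implementation. It assumes that a head's height in pixels is inversely proportional to its depth (a pinhole-camera approximation), so each row is stretched by its mean depth divided by a reference depth. The names `row_scale_factors`, `reverse_perspective_rows`, and `ref_depth`, as well as the per-row mean and nearest-neighbour resampling, are hypothetical choices. Only the vertical dimension is rescaled here.

```python
from bisect import bisect_right
from typing import List, Sequence


def row_scale_factors(depth: Sequence[Sequence[float]], ref_depth: float) -> List[float]:
    """Hypothetical per-row rescaling factors.

    Assumes pixel head height ~ 1 / depth, so a row at depth d is
    stretched by d / ref_depth to bring its heads to roughly the size
    they would have at ref_depth.
    """
    factors = []
    for row in depth:
        mean_depth = sum(row) / len(row)  # collapse each row to one depth value
        factors.append(mean_depth / ref_depth)
    return factors


def reverse_perspective_rows(
    image: Sequence[Sequence[int]], factors: Sequence[float]
) -> List[List[int]]:
    """Resample rows so source row i covers about factors[i] output rows.

    Uses nearest-neighbour inverse mapping; the image width is unchanged.
    """
    # bounds[i] is where source row i ends on the continuous output axis
    bounds: List[float] = []
    total = 0.0
    for f in factors:
        total += f
        bounds.append(total)
    out_height = max(1, round(total))
    out: List[List[int]] = []
    for y in range(out_height):
        # centre of output row y, mapped onto the [0, total) output axis
        centre = (y + 0.5) * total / out_height
        # first source row whose end lies beyond the centre
        src = min(bisect_right(bounds, centre), len(image) - 1)
        out.append(list(image[src]))
    return out


if __name__ == "__main__":
    image = [[1, 1], [2, 2], [3, 3]]              # top row is farthest from the camera
    depth = [[8.0, 8.0], [4.0, 4.0], [2.0, 2.0]]  # arbitrary units, far to near
    factors = row_scale_factors(depth, ref_depth=4.0)  # [2.0, 1.0, 0.5]
    print(reverse_perspective_rows(image, factors))
    # prints [[1, 1], [1, 1], [2, 2], [3, 3]]
```

Far rows (large depth) receive factors above 1 and are repeated, while near rows receive factors below 1 and are compressed, which mirrors the enlarging and shrinking behaviour described above. A practical version would more likely interpolate between rows than copy them.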
Experimental results show that HRPT successfully narrows the height gap among human heads and achieves state-of-the-art performance on a large-scale RGB-D crowd-counting dataset.