ORIGINAL RESEARCH article

Front. Phys.
Sec. Radiation Detectors and Imaging
Volume 12 - 2024 | doi: 10.3389/fphy.2024.1489245
This article is part of the Research Topic Advanced Deep Learning Algorithms for Multi-Source Data and Imaging.

Lightweight Multi-stage Temporal Inference Network for Video Crowd Counting

Provisionally accepted
Wei Gao 1,2, Rui Feng 3*, Xiaochun Sheng 2*
  • 1 School of Educational Science, Yangzhou University, Yangzhou, Jiangsu Province, China
  • 2 School of Computer Engineering, Jiangsu University of Technology, Changzhou, China
  • 3 School of Journalism and Communication, Yangzhou University, Yangzhou, Jiangsu Province, China

The final, formatted version of the article will be published soon.

    Crowd density is an important metric for preventing overcrowding in a given area, but estimating it from video remains challenging due to perspective distortion, scale variation, and pedestrian occlusion. Existing studies have attempted to model the spatio-temporal dependencies in videos using LSTMs and 3D CNNs. However, these methods suffer from high computational cost, parameter redundancy, and loss of temporal information, which hinders model convergence and limits recognition performance. To address these issues, we propose a lightweight multi-stage temporal inference network (LMSTIN) for video crowd counting. LMSTIN models the spatio-temporal dependencies in video sequences at a fine-grained level, enabling real-time and accurate video crowd counting. The proposed method achieves significant performance improvements on three public crowd counting datasets.
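
    The article body is not reproduced on this page, so the sketch below is only an illustration of the general idea stated in the abstract: replacing LSTM or full 3D-CNN temporal modelling with a few cheap, stacked temporal stages that regress per-frame density maps. The framework (PyTorch), the class names (TemporalInferenceStage, LMSTINSketch), and all layer choices and hyperparameters are assumptions for illustration, not the authors' LMSTIN implementation.

# Minimal sketch (NOT the authors' implementation): one way a lightweight,
# multi-stage temporal inference block for video crowd counting could look.
# All names and hyperparameters here are hypothetical.
import torch
import torch.nn as nn


class TemporalInferenceStage(nn.Module):
    """Mixes information across neighbouring frames with a depthwise
    temporal convolution instead of an LSTM or a full 3D convolution."""

    def __init__(self, channels: int, kernel_t: int = 3):
        super().__init__()
        # Depthwise conv over the time axis only: far fewer parameters
        # than a dense 3D kernel over (T, H, W).
        self.temporal = nn.Conv3d(
            channels, channels, kernel_size=(kernel_t, 1, 1),
            padding=(kernel_t // 2, 0, 0), groups=channels)
        self.pointwise = nn.Conv3d(channels, channels, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):          # x: (B, C, T, H, W)
        return x + self.act(self.pointwise(self.temporal(x)))


class LMSTINSketch(nn.Module):
    """Frame-wise spatial encoder, several stacked temporal stages,
    and a 1x1 head that regresses a density map per frame."""

    def __init__(self, channels: int = 32, stages: int = 3):
        super().__init__()
        self.spatial = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True))
        self.stages = nn.Sequential(
            *[TemporalInferenceStage(channels) for _ in range(stages)])
        self.head = nn.Conv3d(channels, 1, kernel_size=1)

    def forward(self, frames):     # frames: (B, T, 3, H, W)
        b, t, c, h, w = frames.shape
        feat = self.spatial(frames.reshape(b * t, c, h, w))
        feat = feat.reshape(b, t, -1, h, w).permute(0, 2, 1, 3, 4)  # (B, C, T, H, W)
        density = self.head(self.stages(feat))                      # (B, 1, T, H, W)
        counts = density.sum(dim=(1, 3, 4))                         # per-frame counts
        return density, counts


# Usage example with a short 8-frame clip.
if __name__ == "__main__":
    clip = torch.randn(2, 8, 3, 128, 128)
    density, counts = LMSTINSketch()(clip)
    print(density.shape, counts.shape)   # (2, 1, 8, 128, 128) (2, 8)

    The design intent of such a sketch is simply to keep temporal reasoning cheap: each stage only mixes adjacent frames along the time axis, so cost grows linearly with clip length rather than with the cube of the 3D kernel size.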

    Keywords: crowd counting, crowd density, spatio-temporal dependencies, temporal inference, deep learning

    Received: 31 Aug 2024; Accepted: 23 Oct 2024.

    Copyright: © 2024 Gao, Feng and Sheng. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence:
    Rui Feng, School of Journalism and Communication, Yangzhou University, Yangzhou, 225009, Jiangsu Province, China
    Xiaochun Sheng, School of Computer Engineering, Jiangsu University of Technology, Changzhou, China

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.