Smart Camera Aware Crowd Counting via Multiple Task Fractional Stride Deep Learning

Minglei Tong; Lyuyuan Fan; Hao Nan; Yan Zhao

doi:10.3390/s19061346

Sensors ◽

10.3390/s19061346 ◽

2019 ◽

Vol 19 (6) ◽

pp. 1346 ◽

Cited By ~ 3

Author(s):

Minglei Tong ◽

Lyuyuan Fan ◽

Hao Nan ◽

Yan Zhao

Keyword(s):

Deep Learning ◽

Receptive Fields ◽

Density Level ◽

Smart Camera ◽

Crowd Counting ◽

Crowd Density ◽

Effective Performance ◽

Density Map ◽

Deep Learning Model ◽

Multiple Task

Estimating the number of people in highly clustered crowd scenes is extremely challenging because of severe occlusion and the non-uniform distribution of people within a single crowd image. Traditional work on crowd counting uses various CNN-like networks to regress a crowd density map and then predict the count. In contrast, we investigate a simple but effective deep learning model that concentrates on accurately predicting the density map while simultaneously training a density level classifier to relax the parameters of the network, with the aim of preventing dangerous stampedes using a smart camera. First, a combination of atrous and fractional stride convolutional neural network (CAFN) is proposed to deliver larger receptive fields and reduce the loss of detail during down-sampling by using dilated kernels. Second, an expanded architecture is offered that not only precisely regresses the density map but also classifies the density level of the crowd at the same time (MTCAFN, multiple-task CAFN for both regression and classification). Third, experimental results on four datasets (ShanghaiTech A (MAE = 88.1) and B (MAE = 18.8), WorldExpo’10 (average MAE = 8.2), and UCF_CC_50 (MAE = 303.2)) show that the proposed method delivers effective performance.
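
The receptive-field argument for atrous (dilated) kernels can be illustrated with a little arithmetic. The sketch below is not the CAFN architecture: the layer stacks are hypothetical and chosen only to show that a 3 × 3 kernel with dilation d spans k + (k - 1)(d - 1) pixels per side while still holding nine weights.

```python
def effective_kernel_size(kernel_size, dilation):
    """Side length spanned by a kernel once gaps of (dilation - 1) are inserted."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers):
    """Receptive field (in input pixels) of a stack of conv/pool layers.

    `layers` is a list of (kernel_size, dilation, stride) tuples.
    """
    field = 1  # before any layer, one output pixel sees one input pixel
    jump = 1   # distance in input pixels between neighbouring outputs
    for kernel_size, dilation, stride in layers:
        span = effective_kernel_size(kernel_size, dilation)
        field += (span - 1) * jump
        jump *= stride
    return field


# Hypothetical stacks, not taken from the paper.
plain_stack = [(3, 1, 1)] * 3    # three ordinary 3x3 convolutions
dilated_stack = [(3, 2, 1)] * 3  # same number of weights, dilation 2

print(receptive_field(plain_stack))    # 7
print(receptive_field(dilated_stack))  # 13
```

With the same number of weights, the dilated stack sees a 13 × 13 window instead of 7 × 7, and it does so without any stride or pooling that would discard spatial detail.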

ResnetCrowd: A residual deep learning architecture for crowd counting, violent behaviour detection and crowd density level classification

2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS) ◽

10.1109/avss.2017.8078482 ◽

2017 ◽

Cited By ~ 31

Author(s):

Mark Marsden ◽

Kevin McGuinness ◽

Suzanne Little ◽

Noel E. O'Connor

Keyword(s):

Deep Learning ◽

Density Level ◽

Violent Behaviour ◽

Crowd Counting ◽

Crowd Density

Multiscale Aggregate Networks with Dense Connections for Crowd Counting

Computational Intelligence and Neuroscience ◽

10.1155/2021/9996232 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Pengfei Li ◽

Min Zhang ◽

Jian Wan ◽

Ming Jiang

Keyword(s):

Mean Squared Error ◽

Absolute Error ◽

Image Features ◽

Convolutional Network ◽

Crowd Counting ◽

Squared Error ◽

Crowd Density ◽

Density Maps ◽

Density Map ◽

Map Decoder

The most advanced method for crowd counting uses a fully convolutional network that extracts image features and then generates a crowd density map. However, this process often encounters multiscale and contextual loss problems. To address these problems, we propose a multiscale aggregation network (MANet) that includes a feature extraction encoder (FEE) and a density map decoder (DMD). The FEE uses a cascaded scale pyramid network to extract multiscale features and obtains contextual features through dense connections. The DMD uses deconvolution and fusion operations to generate features containing detailed information. These features can be further converted into high-quality density maps to accurately calculate the number of people in a crowd. An empirical comparison using four mainstream datasets (ShanghaiTech, WorldExpo’10, UCF_CC_50, and SmartCity) shows that the proposed method is more effective in terms of the mean absolute error and mean squared error. The source code is available at https://github.com/lpfworld/MANet.
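
As a rough illustration of how a density map turns into a count and how the two error metrics are computed, here is a minimal sketch. The toy maps and ground-truth counts are invented for the example. Many crowd counting papers report the square root of the mean squared error under the name "MSE"; whether MANet does so is not stated in this abstract, so the function below is explicitly named `rmse`.

```python
import math


def count_from_density(density_map):
    """The estimated head count is the sum of all density values."""
    return sum(sum(row) for row in density_map)


def mae(predicted, actual):
    """Mean absolute error between predicted and true counts."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root of the mean squared error between predicted and true counts."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Toy predicted density maps (hypothetical values) and their true counts.
predicted_maps = [
    [[0.5, 1.5], [3.0, 5.0]],  # sums to 10.0
    [[4.0, 6.0], [2.5, 7.5]],  # sums to 20.0
]
true_counts = [12, 17]

predicted_counts = [count_from_density(m) for m in predicted_maps]
print(mae(predicted_counts, true_counts))   # 2.5
print(rmse(predicted_counts, true_counts))
```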

DGG: A Novel Framework for Crowd Gathering Detection

Electronics ◽

10.3390/electronics11010031 ◽

2021 ◽

Vol 11 (1) ◽

pp. 31

Author(s):

Jianqiang Xu ◽

Haoyu Zhao ◽

Weidong Min ◽

Yi Zou ◽

Qiyan Fu

Keyword(s):

Deep Learning ◽

Local Area ◽

Video Frame ◽

Detection Accuracy ◽

Learning Approaches ◽

Counting Method ◽

Crowd Counting ◽

Stable Pattern ◽

Crowd Density ◽

Public Areas

Crowd gathering detection plays an important role in the security supervision of public areas. Existing image-processing-based methods are not robust in complex scenes, and deep-learning-based methods for gathering detection mainly focus on network design, ignoring the inner features of the crowd gathering action. To alleviate these problems, this work proposes a novel framework, Detection of Group Gathering (DGG), which detects crowd gathering by combining a deep-learning-based crowd counting method with statistics. The DGG contains three main parts: Detecting Candidate Frame of Gathering (DCFG), Gathering Area Detection (GAD), and Gathering Judgement (GJ). The DCFG uses the crowd counting method to find the index of the video frame with the maximum number of people. This frame indicates that the crowd has gathered, and the specific gathering area is detected next. The GAD uses a sliding search box to detect the local area with the maximum crowd density in a frame. This local area, denoted by grid coordinates in the video frame, contains the inner features of the gathering action and indicates where the crowd is gathering. Based on the results of the DCFG and the GAD, the GJ analyzes the statistical relationship between the local area and the global area to find the stable pattern of the crowd gathering action. Experiments on benchmarks show that the proposed DGG provides a robust representation of the gathering feature and high detection accuracy. The DGG has potential for use in the social security and smart city domains.
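
The first two stages described above can be sketched as follows. This is an assumption-laden illustration, not the authors' implementation: it presumes a per-frame density map is already available from some crowd counter, uses a brute-force search, and the function names, box size, and stride are hypothetical. The GJ stage is not covered.

```python
def total_count(density_map):
    """Estimated number of people in one frame."""
    return sum(sum(row) for row in density_map)


def candidate_frame(density_maps):
    """DCFG-like step: index of the frame with the largest estimated count."""
    return max(range(len(density_maps)), key=lambda i: total_count(density_maps[i]))


def densest_box(density_map, box_h, box_w, stride=1):
    """GAD-like step: slide a box over the map and keep the position with most mass.

    Returns (top, left, mass) for the best box; ties keep the first box found.
    """
    height, width = len(density_map), len(density_map[0])
    best = None
    for top in range(0, height - box_h + 1, stride):
        for left in range(0, width - box_w + 1, stride):
            mass = sum(
                density_map[y][x]
                for y in range(top, top + box_h)
                for x in range(left, left + box_w)
            )
            if best is None or mass > best[2]:
                best = (top, left, mass)
    return best


# Two toy frames with made-up density values.
sparse = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
dense = [[0, 0, 0, 0], [0, 1, 2, 0], [0, 3, 4, 0], [0, 0, 0, 1]]
frames = [sparse, dense]

index = candidate_frame(frames)
print(index)                             # 1
print(densest_box(frames[index], 2, 2))  # (1, 1, 10)
```

For large frames, an integral image (summed-area table) would make each box sum a constant-time lookup; the brute-force loop is kept here for readability.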

Crowd Counting Based on Multiresolution Density Map and Parallel Dilated Convolution

Scientific Programming ◽

10.1155/2021/8831458 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Jingfan Tang ◽

Meijia Zhou ◽

Pengfei Li ◽

Min Zhang ◽

Ming Jiang

Keyword(s):

Receptive Field ◽

Convolutional Network ◽

Fully Convolutional Network ◽

Crowd Counting ◽

Dilated Convolution ◽

Perspective Distortion ◽

Contact Information ◽

Crowd Density ◽

Density Maps ◽

Density Map

Current crowd counting methods rely on a fully convolutional network to generate a density map and can achieve good performance. However, because of crowd occlusion and perspective distortion in the image, the directly generated density map usually neglects scale information and spatial contact information. To address this, we propose MDPDNet (Multiresolution Density maps and Parallel Dilated convolutions’ Network) to reduce the influence of occlusion and distortion on crowd estimation. The network is composed of two modules: (1) the parallel dilated convolution module (PDM), which combines three dilated convolutions in parallel to obtain deep features over a larger receptive field with fewer parameters while reducing the loss of multiscale information; and (2) the multiresolution density map module (MDM), which contains three branch networks that extract spatial contact information from three different low-resolution density maps as the feature input for the final crowd density map. Experiments show that MDPDNet achieves excellent results on three mainstream datasets (ShanghaiTech, UCF_CC_50, and UCF-QNRF).
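
To make the idea of parallel dilated convolutions concrete, here is a small pure-Python sketch. It is not MDPDNet's PDM: the dilation rates (1, 2, 3), the zero padding, the shared all-ones kernel, and the element-wise sum used to merge the branches are all assumptions made for illustration, since the abstract does not specify them.

```python
def dilated_conv2d(image, kernel, dilation):
    """'Same'-size 2D convolution with zero padding and a dilated square kernel."""
    height, width = len(image), len(image[0])
    size = len(kernel)
    half = size // 2
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for i in range(size):
                for j in range(size):
                    yy = y + (i - half) * dilation  # taps are spaced `dilation` apart
                    xx = x + (j - half) * dilation
                    if 0 <= yy < height and 0 <= xx < width:
                        acc += kernel[i][j] * image[yy][xx]
            out[y][x] = acc
    return out


def parallel_dilated(image, kernel, dilations=(1, 2, 3)):
    """Run one branch per dilation rate and merge the branches by summation."""
    branches = [dilated_conv2d(image, kernel, d) for d in dilations]
    height, width = len(image), len(image[0])
    return [
        [sum(branch[y][x] for branch in branches) for x in range(width)]
        for y in range(height)
    ]


# A 7x7 image with a single bright pixel at the centre.
image = [[0.0] * 7 for _ in range(7)]
image[3][3] = 1.0
ones = [[1.0] * 3 for _ in range(3)]

fused = parallel_dilated(image, ones)
print(fused[3][3])  # 3.0: every branch responds at the centre
print(fused[0][0])  # 1.0: only the dilation-3 branch reaches the corner
```

Each branch uses the same nine weights, yet the dilation-3 branch reaches pixels three steps away, which is the "larger receptive field with fewer parameters" trade-off the abstract refers to.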

Deep Learning for Crowd Counting: A Survey

Engineering, MAthematics and Computer Science (EMACS) Journal ◽

10.21512/emacsjournal.v1i1.5794 ◽

2019 ◽

Vol 1 (1) ◽

pp. 17-28

Author(s):

Tjeng Wawan Cenggoro

Keyword(s):

Deep Learning ◽

Learning Model ◽

Learning Models ◽

Crowd Counting ◽

Big Picture ◽

Deep Learning Model

The growth of deep learning for crowd counting has been immense in recent years. This has resulted in numerous deep learning models of great variety. This paper aims to capture a big picture of existing deep learning models for crowd counting so that the development of novel models in future work can be accelerated.

Multi-Stream Networks and Ground Truth Generation for Crowd Counting

International journal of electrical and computer engineering systems ◽

10.32985/ijeces.11.1.4 ◽

2020 ◽

Vol 11 (1) ◽

pp. 33-41

Author(s):

Rodolfo Quispe ◽

Darwin Ttito ◽

Adín Rivera ◽

Helio Pedrini

Keyword(s):

Neural Network ◽

Network Architecture ◽

Receptive Fields ◽

Ground Truth ◽

Scene Analysis ◽

Stream Networks ◽

Single Image ◽

Crowd Counting ◽

Ground Truth Generation ◽

Density Map

Crowd scene analysis has received a lot of attention recently due to a wide variety of applications, e.g., forensic science, urban planning, surveillance, and security. In this context, one challenging task is crowd counting [1–6], whose main purpose is to estimate the number of people present in a single image. This paper develops and evaluates a multi-stream convolutional neural network that receives an image as input and, in an end-to-end fashion, produces a density map representing the spatial distribution of people. To address complex crowd counting issues, such as extremely unconstrained scale and perspective changes, the network architecture uses filters of a different size, and hence a different receptive field, in each stream. In addition, we investigate the influence of the two most common approaches to ground truth generation and propose a hybrid method based on tiny face detection and scale interpolation. Experiments conducted on two challenging datasets, UCF-CC-50 and ShanghaiTech, demonstrate that our ground truth generation methods achieve superior results.
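
For readers unfamiliar with ground truth generation, the sketch below shows one widely used baseline: placing a normalized Gaussian at each annotated head position so that the map sums to the head count. The fixed `sigma`, the image size, and the head coordinates are assumptions for illustration; the paper's hybrid method based on tiny face detection and scale interpolation is not reproduced here.

```python
import math


def gaussian_density_map(height, width, heads, sigma=1.5):
    """Build a density map from head annotations given as (row, col) pairs.

    Each head contributes a Gaussian blob normalized to sum to 1 inside the
    image, so the whole map sums to the number of annotated heads.
    Heads are assumed to lie inside the image.
    """
    density = [[0.0] * width for _ in range(height)]
    for head_y, head_x in heads:
        weights = [
            [
                math.exp(-((y - head_y) ** 2 + (x - head_x) ** 2) / (2 * sigma ** 2))
                for x in range(width)
            ]
            for y in range(height)
        ]
        total = sum(sum(row) for row in weights)
        for y in range(height):
            for x in range(width):
                density[y][x] += weights[y][x] / total
    return density


# Three hypothetical head annotations in an 8 x 10 image.
heads = [(2, 3), (5, 5), (5, 6)]
density = gaussian_density_map(8, 10, heads)
estimated = sum(sum(row) for row in density)
print(round(estimated, 6))  # 3.0
```

A fixed `sigma` ignores perspective: heads far from the camera are smaller than heads close to it, which is one reason the choice of ground truth generation method can affect the final counting error.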

One Shot Crowd Counting with Deep Scale Adaptive Neural Network

Electronics ◽

10.3390/electronics8060701 ◽

2019 ◽

Vol 8 (6) ◽

pp. 701 ◽

Cited By ~ 1

Author(s):

Junfeng Wu ◽

Zhiyang Li ◽

Wenyu Qu ◽

Yizhi Zhou

Keyword(s):

Neural Network ◽

Adaptive Neural Network ◽

Crowd Counting ◽

Crowd Density ◽

Proposed Model ◽

Perspective Image ◽

Perspective Effect ◽

Camera Perspective ◽

Density Map ◽

Public Datasets

This paper aims to use a deep learning architecture to overcome the limitations imposed by camera perspective, image background, uneven crowd density distribution, and pedestrian occlusion in order to estimate crowd density accurately. We propose a new neural network called the Deep Scale-Adaptive Convolutional Neural Network (DSA-CNN), which directly converts a single crowd image into a density map for crowd counting. For a crowd image of any size and resolution, our algorithm outputs the density map of the image end to end and then estimates the number of people in the image. The proposed DSA-CNN consists of two parts: a seven-layer CNN structure and DSA modules. To make the method robust to the camera perspective effect, DSA-CNN adopts filters of different sizes in the network and combines them. To reduce the depth of the data and speed up training, the method uses a 1 × 1 filter in the DSA module. To validate the effectiveness of the proposed model, we conducted comparative experiments on four popular public datasets (ShanghaiTech, UCF_CC_50, WorldExpo’10, and UCSD). We compare the proposed method with other well-known algorithms, such as MCNN, Switching-CNN, CSRNet, CP-CNN, and Cascaded-MTL, on the MAE and MSE indicators. Experimental results show that the proposed method has excellent performance. In addition, we found that the proposed model is easy to train, which further increases its usability.
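
The effect of a 1 × 1 filter on depth can be seen from a simple weight count. The channel sizes below are hypothetical and are not taken from DSA-CNN; biases are ignored.

```python
def conv_weights(kernel_size, in_channels, out_channels):
    """Number of weights in a square 2D convolution layer (biases ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


# Hypothetical layer: 256 input channels, 64 output channels, 5x5 filters.
direct = conv_weights(5, 256, 64)

# Same layer preceded by a 1x1 convolution that first reduces depth to 32.
reduced = conv_weights(1, 256, 32) + conv_weights(5, 32, 64)

print(direct)   # 409600
print(reduced)  # 59392
```

In this made-up configuration the 1 × 1 bottleneck cuts the weight count by roughly a factor of seven, which is the kind of saving that can shorten training time.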

Estimation of Crowd Density from UAV Images based on Deep Learning

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.37324 ◽

2021 ◽

Vol 9 (VIII) ◽

pp. 242-248

Author(s):

Sarita Chauhan

Keyword(s):

Neural Network ◽

Deep Learning ◽

Unmanned Aerial Vehicles ◽

Crowd Counting ◽

Aerial Vehicles ◽

Crowd Density ◽

Ip Camera ◽

Crowd Monitoring ◽

Uav Images ◽

Crowd Surveillance

Crowd monitoring is necessary to improve safety and keep movement under control so as to minimize risk, especially at highly crowded events such as the Kumbh Mela, political rallies, and sports events. Even in the current digital age, crowd monitoring still mostly relies on outdated methods such as keeping records, counting people manually with people counters, and using sensors to count people at entrances. These approaches are ineffective in situations where people's movements are completely unpredictable, highly variable, and complex. Crowd surveillance using unmanned aerial vehicles (UAVs) can help solve these problems. The proposed approach uses a UAV with an attached IP camera to capture footage. A convolutional neural network then learns a regression model for crowd counting, and the model will be trained extensively on three widely used crowd counting datasets: ShanghaiTech (Part A and Part B), UCF-CC 50, and UCF-QNRF.

Crowd Counting using Deep Recurrent Spatial-Aware Network

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/118 ◽

2018 ◽

Cited By ~ 48

Author(s):

Lingbo Liu ◽

Hongjun Wang ◽

Guanbin Li ◽

Wanli Ouyang ◽

Liang Lin

Keyword(s):

Neural Network ◽

Real World ◽

Local Refinement ◽

Crowd Counting ◽

Multi Scale ◽

Residual Learning ◽

Crowd Density ◽

Real World Applications ◽

Refinement Process ◽

Density Map

Crowd counting from unconstrained scene images is a crucial task in many real-world applications such as urban surveillance and management, but it is greatly challenged by camera perspective, which causes large appearance variations in people’s scales and rotations. Conventional methods address such challenges by resorting to fixed multi-scale architectures that are often unable to cover the widely varying scales and that ignore rotation variations. In this paper, we propose a unified neural network framework, named Deep Recurrent Spatial-Aware Network, which adaptively addresses the two issues in a learnable spatial transform module with a region-wise refinement process. Specifically, our framework incorporates a Recurrent Spatial-Aware Refinement (RSAR) module that iteratively applies two components: i) a Spatial Transformer Network that dynamically locates an attentional region from the crowd density map and transforms it to a suitable scale and rotation for optimal crowd estimation; ii) a Local Refinement Network that refines the density map of the attended region with residual learning. Extensive experiments on four challenging benchmarks show the effectiveness of our approach. Specifically, compared with the existing best-performing methods, we achieve an improvement of 12% on the largest dataset, WorldExpo’10, and 22.8% on the most challenging dataset, UCF_CC_50.

Proposal of a Monitoring System to Determine the Possibility of Contact with Confirmed Infectious Diseases Using K-means Clustering Algorithm and Deep Learning Based Crowd Counting

Korean Institute of Smart Media ◽

10.30693/smj.2020.9.3.122 ◽

2020 ◽

Vol 9 (3) ◽

pp. 122-129

Author(s):

Dongsu Lee ◽

ASHIQUZZAMAN A K M ◽

Yeonggwang Kim ◽

혜주 신 ◽

Jinsul Kim

Keyword(s):

Deep Learning ◽

Infectious Diseases ◽

Monitoring System ◽

Clustering Algorithm ◽

Crowd Counting