Crowd Counting Based on Multiresolution Density Map and Parallel Dilated Convolution

2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Jingfan Tang ◽  
Meijia Zhou ◽  
Pengfei Li ◽  
Min Zhang ◽  
Ming Jiang

Current crowd counting methods rely on a fully convolutional network to generate a density map and can achieve good performance. However, due to crowd occlusion and perspective distortion in the image, the directly generated density map usually neglects scale information and spatial contact information. To address this, we propose MDPDNet (Multiresolution Density maps and Parallel Dilated convolutions’ Network) to reduce the influence of occlusion and distortion on crowd estimation. The network is composed of two modules: (1) the parallel dilated convolution module (PDM), which combines three dilated convolutions in parallel to obtain deep features over a larger receptive field with fewer parameters while reducing the loss of multiscale information; (2) the multiresolution density map module (MDM), which contains a three-branch network for extracting spatial contact information from three low-resolution density maps as the feature input of the final crowd density map. Experiments show that MDPDNet achieves excellent results on three mainstream datasets (ShanghaiTech, UCF_CC_50, and UCF-QNRF).
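As a back-of-the-envelope illustration of why parallel dilated convolutions enlarge the receptive field without adding parameters, the sketch below computes the effective kernel size of a dilated convolution. The dilation rates and channel counts are illustrative assumptions, not values taken from the paper.

```python
def effective_kernel(k, d):
    """Effective kernel size of a k x k convolution with dilation rate d."""
    return k + (k - 1) * (d - 1)

def param_count(k, c_in, c_out):
    """Weight count of a k x k convolution (bias omitted); independent of dilation."""
    return k * k * c_in * c_out

# Three 3x3 branches in parallel with, say, dilations 1, 2, 3 (assumed values):
for d in (1, 2, 3):
    print(d, effective_kernel(3, d), param_count(3, 64, 64))
# Effective receptive fields grow to 3, 5, 7 while every branch keeps
# the same 36864 weights -- larger context at no extra parameter cost.
```

Fusing such branches gives the network simultaneous access to several scales, which is the intuition behind the PDM described above.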

2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Pengfei Li ◽  
Min Zhang ◽  
Jian Wan ◽  
Ming Jiang

The most advanced methods for crowd counting use a fully convolutional network that extracts image features and then generates a crowd density map. However, this process often suffers from multiscale and contextual information loss. To address these problems, we propose a multiscale aggregation network (MANet) that includes a feature extraction encoder (FEE) and a density map decoder (DMD). The FEE uses a cascaded scale pyramid network to extract multiscale features and obtains contextual features through dense connections. The DMD uses deconvolution and fusion operations to generate features containing detailed information. These features can be further converted into high-quality density maps to accurately calculate the number of people in a crowd. An empirical comparison on four mainstream datasets (ShanghaiTech, WorldExpo’10, UCF_CC_50, and SmartCity) shows that the proposed method is more effective in terms of mean absolute error and mean squared error. The source code is available at https://github.com/lpfworld/MANet.
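A decoder built on deconvolution (transposed convolution) recovers spatial resolution step by step. The sketch below applies the standard output-size formula for transposed convolutions; the kernel, stride, and starting resolution are assumed for illustration and are not taken from the MANet paper.

```python
def deconv_out(size, kernel, stride, padding=0, output_padding=0):
    """Spatial output size of a transposed convolution (PyTorch convention):
    out = (in - 1) * stride - 2 * padding + kernel + output_padding."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

# Illustrative decoder: upsample a 28x28 encoder feature map three times,
# each stage doubling the resolution with a 4x4 kernel, stride 2, padding 1.
s = 28
for _ in range(3):
    s = deconv_out(s, kernel=4, stride=2, padding=1)
print(s)  # 28 -> 56 -> 112 -> 224
```

Doubling resolution per stage while fusing encoder features is what lets such a decoder produce density maps with fine detail.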


2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Xiaodong Huang ◽  
Hui Zhang ◽  
Li Zhuo ◽  
Xiaoguang Li ◽  
Jing Zhang

Extracting the tongue body accurately from a digital tongue image is a challenge for automated tongue diagnosis, due to the blurred edge of the tongue body, interference from pathological details, and the huge variation in tongue size and shape. In this study, an automated tongue image segmentation method using an enhanced fully convolutional network with an encoder-decoder structure is presented. In the proposed network, a deep residual network is adopted as the encoder to obtain dense feature maps, and a Receptive Field Block is placed behind the encoder. The Receptive Field Block captures an adequate global contextual prior thanks to its multibranch convolution layers with varying kernel sizes. Moreover, a Feature Pyramid Network is used as the decoder to fuse multiscale feature maps, gathering sufficient positional information to recover a clear contour of the tongue body. Quantitative evaluation of the segmentation results on 300 tongue images from the SIPL-tongue dataset showed an average Hausdorff Distance of 11.2963, an average Symmetric Mean Absolute Surface Distance of 3.4737, and average Dice Similarity Coefficient, precision, sensitivity, and specificity of 97.26%, 95.66%, 98.97%, and 98.68%, respectively. The proposed method achieved the best performance compared with four other deep-learning-based segmentation methods (SegNet, FCN, PSPNet, and DeepLab v3+), with similar results on the HIT-tongue dataset. The experimental results demonstrate that the proposed method achieves accurate tongue image segmentation and meets the practical requirements of automated tongue diagnosis.
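The evaluation metrics reported above (Dice, precision, sensitivity, specificity) all derive from the confusion counts of a binary mask. A minimal sketch, using a hand-made toy mask rather than any data from the study:

```python
def seg_metrics(pred, gt):
    """Dice, precision, sensitivity, specificity for flat binary masks (0/1)."""
    tp = sum(p and g for p, g in zip(pred, gt))            # true positives
    fp = sum(p and not g for p, g in zip(pred, gt))        # false positives
    fn = sum((not p) and g for p, g in zip(pred, gt))      # false negatives
    tn = sum((not p) and (not g) for p, g in zip(pred, gt))  # true negatives
    dice = 2 * tp / (2 * tp + fp + fn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)   # recall on the tongue-body class
    specificity = tn / (tn + fp)   # recall on the background class
    return dice, precision, sensitivity, specificity

# Toy 6-pixel masks: one false positive and one false negative.
pred = [1, 1, 1, 0, 0, 0]
gt   = [1, 1, 0, 0, 0, 1]
print(seg_metrics(pred, gt))  # (0.666..., 0.666..., 0.666..., 0.666...)
```

In practice these are computed per image over all pixels and then averaged across the test set, as in the 300-image evaluation above.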


2020 ◽  
Vol 34 (07) ◽  
pp. 12837-12844
Author(s):  
Qi Zhang ◽  
Antoni B. Chan

Crowd counting has been studied for decades, and many works have achieved good performance, especially DNN-based density map estimation methods. Most existing crowd counting works focus on single-view counting, while few have studied multi-view counting for large and wide scenes, where multiple cameras are used. Recently, an end-to-end multi-view crowd counting method called multi-view multi-scale (MVMS) was proposed, which fuses multiple camera views using a CNN to predict a 2D scene-level density map on the ground plane. Unlike MVMS, we propose to solve the multi-view crowd counting task through 3D feature fusion with 3D scene-level density maps, instead of the 2D ground-plane ones. Compared to 2D fusion, 3D fusion extracts more information about the people along the z-dimension (height), which helps to handle the scale variations across multiple views. The 3D density maps still preserve the 2D density maps' property that the sum is the count, while also providing 3D information about the crowd density. We also exploit projection consistency between the 3D prediction and the ground truth in the 2D views to further enhance counting performance. The proposed method is tested on three multi-view counting datasets and achieves better or comparable counting performance to the state of the art.
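The property that "the sum is the count" carries over from 2D to 3D density maps because summing out the height axis of a 3D map yields an ordinary ground-plane map with the same total mass. A toy sketch with hand-made numbers (not from the paper's datasets):

```python
# Toy 3D density map indexed as density[z][y][x]; total mass = crowd count.
density_3d = [
    [[0.2, 0.1], [0.0, 0.2]],   # z = 0 (near the ground plane)
    [[0.3, 0.0], [0.1, 0.1]],   # z = 1 (higher up: heads of taller people)
]

# Collapse the height (z) axis to recover a 2D ground-plane density map.
rows, cols = len(density_3d[0]), len(density_3d[0][0])
density_2d = [[sum(density_3d[z][y][x] for z in range(len(density_3d)))
               for x in range(cols)] for y in range(rows)]

count_3d = sum(v for plane in density_3d for row in plane for v in row)
count_2d = sum(v for row in density_2d for v in row)
print(count_3d, count_2d)  # both 1.0: one person in this toy scene
```

So the 3D representation adds height information without changing how the count is read off the map.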


2020 ◽  
Vol 14 (7) ◽  
pp. 443-451
Author(s):  
Suyu Wang ◽  
Bin Yang ◽  
Bo Liu ◽  
Guanghui Zheng

2020 ◽  
Vol 378 ◽  
pp. 455-466
Author(s):  
Liping Zhu ◽  
Chengyang Li ◽  
Bing Wang ◽  
Kun Yuan ◽  
Zhongguo Yang

Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-14
Author(s):  
Lina Li

In this paper, we analyze and calculate the crowd density in a tourist area using dynamic analysis of video surveillance information, dividing the crowd counting and density estimation task into three stages. A novel scale perception module and an inverse scale perception module are designed to further facilitate the mining of multiscale information by the counting model. The main function of the third stage is to generate the crowd distribution density map; it consists mainly of three columns of dilated convolution with different dilation rates and produces the final density map from the feature maps regressed by the different branches. The algorithm also uses skip connections between the top convolution layers and the bottom dilated convolution layers to reduce the risk of vanishing and exploding gradients, and optimizes the network parameters with an intermediate supervision strategy. The hierarchical density estimator uses a hierarchical strategy to mine semantic features and multiscale information in a coarse-to-fine manner, which addresses the problems of scale variation and perspective distortion. In addition, considering that background noise degrades the quality of the generated density map, a soft attention mechanism is integrated into the model to stretch the distance between foreground and background and further improve density map quality. Finally, inspired by multitask learning, this paper embeds an auxiliary count classifier in the counting model to perform a count classification auxiliary task and increase the model's ability to express semantic information. Extensive experimental results demonstrate the effectiveness and feasibility of the proposed algorithm in addressing scale variation and perspective distortion.
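The soft attention idea above, suppressing background responses while keeping foreground responses near full strength, can be sketched with a sigmoid gate. The scores below are hand-set for illustration; in the model they would be learned.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def soft_attention(features, scores):
    """Gate each feature by a sigmoid of its attention score, pushing
    background (low score) toward 0 and keeping foreground near 1x."""
    return [f * sigmoid(s) for f, s in zip(features, scores)]

# Toy example: two foreground and two background responses of equal magnitude.
features = [1.0, 1.0, 1.0, 1.0]
scores = [4.0, 3.0, -3.0, -4.0]   # illustrative, hand-set scores
weighted = soft_attention(features, scores)
print(weighted)
# Foreground stays close to 1 while background is pushed toward 0,
# stretching the gap between the two regions.
```

Multiplying such a gate into the feature maps before regression is one standard way to keep background clutter out of the final density map.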


2021 ◽  
Vol 13 (17) ◽  
pp. 3396
Author(s):  
Feng Zhao ◽  
Junjie Zhang ◽  
Zhe Meng ◽  
Hanqiang Liu

Recently, with the extensive application of deep learning techniques in the hyperspectral image (HSI) field, particularly the convolutional neural network (CNN), research on HSI classification has stepped into a new stage. To avoid the small receptive field of naive convolution, dilated convolution has been introduced into HSI classification. However, dilated convolution usually generates blind spots in the receptive field, resulting in discontinuous spatial information. To solve this problem, a densely connected pyramidal dilated convolutional network (PDCNet) is proposed in this paper. Firstly, a pyramidal dilated convolutional (PDC) layer that integrates different numbers of sub-dilated convolutional layers is proposed, where the dilation factor of the sub-dilated convolutions increases exponentially, achieving multi-scale receptive fields. Secondly, the number of sub-dilated convolutional layers increases in a pyramidal pattern with the depth of the network, thereby capturing more comprehensive hyperspectral information in the receptive field. Furthermore, a feature fusion mechanism combining pixel-by-pixel addition and channel stacking is adopted to extract more abstract spectral–spatial features. Finally, to reuse the features of previous layers more effectively, dense connections are applied in the densely pyramidal dilated convolutional (DPDC) blocks. Experiments on three well-known HSI datasets indicate that the proposed PDCNet has good classification performance compared with other popular models.
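The blind-spot (gridding) problem and why exponentially increasing dilations avoid it can be checked directly: enumerate which input offsets a stack of dilated convolutions can reach. The dilation sequences below are illustrative assumptions, not the exact rates used in PDCNet.

```python
def coverage(dilations, k=3):
    """Set of 1D input offsets reachable by stacking k-tap dilated convolutions
    with the given per-layer dilation rates."""
    taps = {0}
    for d in dilations:
        taps = {t + d * o for t in taps for o in range(-(k // 2), k // 2 + 1)}
    return sorted(taps)

# Exponentially increasing dilations tile the receptive field densely...
dense = coverage([1, 2, 4])
print(dense)   # every offset from -7 to 7 is covered: no blind spots

# ...while repeating one large dilation leaves gaps (the gridding effect).
sparse = coverage([2, 2, 2])
print(sparse)  # only even offsets: odd positions are blind spots
```

This is the motivation for the exponential dilation schedule inside the PDC layer: each layer fills in the positions the previous layers skipped.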

