Crowd Counting Based on Multiresolution Density Map and Parallel Dilated Convolution

2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Jingfan Tang ◽  
Meijia Zhou ◽  
Pengfei Li ◽  
Min Zhang ◽  
Ming Jiang

Current crowd counting methods rely on a fully convolutional network to generate a density map and can achieve good performance. However, due to crowd occlusion and perspective distortion in the image, the directly generated density map usually neglects scale information and spatial contact information. To address this, we propose MDPDNet (Multiresolution Density maps and Parallel Dilated convolutions’ Network) to reduce the influence of occlusion and distortion on crowd estimation. The network is composed of two modules: (1) the parallel dilated convolution module (PDM), which combines three dilated convolutions in parallel to obtain deep features over a larger receptive field with fewer parameters while reducing the loss of multiscale information; (2) the multiresolution density map module (MDM), which contains a three-branch network for extracting spatial contact information from three low-resolution density maps as the feature input of the final crowd density map. Experiments show that MDPDNet achieves excellent results on three mainstream datasets (ShanghaiTech, UCF_CC_50, and UCF-QNRF).
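As a back-of-the-envelope illustration of why parallel dilated convolutions enlarge the receptive field without adding parameters, the sketch below computes the effective kernel size of a dilated convolution. The dilation rates and channel counts are illustrative assumptions, not values taken from the paper.

```python
def effective_kernel(k, d):
    """Effective kernel size of a k x k convolution with dilation rate d."""
    return k + (k - 1) * (d - 1)

def param_count(k, c_in, c_out):
    """Weight count of a k x k convolution (bias omitted); independent of dilation."""
    return k * k * c_in * c_out

# Three 3x3 branches in parallel with, say, dilations 1, 2, 3 (assumed values):
for d in (1, 2, 3):
    print(d, effective_kernel(3, d), param_count(3, 64, 64))
# Effective receptive fields grow to 3, 5, 7 while every branch keeps
# the same 36864 weights -- larger context at no extra parameter cost.
```

Fusing such branches gives the network simultaneous access to several scales, which is the intuition behind the PDM described above.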

2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Pengfei Li ◽  
Min Zhang ◽  
Jian Wan ◽  
Ming Jiang

The most advanced methods for crowd counting use a fully convolutional network that extracts image features and then generates a crowd density map. However, this process often suffers from multiscale and contextual information loss. To address these problems, we propose a multiscale aggregation network (MANet) that includes a feature extraction encoder (FEE) and a density map decoder (DMD). The FEE uses a cascaded scale pyramid network to extract multiscale features and obtains contextual features through dense connections. The DMD uses deconvolution and fusion operations to generate features containing detailed information. These features can be further converted into high-quality density maps to accurately calculate the number of people in a crowd. An empirical comparison on four mainstream datasets (ShanghaiTech, WorldExpo’10, UCF_CC_50, and SmartCity) shows that the proposed method is more effective in terms of mean absolute error and mean squared error. The source code is available at https://github.com/lpfworld/MANet.
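A decoder built on deconvolution (transposed convolution) recovers spatial resolution step by step. The sketch below applies the standard output-size formula for transposed convolutions; the kernel, stride, and starting resolution are assumed for illustration and are not taken from the MANet paper.

```python
def deconv_out(size, kernel, stride, padding=0, output_padding=0):
    """Spatial output size of a transposed convolution (PyTorch convention):
    out = (in - 1) * stride - 2 * padding + kernel + output_padding."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

# Illustrative decoder: upsample a 28x28 encoder feature map three times,
# each stage doubling the resolution with a 4x4 kernel, stride 2, padding 1.
s = 28
for _ in range(3):
    s = deconv_out(s, kernel=4, stride=2, padding=1)
print(s)  # 28 -> 56 -> 112 -> 224
```

Doubling resolution per stage while fusing encoder features is what lets such a decoder produce density maps with fine detail.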


2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Xiaodong Huang ◽  
Hui Zhang ◽  
Li Zhuo ◽  
Xiaoguang Li ◽  
Jing Zhang

Extracting the tongue body accurately from a digital tongue image is a challenge for automated tongue diagnosis, due to the blurred edge of the tongue body, interference from pathological details, and the huge variation in tongue size and shape. In this study, an automated tongue image segmentation method using an enhanced fully convolutional network with an encoder-decoder structure is presented. In the proposed network, a deep residual network is adopted as the encoder to obtain dense feature maps, and a Receptive Field Block is placed behind the encoder. The Receptive Field Block captures an adequate global contextual prior thanks to its multibranch convolution layers with varying kernel sizes. Moreover, a Feature Pyramid Network is used as the decoder to fuse multiscale feature maps, gathering sufficient positional information to recover a clear contour of the tongue body. Quantitative evaluation of the segmentation results on 300 tongue images from the SIPL-tongue dataset showed an average Hausdorff Distance of 11.2963, an average Symmetric Mean Absolute Surface Distance of 3.4737, and average Dice Similarity Coefficient, precision, sensitivity, and specificity of 97.26%, 95.66%, 98.97%, and 98.68%, respectively. The proposed method achieved the best performance compared with four other deep-learning-based segmentation methods (SegNet, FCN, PSPNet, and DeepLab v3+), with similar results on the HIT-tongue dataset. The experimental results demonstrate that the proposed method achieves accurate tongue image segmentation and meets the practical requirements of automated tongue diagnosis.
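The evaluation metrics reported above (Dice, precision, sensitivity, specificity) all derive from the confusion counts of a binary mask. A minimal sketch, using a hand-made toy mask rather than any data from the study:

```python
def seg_metrics(pred, gt):
    """Dice, precision, sensitivity, specificity for flat binary masks (0/1)."""
    tp = sum(p and g for p, g in zip(pred, gt))            # true positives
    fp = sum(p and not g for p, g in zip(pred, gt))        # false positives
    fn = sum((not p) and g for p, g in zip(pred, gt))      # false negatives
    tn = sum((not p) and (not g) for p, g in zip(pred, gt))  # true negatives
    dice = 2 * tp / (2 * tp + fp + fn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)   # recall on the tongue-body class
    specificity = tn / (tn + fp)   # recall on the background class
    return dice, precision, sensitivity, specificity

# Toy 6-pixel masks: one false positive and one false negative.
pred = [1, 1, 1, 0, 0, 0]
gt   = [1, 1, 0, 0, 0, 1]
print(seg_metrics(pred, gt))  # (0.666..., 0.666..., 0.666..., 0.666...)
```

In practice these are computed per image over all pixels and then averaged across the test set, as in the 300-image evaluation above.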


2020 ◽  
Vol 34 (07) ◽  
pp. 12837-12844
Author(s):  
Qi Zhang ◽  
Antoni B. Chan

Crowd counting has been studied for decades, and many works have achieved good performance, especially DNN-based density map estimation methods. Most existing crowd counting works focus on single-view counting, while few have studied multi-view counting for large and wide scenes, where multiple cameras are used. Recently, an end-to-end multi-view crowd counting method called multi-view multi-scale (MVMS) was proposed, which fuses multiple camera views using a CNN to predict a 2D scene-level density map on the ground plane. Unlike MVMS, we propose to solve the multi-view crowd counting task through 3D feature fusion with 3D scene-level density maps, instead of the 2D ground-plane ones. Compared to 2D fusion, 3D fusion extracts more information about the people along the z-dimension (height), which helps to handle the scale variations across multiple views. The 3D density maps still preserve the 2D density maps' property that the sum is the count, while also providing 3D information about the crowd density. We also exploit projection consistency between the 3D prediction and the ground truth in the 2D views to further enhance counting performance. The proposed method is tested on three multi-view counting datasets and achieves better or comparable counting performance to the state of the art.
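The property that "the sum is the count" carries over from 2D to 3D density maps because summing out the height axis of a 3D map yields an ordinary ground-plane map with the same total mass. A toy sketch with hand-made numbers (not from the paper's datasets):

```python
# Toy 3D density map indexed as density[z][y][x]; total mass = crowd count.
density_3d = [
    [[0.2, 0.1], [0.0, 0.2]],   # z = 0 (near the ground plane)
    [[0.3, 0.0], [0.1, 0.1]],   # z = 1 (higher up: heads of taller people)
]

# Collapse the height (z) axis to recover a 2D ground-plane density map.
rows, cols = len(density_3d[0]), len(density_3d[0][0])
density_2d = [[sum(density_3d[z][y][x] for z in range(len(density_3d)))
               for x in range(cols)] for y in range(rows)]

count_3d = sum(v for plane in density_3d for row in plane for v in row)
count_2d = sum(v for row in density_2d for v in row)
print(count_3d, count_2d)  # both 1.0: one person in this toy scene
```

So the 3D representation adds height information without changing how the count is read off the map.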


2020 ◽  
Vol 14 (7) ◽  
pp. 443-451
Author(s):  
Suyu Wang ◽  
Bin Yang ◽  
Bo Liu ◽  
Guanghui Zheng

2020 ◽  
Vol 378 ◽  
pp. 455-466
Author(s):  
Liping Zhu ◽  
Chengyang Li ◽  
Bing Wang ◽  
Kun Yuan ◽  
Zhongguo Yang

Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-14
Author(s):  
Lina Li

In this paper, we analyze and calculate the crowd density in a tourist area using dynamic analysis of video surveillance information, dividing the crowd counting and density estimation task into three stages. A novel scale perception module and an inverse scale perception module are designed to further facilitate the mining of multiscale information by the counting model. The main function of the third stage is to generate the crowd distribution density map; it consists mainly of three columns of dilated convolution with different dilation rates and produces the final density map from the feature maps regressed by the different branches. The algorithm also uses skip connections between the top convolution layers and the bottom dilated convolution layers to reduce the risk of vanishing and exploding gradients, and optimizes the network parameters with an intermediate supervision strategy. The hierarchical density estimator uses a hierarchical strategy to mine semantic features and multiscale information in a coarse-to-fine manner, which addresses the problems of scale variation and perspective distortion. In addition, considering that background noise degrades the quality of the generated density map, a soft attention mechanism is integrated into the model to stretch the distance between foreground and background and further improve density map quality. Finally, inspired by multitask learning, this paper embeds an auxiliary count classifier in the counting model to perform a count classification auxiliary task and increase the model's ability to express semantic information. Extensive experimental results demonstrate the effectiveness and feasibility of the proposed algorithm in addressing scale variation and perspective distortion.
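The soft attention idea above, suppressing background responses while keeping foreground responses near full strength, can be sketched with a sigmoid gate. The scores below are hand-set for illustration; in the model they would be learned.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def soft_attention(features, scores):
    """Gate each feature by a sigmoid of its attention score, pushing
    background (low score) toward 0 and keeping foreground near 1x."""
    return [f * sigmoid(s) for f, s in zip(features, scores)]

# Toy example: two foreground and two background responses of equal magnitude.
features = [1.0, 1.0, 1.0, 1.0]
scores = [4.0, 3.0, -3.0, -4.0]   # illustrative, hand-set scores
weighted = soft_attention(features, scores)
print(weighted)
# Foreground stays close to 1 while background is pushed toward 0,
# stretching the gap between the two regions.
```

Multiplying such a gate into the feature maps before regression is one standard way to keep background clutter out of the final density map.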


2021 ◽  
Vol 13 (17) ◽  
pp. 3396
Author(s):  
Feng Zhao ◽  
Junjie Zhang ◽  
Zhe Meng ◽  
Hanqiang Liu

Recently, with the extensive application of deep learning techniques in the hyperspectral image (HSI) field, particularly the convolutional neural network (CNN), research on HSI classification has stepped into a new stage. To avoid the small receptive field of naive convolution, dilated convolution has been introduced into HSI classification. However, dilated convolution usually generates blind spots in the receptive field, resulting in discontinuous spatial information. To solve this problem, a densely connected pyramidal dilated convolutional network (PDCNet) is proposed in this paper. Firstly, a pyramidal dilated convolutional (PDC) layer that integrates different numbers of sub-dilated convolutional layers is proposed, where the dilation factor of the sub-dilated convolutions increases exponentially, achieving multi-scale receptive fields. Secondly, the number of sub-dilated convolutional layers increases in a pyramidal pattern with the depth of the network, thereby capturing more comprehensive hyperspectral information in the receptive field. Furthermore, a feature fusion mechanism combining pixel-by-pixel addition and channel stacking is adopted to extract more abstract spectral–spatial features. Finally, to reuse the features of previous layers more effectively, dense connections are applied in the densely pyramidal dilated convolutional (DPDC) blocks. Experiments on three well-known HSI datasets indicate that the proposed PDCNet has good classification performance compared with other popular models.
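The blind-spot (gridding) problem and why exponentially increasing dilations avoid it can be checked directly: enumerate which input offsets a stack of dilated convolutions can reach. The dilation sequences below are illustrative assumptions, not the exact rates used in PDCNet.

```python
def coverage(dilations, k=3):
    """Set of 1D input offsets reachable by stacking k-tap dilated convolutions
    with the given per-layer dilation rates."""
    taps = {0}
    for d in dilations:
        taps = {t + d * o for t in taps for o in range(-(k // 2), k // 2 + 1)}
    return sorted(taps)

# Exponentially increasing dilations tile the receptive field densely...
dense = coverage([1, 2, 4])
print(dense)   # every offset from -7 to 7 is covered: no blind spots

# ...while repeating one large dilation leaves gaps (the gridding effect).
sparse = coverage([2, 2, 2])
print(sparse)  # only even offsets: odd positions are blind spots
```

This is the motivation for the exponential dilation schedule inside the PDC layer: each layer fills in the positions the previous layers skipped.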

