A High-Density Crowd Counting Method Based on Convolutional Feature Fusion

Hongling Luo; Jun Sang; Weiqun Wu; Hong Xiang; Zhili Xiang; Qian Zhang; Zhongyuan Wu

doi:10.3390/app8122367

Applied Sciences ◽

10.3390/app8122367 ◽

2018 ◽

Vol 8 (12) ◽

pp. 2367 ◽

Cited By ~ 5

Author(s):

Hongling Luo ◽

Jun Sang ◽

Weiqun Wu ◽

Hong Xiang ◽

Zhili Xiang ◽

...

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Large Scale ◽

Feature Fusion ◽

High Density ◽

Counting Problem ◽

Crowd Counting ◽

Low Level ◽

Density Map ◽

High Level

In recent years, trampling incidents caused by overcrowding have occurred frequently, creating demand for crowd counting in high-density environments. At present, there are few studies on monitoring crowds in large-scale crowded environments, and the existing technology has drawbacks and lacks mature systems. To solve the high-density crowd counting problem in complex environments, this paper proposes FF-CNN (Feature Fusion of Convolutional Neural Network), a deep convolutional neural network method based on feature fusion. FF-CNN maps a crowd image to its crowd density map and then obtains the head count by integration. Geometry-adaptive kernels are used to generate high-quality density maps that serve as ground truth for network training. Deconvolution is used to fuse high-level and low-level features into richer features, and two loss functions, density map loss and absolute count loss, are jointly optimized. To increase sample diversity, the original images are randomly cropped at each iteration. Experimental results on the public ShanghaiTech dataset show that fusing low-level and high-level features extracts richer features, which improves the precision of density map estimation and, in turn, the accuracy of crowd counting.
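
The pipeline in this abstract (geometry-adaptive ground-truth density maps, counting by integration, a joint density and count loss, and random cropping) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the neighbour count `k`, the scale factor `beta`, the fallback sigma, and the loss weight `lam` are all assumptions chosen for illustration.

```python
import numpy as np


def gaussian_kernel(size, sigma):
    """Return a normalized 2-D Gaussian kernel of shape (size, size)."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    kernel = np.outer(g, g)
    return kernel / kernel.sum()


def density_map(shape, heads, k=3, beta=0.3):
    """Build a density map whose sum equals the number of annotated heads.

    Each head gets a Gaussian whose sigma is beta times the mean distance
    to its (up to) k nearest neighbours. k and beta are assumed values.
    """
    h, w = shape
    dmap = np.zeros((h, w), dtype=np.float64)
    pts = np.asarray(heads, dtype=np.float64)
    for i in range(len(pts)):
        if len(pts) > 1:
            dists = np.sqrt(((pts - pts[i]) ** 2).sum(axis=1))
            nearest = np.sort(dists)[1:k + 1]  # drop the zero self-distance
            sigma = max(beta * nearest.mean(), 1.0)
        else:
            sigma = 4.0  # assumed fallback for a single annotation
        size = int(6 * sigma) | 1  # force an odd kernel size
        half = size // 2
        kernel = gaussian_kernel(size, sigma)
        r, c = int(pts[i, 0]), int(pts[i, 1])
        # Clip the kernel window to the image borders.
        r0, r1 = max(r - half, 0), min(r + half + 1, h)
        c0, c1 = max(c - half, 0), min(c + half + 1, w)
        patch = kernel[r0 - (r - half):size - (r + half + 1 - r1),
                       c0 - (c - half):size - (c + half + 1 - c1)]
        # Renormalize so that every head contributes exactly 1 to the sum.
        dmap[r0:r1, c0:c1] += patch / patch.sum()
    return dmap


def joint_loss(pred, target, lam=1e-3):
    """Density map loss (MSE) plus a weighted absolute count loss."""
    density_loss = np.mean((pred - target) ** 2)
    count_loss = abs(pred.sum() - target.sum())
    return density_loss + lam * count_loss


def random_crop(image, dmap, crop_h, crop_w, rng):
    """Crop the same random window from an image and its density map."""
    h, w = dmap.shape
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    return (image[top:top + crop_h, left:left + crop_w],
            dmap[top:top + crop_h, left:left + crop_w])


# Hypothetical annotations given as (row, col) positions in a 64x64 image.
heads = [(10, 10), (20, 30), (40, 40)]
dmap = density_map((64, 64), heads)
estimated_count = dmap.sum()  # counting by integration over the map
```

Because each kernel is renormalized after clipping, summing the map recovers the number of annotations, which is the property that makes "count by integration" meaningful.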

A Siamese convolutional neural network with high–low level feature fusion for change detection in remotely sensed images

Remote Sensing Letters ◽

10.1080/2150704x.2021.1892851 ◽

2021 ◽

Vol 12 (4) ◽

pp. 387-396

Author(s):

Hao Zhou ◽

Mi Zhang ◽

Xiangyun Hu ◽

Kun Li ◽

Jing Sun

Keyword(s):

Neural Network ◽

Change Detection ◽

Convolutional Neural Network ◽

Feature Fusion ◽

Remotely Sensed ◽

Low Level ◽

Remotely Sensed Images

Convolutional Neural Network for Crowd Counting on Metro Platforms

Symmetry ◽

10.3390/sym13040703 ◽

2021 ◽

Vol 13 (4) ◽

pp. 703

Author(s):

Jun Zhang ◽

Jiaze Liu ◽

Zhizhong Wang

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Estimation Error ◽

Image Features ◽

Urban Rail Transit ◽

Crowd Counting ◽

Passenger Flow ◽

Urban Rail ◽

Density Map ◽

Flow Detection

Owing to the increased use of urban rail transit, passenger flow on metro platforms tends to rise sharply during peak periods. Monitoring passenger flow in such areas is important for security reasons. To solve the problem of metro platform passenger flow detection, we propose a CNN (convolutional neural network)-based network called MP (metro platform)-CNN to accurately count people on metro platforms. The proposed method has three major components: a group of convolutional neural networks at the front end extracts image features, a multiscale feature extraction module enhances multiscale features, and transposed convolution upsamples the result to generate a high-quality density map. Existing crowd-counting datasets do not adequately cover all of the challenging situations considered in this study. Therefore, we collected images from surveillance videos of a metro platform to form a dataset containing 627 images with 9243 annotated heads. Extensive experiments showed that our method performed well on the self-built dataset, where it achieved the minimum estimation error. Moreover, the proposed method was competitive with other methods on four standard crowd-counting datasets.
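
The upsampling step mentioned in this abstract can be illustrated with a single-channel transposed convolution written in plain NumPy. This is only a sketch of the general operation: the function name, the stride, and the kernel values are assumptions, and the layer configuration of MP-CNN itself is not given in the abstract.

```python
import numpy as np


def transposed_conv2d(x, kernel, stride=2):
    """Single-channel transposed convolution with no padding.

    Every input value scatters a scaled copy of the kernel into the
    output, and the copies are placed `stride` pixels apart.
    """
    h, w = x.shape
    kh, kw = kernel.shape
    out = np.zeros(((h - 1) * stride + kh, (w - 1) * stride + kw))
    for i in range(h):
        for j in range(w):
            out[i * stride:i * stride + kh,
                j * stride:j * stride + kw] += x[i, j] * kernel
    return out


# Hypothetical low-resolution feature map and a fixed 2x2 kernel.
coarse = np.arange(4.0).reshape(2, 2)
upsampled = transposed_conv2d(coarse, np.ones((2, 2)), stride=2)
```

In a trained network the kernel is learned rather than fixed, which is what distinguishes transposed convolution from plain nearest-neighbour or bilinear upsampling.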

Low-Rank and Sparse Based Deep-Fusion Convolutional Neural Network for Crowd Counting

Mathematical Problems in Engineering ◽

10.1155/2017/5046727 ◽

2017 ◽

Vol 2017 ◽

pp. 1-11 ◽

Cited By ~ 2

Author(s):

Siqi Tang ◽

Zhisong Pan ◽

Xingyu Zhou

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

State Of The Art ◽

Regression Method ◽

Low Rank ◽

Counting Method ◽

Direct Integral ◽

Crowd Counting ◽

Counting Methods ◽

Density Map

This paper proposes an accurate crowd counting method based on a convolutional neural network and a low-rank and sparse structure. We first propose an effective deep-fusion convolutional neural network to improve the accuracy of density map regression. We then observe that most existing CNN-based crowd counting methods obtain the overall count by directly integrating the estimated density map, which limits counting accuracy. Instead of direct integration, we adopt a regression method based on a low-rank and sparse penalty to improve the accuracy of the projection from the density map to the global count. Experiments demonstrate the importance of this regression step for crowd counting performance. The proposed low-rank and sparse based deep-fusion convolutional neural network (LFCNN) outperforms existing crowd counting methods and achieves state-of-the-art performance.

Dependency Exploitation: A Unified CNN-RNN Approach for Visual Emotion Recognition

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/503 ◽

2017 ◽

Cited By ~ 21

Author(s):

Xinge Zhu ◽

Liang Li ◽

Weigang Zhang ◽

Tianrong Rao ◽

Min Xu ◽

...

Keyword(s):

Neural Network ◽

Emotion Recognition ◽

Feature Fusion ◽

Feature Representation ◽

Low Level ◽

Learning Framework ◽

Independent Entity ◽

Internet Images ◽

High Level ◽

Different Levels

Visual emotion recognition aims to associate images with appropriate emotions. Different visual stimuli can affect human emotion, ranging from low-level to high-level, such as color, texture, part, and object. However, most existing methods treat features at different levels as independent entities and lack an effective method for feature fusion. In this paper, we propose a unified CNN-RNN model that predicts emotion from fused features at different levels by exploiting the dependencies among them. Our architecture uses a convolutional neural network (CNN) with multiple layers to extract features at different levels within a multi-task learning framework, in which two related loss functions are introduced to learn the feature representation. To account for the dependencies between low-level and high-level features, a new bidirectional recurrent neural network (RNN) is proposed to integrate the learned features from different layers of the CNN model. Extensive experiments on both Internet images and art photo datasets demonstrate that our method outperforms state-of-the-art methods with a performance improvement of at least 7%.

Deep Convolutional Neural Network for Pedestrian Detection with Multi-Levels Features Fusion

MATEC Web of Conferences ◽

10.1051/matecconf/201823201061 ◽

2018 ◽

Vol 232 ◽

pp. 01061

Author(s):

Danhua Li ◽

Xiaofeng Di ◽

Xuan Qu ◽

Yunfei Zhao ◽

Honggang Kong

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

State Of The Art ◽

Pedestrian Detection ◽

Deep Convolutional Neural Network ◽

High Quality ◽

Low Level ◽

Features Fusion ◽

Current State ◽

High Level

Pedestrian detection aims to localize and recognize every pedestrian instance in an image with a bounding box. The current state-of-the-art method is Faster RCNN, a network that uses a region proposal network (RPN) to generate high-quality region proposals, while Fast RCNN is used to classify the extracted features into the corresponding categories. The contribution of this paper is to integrate low-level and high-level features into a Faster RCNN-based pedestrian detection framework, which efficiently increases the capacity of the features. We comprehensively evaluate our framework on the Caltech pedestrian detection benchmark, where our method achieves state-of-the-art accuracy and a competitive result.

MFNet: Multi-feature convolutional neural network for high-density crowd counting

2020 11th IEEE Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON) ◽

10.1109/iemcon51383.2020.9284903 ◽

2020 ◽

Author(s):

Songchenchen Gong ◽

El-Bay Bourennane ◽

Xuecan Yang

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

High Density ◽

Crowd Counting

Image memorability is predicted by discriminability and similarity in different stages of a convolutional neural network

10.1101/834796 ◽

2019 ◽

Author(s):

Griffin E. Koch ◽

Essang Akpan ◽

Marc N. Coutanche

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Behavioral Experiments ◽

Low Level ◽

Multiple Levels ◽

High Level ◽

Visual Properties ◽

Different Order

The features of an image can be represented at multiple levels, from its low-level visual properties to its high-level meaning. What drives some images to be memorable while others are forgettable? We address this question across two behavioral experiments. In the first, different layers of a convolutional neural network (CNN), which represent progressively higher levels of features, were used to select the images that would be shown to 100 participants through a form of prospective assignment. Here, the discriminability/similarity of an image with others, according to different CNN layers, dictated the images presented to different groups, who made a simple indoor vs. outdoor judgment for each scene. We find that participants remember more scene images that were selected based on their low-level discriminability or high-level similarity. A second experiment replicated these results in an independent sample of fifty participants, with a different order of post-encoding tasks. Together, these experiments provide evidence that both discriminability and similarity, at different visual levels, predict image memorability.

Multi-Feature Fusion with Convolutional Neural Network for Ship Classification in Optical Images

Applied Sciences ◽

10.3390/app9204209 ◽

2019 ◽

Vol 9 (20) ◽

pp. 4209 ◽

Cited By ~ 2

Author(s):

Yongmei Ren ◽

Jie Yang ◽

Qingnian Zhang ◽

Zhiqiang Guo

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Feature Fusion ◽

Structural Information ◽

Confusion Matrix ◽

Weather Conditions ◽

Local Binary Patterns ◽

Feature Representation ◽

Ship Classification ◽

High Level

The appearance of ships is easily affected by external factors (illumination, weather conditions, and sea state), which makes ship classification a challenging task. To improve ship-classification performance, this study proposes a ship classification method based on multi-feature fusion with a convolutional neural network (CNN). First, an improved CNN characterized by shallow layers and few parameters is proposed to learn high-level features and capture structural information. Second, handcrafted features from the histogram of oriented gradients (HOG) and local binary patterns (LBP) are combined with the high-level features extracted by the improved CNN in the last fully connected layer to obtain a discriminative feature representation. The handcrafted features supplement the edge information and spatial texture information of the ship images. Then, the Softmax function is used in the output layer to classify different types of ships. The effectiveness of the proposed method is evaluated on two datasets: one self-built and the other publicly available, called visible and infrared spectrums (VAIS). The proposed method attained average classification accuracies of 97.50% and 93.60%, respectively, on these datasets. Additionally, results in terms of the F1-score and confusion matrix show the proposed method to be superior to some state-of-the-art methods.
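
The fusion step described in this abstract, where handcrafted HOG and LBP vectors are combined with CNN features before a Softmax output layer, can be sketched as a concatenation followed by a linear classifier. Everything specific here is an assumption made for illustration: the feature dimensions, the number of classes, the random placeholder features, and the untrained random weights do not come from the paper.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def fuse_and_classify(cnn_feat, hog_feat, lbp_feat, weights, bias):
    """Concatenate the three feature vectors and apply a softmax layer."""
    fused = np.concatenate([cnn_feat, hog_feat, lbp_feat])
    return fused, softmax(weights @ fused + bias)


rng = np.random.default_rng(0)
# Placeholder feature vectors; real ones would come from a CNN layer and
# from HOG / LBP descriptors computed on the ship image.
cnn_feat = rng.normal(size=128)
hog_feat = rng.normal(size=36)
lbp_feat = rng.normal(size=59)
num_classes = 6  # assumed number of ship categories
weights = rng.normal(scale=0.01, size=(num_classes, 128 + 36 + 59))
bias = np.zeros(num_classes)

fused, probs = fuse_and_classify(cnn_feat, hog_feat, lbp_feat, weights, bias)
predicted_class = int(np.argmax(probs))
```

Concatenation keeps the handcrafted and learned features side by side, so the classifier can weight edge and texture cues independently of the CNN features.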

Where are the People? A Multi-Stream Convolutional Neural Network for Crowd Counting via Density Map from Complex Images

2019 International Conference on Systems, Signals and Image Processing (IWSSIP) ◽

10.1109/iwssip.2019.8787217 ◽

2019 ◽

Cited By ~ 2

Author(s):

Darwin Ttito ◽

Rodolfo Quispe ◽

Adin Ramirez Rivera ◽

Helio Pedrini

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Crowd Counting ◽

The People ◽

Density Map ◽

Complex Images

Crowd counting via Multi-Scale Adversarial Convolutional Neural Networks

Journal of Intelligent Systems ◽

10.1515/jisys-2019-0157 ◽

2020 ◽

Vol 30 (1) ◽

pp. 180-191

Author(s):

Liping Zhu ◽

Hong Zhang ◽

Sikandar Ali ◽

Baoli Yang ◽

Chengyang Li

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Density Estimation ◽

Large Scale ◽

Receptive Fields ◽

Crowd Counting ◽

Multi Scale ◽

Training Scheme ◽

Joint Training ◽

End To End

The purpose of crowd counting is to estimate the number of pedestrians in crowd images. Crowd counting, or density estimation, is an extremely challenging task in computer vision due to large scale variations and dense scenes. Current methods address these issues by combining multi-scale convolutional neural networks with different receptive fields. In this paper, a novel end-to-end architecture based on a Multi-Scale Adversarial Convolutional Neural Network (MSA-CNN) is proposed to generate crowd density and estimate the crowd count. First, a multi-scale network extracts the globally relevant features of the crowd image, and then fractionally-strided convolutional layers up-sample the output to recover crucial details lost in the earlier max pooling layers. An adversarial loss is employed directly to shrink the estimated value into the realistic subspace, reducing the blurring effect of density estimation. Joint training is performed end to end using a combination of adversarial loss and Euclidean loss, integrated via a joint training scheme to improve density estimation performance. We conduct extensive experiments on available datasets to show the significant improvement of the proposed approach over available state-of-the-art approaches.
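
The combination of adversarial loss and Euclidean loss mentioned in this abstract can be sketched as a weighted sum for the density-map generator. This is an illustrative assumption rather than the MSA-CNN formulation: the function name, the weight `lam`, and the use of a single discriminator probability `disc_score` are hypothetical.

```python
import numpy as np


def generator_loss(pred, target, disc_score, lam=1e-3):
    """Euclidean loss plus a weighted adversarial term.

    disc_score is the discriminator's probability (in (0, 1]) that the
    predicted density map is real; the generator is rewarded for pushing
    it towards 1. lam is an assumed weighting factor.
    """
    euclidean = 0.5 * np.sum((pred - target) ** 2)
    adversarial = -np.log(np.clip(disc_score, 1e-12, 1.0))
    return euclidean + lam * adversarial


# Hypothetical 4x4 density maps and discriminator outputs.
target = np.full((4, 4), 0.25)
pred = np.full((4, 4), 0.20)
loss_fooled = generator_loss(pred, target, disc_score=0.9)
loss_caught = generator_loss(pred, target, disc_score=0.1)
```

For the same pixel-wise error, the loss is lower when the discriminator is more convinced that the map is real, which is how the adversarial term discourages blurry estimates.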
