Multi-Scale Feature Pyramid Network: A Heavily Occluded Pedestrian Detection Network Based on ResNet

Sensors ◽  
2021 ◽  
Vol 21 (5) ◽  
pp. 1820
Author(s):  
Xiaotao Shao ◽  
Qing Wang ◽  
Wei Yang ◽  
Yun Chen ◽  
Yi Xie ◽  
...  

Existing pedestrian detection algorithms cannot effectively extract features of heavily occluded targets, which results in lower detection accuracy. To address heavy occlusion in crowds, we propose a multi-scale feature pyramid network based on ResNet (MFPN) to enhance the features of occluded targets and improve detection accuracy. MFPN includes two modules: a double feature pyramid network (FPN) integrated with ResNet (DFR) and repulsion loss of minimum (RLM). The double FPN improves the architecture to further enhance the semantic information and contours of occluded pedestrians, and provides a new way to extract features of occluded targets. The features extracted by our network are more separated and clearer, especially for heavily occluded pedestrians. Repulsion loss is introduced to improve the loss function; it keeps predicted boxes away from the ground truths of unrelated targets. In experiments on the public CrowdHuman dataset, we obtain 90.96% AP, the best performance, a gain of 5.16% AP over the FPN-ResNet50 baseline. Compared with state-of-the-art works, our method improves the performance of the pedestrian detection system.
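
The idea of keeping a predicted box away from ground truths of unrelated targets can be sketched as follows. This is a minimal illustration of a generic repulsion term based on intersection over ground truth (IoG) and a smoothed log penalty; it is not the paper's RLM variant, and the function names, the `sigma` value, and the choice of the neighbour with the largest IoG are assumptions of this sketch.

```python
import math

def iog(pred, gt):
    """Intersection over ground-truth area; boxes are (x1, y1, x2, y2)."""
    iw = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    ih = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    gt_area = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return iw * ih / gt_area if gt_area > 0 else 0.0

def smooth_ln(x, sigma=0.5):
    """Penalty that is -ln(1 - x) up to sigma and linear beyond it."""
    x = min(x, 1.0 - 1e-9)  # guard against log(0) when overlap is total
    if x <= sigma:
        return -math.log(1.0 - x)
    return (x - sigma) / (1.0 - sigma) - math.log(1.0 - sigma)

def rep_gt_term(pred, other_gts, sigma=0.5):
    """Repulsion of one predicted box from ground truths it is NOT assigned to.

    Simplification (assumption): the neighbour with the largest IoG is penalised.
    """
    if not other_gts:
        return 0.0
    return smooth_ln(max(iog(pred, g) for g in other_gts), sigma)
```

The term is zero when the prediction does not touch any unrelated ground truth and grows as the prediction drifts onto a neighbouring pedestrian, which is the behaviour the abstract describes.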

Symmetry ◽  
2021 ◽  
Vol 13 (6) ◽  
pp. 950
Author(s):  
Hong Liang ◽  
Junlong Yang ◽  
Mingwen Shao

Because small targets have fewer pixels and carry fewer features, most target detection algorithms cannot effectively use the edge and semantic information of small targets in the feature map, resulting in low detection accuracy and occasional missed and false detections. To address the insufficient feature information of small targets in RetinaNet, this work introduces a parallel auxiliary multi-scale feature enhancement module, MFEM (Multi-scale Feature Enhancement Model), which uses dilated convolutions with different dilation rates to avoid repeated downsampling. MFEM avoids the information loss caused by repeated downsampling and at the same time assists the shallow extraction of multi-scale context information. Additionally, this work adopts a backbone network improvement plan specifically designed for target detection tasks, which can effectively preserve small target information in high-level feature maps. The traditional top-down pyramid structure focuses on transferring high-level semantics from the top to the bottom, and this one-way information flow is not conducive to the detection of small targets. In this work, the auxiliary MFEM branch is combined with RetinaNet to construct a model with a bidirectional feature pyramid network, which can effectively integrate the strong semantic information of the high-level network with the high-resolution information of the low level. The bidirectional feature pyramid network designed in this work is a symmetrical structure, including a top-down branch and a bottom-up branch, that transfers and fuses strong semantic information and high-resolution information. To demonstrate the effectiveness of the algorithm, FE-RetinaNet (Feature Enhancement RetinaNet), this work conducts experiments on the MS COCO dataset. Compared with the original RetinaNet, the improved RetinaNet achieves a 1.8% improvement in detection accuracy (mAP) on MS COCO, with a COCO AP of 36.2%; FE-RetinaNet detects small targets well, with APs increased by 3.2%.
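
To illustrate why dilated convolution widens context without downsampling, here is a toy 1D sketch. It is not the MFEM module itself: the function names and the rates `(1, 2, 4)` are hypothetical choices for illustration only.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1D cross-correlation with gaps of `dilation` between kernel taps."""
    # Effective receptive field of a single output value.
    span = (len(kernel) - 1) * dilation + 1
    return [
        sum(kernel[k] * signal[i + k * dilation] for k in range(len(kernel)))
        for i in range(len(signal) - span + 1)
    ]

def multi_rate_context(signal, kernel, rates=(1, 2, 4)):
    """Run parallel branches with different dilation rates (rates are assumed)."""
    return {r: dilated_conv1d(signal, kernel, r) for r in rates}
```

With a 3-tap kernel, a dilation of 4 lets each output see a span of 9 input samples while the input resolution itself is never reduced.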


2020 ◽  
Vol 12 (5) ◽  
pp. 784 ◽  
Author(s):  
Wei Guo ◽  
Weihong Li ◽  
Weiguo Gong ◽  
Jinkai Cui

Multi-scale object detection is a basic challenge in computer vision. Although many advanced methods based on convolutional neural networks have succeeded on natural images, progress on aerial images has been relatively slow, mainly because of the very large scale variations of objects and the many densely distributed small objects. In this paper, considering that the semantic information of small objects may be weakened or even disappear in the deeper layers of a neural network, we propose a new detection framework called Extended Feature Pyramid Network (EFPN) to strengthen the information extraction ability of the neural network. In the EFPN, we first design a multi-branched dilated bottleneck (MBDB) module in the lateral connections to capture more semantic information. Then, we devise an attention pathway for better locating the objects. Finally, an augmented bottom-up pathway is added so that shallow layer information spreads more easily, further improving performance. Moreover, we present an adaptive scale training strategy to enable the network to better recognize multi-scale objects, and a novel clustering method that produces adaptive anchors and helps the neural network better learn data features. Experiments on public aerial datasets indicate that the presented method obtains state-of-the-art performance.
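
The combination of a top-down pathway with an added bottom-up pathway can be pictured with a toy 1D pyramid. This sketch is only an illustration under strong assumptions: real networks apply convolutions at every step, and the helper names, the nearest-neighbour upsampling, and the max-based downsampling are choices made here, not details from the paper.

```python
def upsample2(x):
    """Nearest-neighbour upsampling by a factor of 2."""
    return [v for v in x for _ in range(2)]

def downsample2(x):
    """Stride-2 downsampling that keeps the max of each adjacent pair."""
    return [max(x[i], x[i + 1]) for i in range(0, len(x) - 1, 2)]

def top_down(levels):
    """Push coarse (semantic) information into finer levels.

    levels[0] is the finest level; each level has half the length of the previous one.
    """
    out = list(levels)
    for i in range(len(out) - 2, -1, -1):
        out[i] = [a + b for a, b in zip(out[i], upsample2(out[i + 1]))]
    return out

def bottom_up(levels):
    """Push fine (high-resolution) information back into coarser levels."""
    out = list(levels)
    for i in range(1, len(out)):
        out[i] = [a + b for a, b in zip(out[i], downsample2(out[i - 1]))]
    return out
```

Calling `bottom_up(top_down(levels))` gives every level a contribution from both coarser and finer levels, which is the point of the second pathway.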


2013 ◽  
Vol 347-350 ◽  
pp. 3815-3820
Author(s):  
Li Hong Zhang ◽  
Lin Li

To further improve pedestrian detection accuracy and overcome the shortcomings of the original histogram of oriented gradients (HOG), the differential template, overlap ratio, normalization method, and other steps of HOG feature extraction are improved. As a result, more gradient information is extracted, and the resulting feature descriptors better describe human detail features in larger image regions or detection windows. For speed, we select a support vector machine (SVM) with a linear kernel as the classifier. Multi-scale detection and non-maximum suppression are employed to precisely locate pedestrians in the image. Experiments show that the human detection system improves detection accuracy while maintaining a relatively satisfactory speed.
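
The non-maximum suppression step mentioned above can be sketched as a greedy procedure. This is a generic version, not necessarily the variant used in the paper; the box format `(x1, y1, x2, y2)` and the 0.5 threshold are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # A box survives only if it does not overlap any already-kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

The function returns indices into `boxes`, ordered by descending score, so the caller can recover both the surviving boxes and their scores.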


2021 ◽  
Vol 11 (8) ◽  
pp. 3652
Author(s):  
Rao Cheng ◽  
Xiaowei He ◽  
Zhonglong Zheng ◽  
Zhentao Wang

In practical safety helmet detection scenarios, the lightweight algorithm You Only Look Once (YOLO) v3-tiny is easy to deploy on embedded devices because it has few parameters. However, its detection accuracy is relatively low, which makes it unsuitable for detecting multi-scale safety helmets. This paper proposes a safety helmet detection algorithm, named SAS-YOLOv3-tiny, to balance detection accuracy and model complexity. A light Sandglass-Residual (SR) module based on depthwise separable convolution and a channel attention mechanism is constructed to replace the original convolution layer, and a stride-two convolution layer replaces the max-pooling layer, to obtain more informative features and improve detection performance while reducing the number of parameters and the computation. Instead of two-scale feature prediction, three-scale feature prediction is used to further improve detection of small objects. In addition, an improved spatial pyramid pooling (SPP) module is added to the feature extraction network to extract local and global features with rich semantic information. Complete-Intersection over Union (CIoU) loss is also introduced to improve the loss function and the positioning accuracy. Results on the self-built helmet dataset show that the improved algorithm is superior to the original algorithm. Compared with the original YOLOv3-tiny, SAS-YOLOv3-tiny significantly improves all metrics (Precision (P), Recall (R), Mean Average Precision (mAP), and F1) at the cost of only a minor loss in speed, while having fewer parameters and less computation. Meanwhile, SAS-YOLOv3-tiny is more accurate than other lightweight object detection algorithms and faster than the heavyweight model.
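
For readers unfamiliar with CIoU loss, the sketch below follows its commonly published definition: one minus IoU, plus a normalised centre-distance term, plus an aspect-ratio consistency term. It is a plain-Python illustration for a single box pair, not the paper's training code; the function name and the `eps` stabiliser are assumptions.

```python
import math

def ciou_loss(pred, gt, eps=1e-9):
    """CIoU loss for two boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Plain IoU.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    iou = inter / (pw * ph + gw * gh - inter + eps)

    # Squared distance between box centres.
    rho2 = ((px1 + px2) / 2 - (gx1 + gx2) / 2) ** 2 + ((py1 + py2) / 2 - (gy1 + gy2) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both.
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(gw / (gh + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v
```

Unlike plain IoU loss, this still yields a useful signal for non-overlapping boxes, because the centre-distance term keeps shrinking as the prediction moves toward the target.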


Energies ◽  
2021 ◽  
Vol 14 (5) ◽  
pp. 1426
Author(s):  
Chuanyang Liu ◽  
Yiquan Wu ◽  
Jingjing Liu ◽  
Jiaming Han

Insulator detection is an essential task for the safe and reliable operation of intelligent grids. Because insulator images contain various background interferences, most traditional image-processing methods cannot achieve good performance. Some You Only Look Once (YOLO) networks have been employed to meet the requirements of practical insulator detection. To achieve a good trade-off among accuracy, running time, and memory storage, this work proposes the modified YOLO-tiny for insulator (MTI-YOLO) network for insulator detection in complex aerial images. First, composite insulator images are collected in common scenes and the “CCIN_detection” (Chinese Composite INsulator) dataset is constructed. Second, to improve the detection accuracy for insulators of different sizes, multi-scale feature detection headers, a multi-scale feature fusion structure, and the spatial pyramid pooling (SPP) model are adopted in the MTI-YOLO network. Finally, the proposed MTI-YOLO network and the compared networks are trained and tested on the “CCIN_detection” dataset. The average precision (AP) of the proposed network is 17% and 9% higher than that of YOLO-tiny and YOLO-v2, respectively. Compared with YOLO-tiny and YOLO-v2, the running time of the proposed network is slightly higher. Furthermore, the memory usage of the proposed network is 25.6% and 38.9% lower than that of YOLO-v2 and YOLO-v3, respectively. Experimental results and analysis validate that the proposed network achieves good performance in both complex backgrounds and bright illumination conditions.
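
Spatial pyramid pooling as used in YOLO-style networks is usually described as several stride-1 max-pooling branches with different window sizes whose outputs are stacked with the input. The sketch below shows that idea on a single 2D map in plain Python; the kernel sizes `(5, 9, 13)` are a common choice and an assumption here, since the abstract does not state the configuration used in MTI-YOLO.

```python
def max_pool_same(fmap, k):
    """Stride-1 max pooling with an odd k x k window and unchanged output size.

    Windows are clipped at the borders, which matches padding with -infinity.
    """
    h, w, r = len(fmap), len(fmap[0]), k // 2
    return [
        [
            max(
                fmap[yy][xx]
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1))
            )
            for x in range(w)
        ]
        for y in range(h)
    ]

def spp(fmap, kernel_sizes=(5, 9, 13)):
    """Return a list of channels: the input plus one pooled map per kernel size."""
    return [fmap] + [max_pool_same(fmap, k) for k in kernel_sizes]
```

Small windows preserve local detail and large windows summarise nearly the whole map, so the stacked output mixes local and global context at the same spatial resolution.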


2021 ◽  
Vol 2078 (1) ◽  
pp. 012008
Author(s):  
Hui Liu ◽  
Keyang Cheng

To address false and missed detections of small and occluded targets in pedestrian detection, a pedestrian detection algorithm based on improved multi-scale feature fusion is proposed. First, because PANet, the multi-scale feature fusion module of YOLOv4, does not consider the interaction between scales, PANet is improved to reduce the semantic gap between scales, and an attention mechanism is introduced to learn the importance of different layers and strengthen feature fusion. Then, dilated convolution is introduced to reduce information loss during downsampling. Finally, the K-means clustering algorithm is used to redesign the anchor boxes, and the loss function is modified to detect a single category. Experimental results show that, on the INRIA and WiderPerson data sets under different congestion conditions, the improved algorithm reaches an AP of 96.83% and 59.67%, respectively. Compared with the YOLOv4 model, this is an improvement of 2.41% and 1.03%, respectively. False and missed detections of small and occluded targets are significantly reduced.
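
Redesigning anchor boxes with K-means is commonly done by clustering ground-truth (width, height) pairs with 1 - IoU as the distance. The sketch below shows that general recipe; it is not the paper's exact procedure, and the function names and the deterministic initialisation are assumptions made so the example is reproducible.

```python
def wh_iou(a, b):
    """IoU of two boxes given as (width, height), aligned at a common corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def kmeans_anchors(boxes, k, iters=100):
    """Cluster (w, h) pairs using distance 1 - IoU; returns k anchors sorted by area."""
    # Deterministic start (assumption of this sketch): evenly spaced boxes by area.
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    step = len(ordered) / k
    centers = [ordered[int(i * step + step / 2)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            # Highest IoU is the same as smallest 1 - IoU distance.
            best = max(range(k), key=lambda c: wh_iou(b, centers[c]))
            clusters[best].append(b)
        new_centers = [
            (sum(b[0] for b in cl) / len(cl), sum(b[1] for b in cl) / len(cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers, key=lambda c: c[0] * c[1])
```

Using IoU rather than Euclidean distance keeps large boxes from dominating the clustering, which matters when a dataset mixes small and large pedestrians.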


2020 ◽  
Vol 24 (16) ◽  
pp. 12671-12680
Author(s):  
Feng Guo ◽  
Canghong Shi ◽  
Xiaojie Li ◽  
Xi Wu ◽  
Jiliu Zhou ◽  
...  

2019 ◽  
Vol 11 (5) ◽  
pp. 531 ◽  
Author(s):  
Yuanyuan Wang ◽  
Chao Wang ◽  
Hong Zhang ◽  
Yingbo Dong ◽  
Sisi Wei

Because it is independent of daylight and weather conditions, synthetic aperture radar (SAR) imagery is widely applied to detect ships in marine surveillance. Ships appear at multiple scales in SAR imagery because of multi-resolution imaging modes and the variety of ship shapes. Conventional ship detection methods depend heavily on statistical models of sea clutter or on extracted features, and their robustness needs to be strengthened. RetinaNet, a deep learning object detector that learns representations automatically, is proposed to overcome this obstacle. First, feature pyramid networks (FPN) are used to extract multi-scale features for both ship classification and location. Then, focal loss is used to address the class imbalance and to increase the importance of hard examples during training. Eighty-six scenes of Chinese Gaofen-3 imagery at four resolutions (3 m, 5 m, 8 m, and 10 m) are used to evaluate our approach. Two Gaofen-3 images and one Constellation of Small Satellite for Mediterranean basin Observation (Cosmo-SkyMed) image are used to evaluate the robustness. The experimental results reveal that (1) RetinaNet not only can efficiently detect multi-scale ships but also has a high detection accuracy; (2) compared with other object detectors, RetinaNet achieves more than a 96% mean average precision (mAP). These results demonstrate the effectiveness of our proposed method.
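
How focal loss raises the relative weight of hard examples can be seen in a few lines. This is a sketch of the binary form for a single prediction; the defaults `alpha=0.25` and `gamma=2.0` are the values commonly cited for RetinaNet and are an assumption here, since the abstract does not give the settings used.

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary focal loss for one prediction.

    p is the predicted probability of the positive class; y is the label (0 or 1).
    """
    p_t = p if y == 1 else 1.0 - p              # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    # (1 - p_t) ** gamma shrinks the loss of well-classified (easy) examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))
```

With `gamma=0` the expression reduces to alpha-weighted cross-entropy; increasing `gamma` suppresses easy examples such as the abundant sea-clutter background more strongly.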

