A Parallel Convolutional Neural Network for Pedestrian Detection

Mengya Zhu; Yiquan Wu

doi:10.3390/electronics9091478

Electronics ◽

10.3390/electronics9091478 ◽

2020 ◽

Vol 9 (9) ◽

pp. 1478

Author(s):

Mengya Zhu ◽

Yiquan Wu

Keyword(s):

Pedestrian Detection ◽

Autonomous Driving ◽

Feature Representation ◽

Model Parameters ◽

Detection Accuracy ◽

Semantic Features ◽

Practical Application ◽

Lightweight Framework ◽

Human Activity Analysis ◽

High Level

Pedestrian detection is a crucial task in many vision-based applications, such as video surveillance, human activity analysis and autonomous driving. Most existing pedestrian detection frameworks focus only on either detection accuracy or model parameters. However, how to balance detection accuracy against the number of model parameters is still an open problem for the practical application of pedestrian detection. In this paper, we propose a parallel, lightweight framework for pedestrian detection, named ParallelNet. ParallelNet consists of four branches, each of which learns different high-level semantic features. We fuse them into one feature map as the final feature representation. The Fire module, which includes Squeeze and Expand parts, is then employed to reduce the model parameters: we replace some convolution modules in the backbone with Fire modules. Finally, focal loss is introduced into ParallelNet for end-to-end training. Experimental results on the Caltech–Zhang dataset and the KITTI dataset show that, compared with single-branch networks such as ResNet and SqueezeNet, ParallelNet achieves improved detection accuracy with fewer model parameters and lower Giga Floating Point Operations (GFLOPs).
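To make the two building blocks named in this abstract concrete, the following is a minimal sketch, not the authors' implementation. It counts the weights of a SqueezeNet-style Fire module against a plain 3x3 convolution and evaluates the standard binary focal loss. The channel sizes (128, 16, 64, 64) and the defaults `gamma=2.0`, `alpha=0.25` are illustrative assumptions; the abstract does not state the values used in ParallelNet.

```python
import math


def fire_params(c_in, squeeze, expand1x1, expand3x3):
    """Weight count of a SqueezeNet-style Fire module (biases ignored)."""
    squeeze_w = c_in * squeeze              # 1x1 squeeze convolution
    expand1_w = squeeze * expand1x1         # 1x1 expand convolution
    expand3_w = squeeze * expand3x3 * 9     # 3x3 expand convolution
    return squeeze_w + expand1_w + expand3_w


def conv3x3_params(c_in, c_out):
    """Weight count of a plain 3x3 convolution (biases ignored)."""
    return c_in * c_out * 9


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for predicted probability p and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical layer sizes: 128 input channels, 128 output channels in total.
fire = fire_params(128, squeeze=16, expand1x1=64, expand3x3=64)
plain = conv3x3_params(128, 128)
print(fire, plain)  # 12288 147456

# A confidently correct prediction contributes far less loss than a hard one.
print(focal_loss(0.9, 1) < focal_loss(0.6, 1))  # True
```

With these assumed sizes, the Fire module needs one twelfth of the weights of the plain convolution, which is the kind of saving the abstract attributes to replacing backbone convolutions with Fire modules.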

Vehicle and Pedestrian Detection Based on Multi-level Feature Fusion in Autonomous Driving

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813666200304123323 ◽

2020 ◽

Vol 13 ◽

Author(s):

Chen Guoqiang ◽

Yi Huailong ◽

Mao Zhuangzhuang

Keyword(s):

Autonomous Vehicles ◽

Feature Fusion ◽

Pedestrian Detection ◽

Autonomous Driving ◽

Seasonal Effects ◽

Detection Accuracy ◽

Semantic Features ◽

Feature Maps ◽

Safe Driving ◽

Multi Level

Aims: Factors such as lighting, weather, dynamic objects, seasonal effects and structures pose great challenges for autonomous driving algorithms in the real world. Autonomous vehicles need to detect various obstacles in complex scenes to ensure safe driving. Background: The ability to detect vehicles and pedestrians is critical to the safe driving of autonomous vehicles. Automated vehicle vision systems must handle an extremely wide range of challenging scenarios. Objective: The goal of the work is to design a robust detector for vehicles and pedestrians. The main contribution is the design of the Multi-level Feature Fusion Block (MFFB) and the Detector Cascade Block (DCB). Multi-level feature fusion and multi-step prediction are used, which greatly improve object detection precision. Methods: The paper proposes a vehicle and pedestrian detector in the form of an end-to-end deep convolutional neural network. Its key parts are the MFFB and the DCB. The former fuses contextual information with multi-level features, combining high-resolution but low-semantic features with low-resolution but high-semantic features. The latter uses multi-step prediction, cascades a series of detectors, and combines the predictions of multiple feature maps to handle objects of different sizes. Results: Experiments on the RobotCar dataset and the KITTI dataset show that our algorithm achieves high-precision results with real-time detection. The algorithm achieves 84.61% mAP on the RobotCar dataset and 81.54% mAP on the well-known KITTI benchmark dataset. In particular, the detection accuracy for the single vehicle category reaches 90.02%.
Conclusion: The experimental results show that the proposed algorithm offers a good trade-off between detection accuracy and detection speed, surpassing the state-of-the-art RefineDet algorithm. The proposed 2D object detector addresses vehicle and pedestrian detection and improves accuracy, robustness and generalization ability in autonomous driving.
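The idea of combining high-resolution, low-semantic features with low-resolution, high-semantic features can be sketched in a few lines. This is only an illustration of generic multi-level fusion: the abstract does not specify how the MFFB merges features, so the nearest-neighbour upsampling and element-wise addition used here, and the function names, are assumptions.

```python
def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a 2D feature map (list of lists)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]   # repeat columns
        out.extend(list(wide) for _ in range(factor))    # repeat rows
    return out


def fuse(high_res, low_res, factor=2):
    """Fuse a deep, coarse map into a shallow, fine map by addition."""
    up = upsample_nearest(low_res, factor)
    return [[h + u for h, u in zip(h_row, u_row)]
            for h_row, u_row in zip(high_res, up)]


# Hypothetical single-channel maps: a 4x4 shallow map and a 2x2 deep map.
shallow = [[1] * 4 for _ in range(4)]
deep = [[1, 2], [3, 4]]
fused = fuse(shallow, deep)
print(fused[0])  # [2, 2, 3, 3]
print(fused[3])  # [4, 4, 5, 5]
```

Real detectors usually apply a convolution to each map before and after fusion, and may concatenate channels instead of adding them; the sketch keeps only the resolution-matching step.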

The use of remote sensing satellite using deep learning in emergency monitoring of high-level landslides disaster in Jinsha River

The Journal of Supercomputing ◽

10.1007/s11227-020-03604-4 ◽

2021 ◽

Author(s):

Leijin Long ◽

Feng He ◽

Hongjiang Liu

Keyword(s):

Remote Sensing ◽

Southwest China ◽

Influence Factors ◽

Classification Error ◽

Model Parameters ◽

Detection Accuracy ◽

Remote Sensing Images ◽

Jinsha River ◽

Detection Model ◽

High Level

In order to monitor the high-level landslides that frequently occur in the Jinsha River area of Southwest China, and to protect the lives and property of people in mountainous areas, satellite remote sensing image data are combined with various landslide-inducing factors and transformed into landslide influence factors, which provide a data basis for establishing a landslide detection model. Then, based on the deep belief network (DBN) and convolutional neural network (CNN) algorithms, two landslide detection models, DBN and convolutional neural-deep belief network (CDN), are established to monitor high-level landslides in the Jinsha River area. The influence of the model parameters on the landslide detection results is analyzed, and the accuracy of the DBN and CDN models in dealing with actual landslide problems is compared. The results show that when the number of neurons in the DBN is 100, the overall error is at its minimum, and when the number of learning layers is 3, the classification error is at its minimum. The detection accuracy of DBN and CDN is 97.56% and 97.63%, respectively, which indicates that both models are feasible for detecting landslides from remote sensing images. This exploration provides a reference for the study of high-level landslide disasters in the Jinsha River area.

Research on Lightweight Infrared Pedestrian Detection Model Algorithm for Embedded Platform

Security and Communication Networks ◽

10.1155/2021/1549772 ◽

2021 ◽

Vol 2021 ◽

pp. 1-7

Author(s):

Zhaoli Wu ◽

Xin Wang ◽

Chao Chen

Keyword(s):

Real Time ◽

Target Detection ◽

Pedestrian Detection ◽

Infrared Image ◽

Far Infrared ◽

Detection Algorithm ◽

Model Parameters ◽

Detection Accuracy ◽

Detection Model ◽

Embedded Platform

Due to limits on energy and power consumption, embedded platforms cannot meet the real-time requirements of far-infrared pedestrian detection algorithms. To solve this problem, this paper proposes a new real-time infrared pedestrian detection algorithm (RepVGG-YOLOv4, Rep-YOLO). It uses RepVGG to reconstruct the YOLOv4 backbone network, which reduces the number of model parameters and the amount of computation and improves detection speed; it uses spatial pyramid pooling (SPP) to obtain information from different receptive fields and improve detection accuracy; and it uses channel pruning to reduce redundant parameters, model size, and computational complexity. The experimental results show that, compared with the YOLOv4 target detection algorithm, the Rep-YOLO algorithm reduces the model volume by 90% and the floating-point computation by 93.4%, increases the inference speed by 4 times, and reaches a detection accuracy of 93.25% after compression.
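How SPP gathers information from different receptive fields can be illustrated with a small pure-Python sketch. It follows the YOLO-style variant, in which the same feature map is max-pooled at stride 1 with several window sizes and the results are stacked as extra channels. The kernel sizes (5, 9, 13) and the function names are assumptions for illustration; the abstract does not give the configuration used in Rep-YOLO.

```python
def max_pool_same(fmap, k):
    """Stride-1 max pooling with 'same' padding over a 2D list of numbers."""
    h, w = len(fmap), len(fmap[0])
    r = k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            # The window is clipped at the borders, which is equivalent to
            # padding with negative infinity.
            window = [fmap[y][x]
                      for y in range(max(0, i - r), min(h, i + r + 1))
                      for x in range(max(0, j - r), min(w, j + r + 1))]
            row.append(max(window))
        out.append(row)
    return out


def spp(fmap, kernel_sizes=(5, 9, 13)):
    """Return the input plus one pooled map per kernel size, as channels."""
    return [fmap] + [max_pool_same(fmap, k) for k in kernel_sizes]


# A 7x7 map with a single activation in the centre.
fmap = [[0.0] * 7 for _ in range(7)]
fmap[3][3] = 1.0
branches = spp(fmap)
print(len(branches))      # 4
print(branches[1][0][0])  # 0.0 (the 5x5 window at the corner misses the centre)
print(branches[3][0][0])  # 1.0 (the 13x13 window covers the whole map)
```

Each branch keeps the spatial size of the input, so the branches can be concatenated along the channel axis; larger windows spread a strong activation further, which is what widens the effective receptive field.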

Object Detection Based on Region Decomposition and Assembly

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33018094 ◽

2019 ◽

Vol 33 ◽

pp. 8094-8101 ◽

Cited By ~ 4

Author(s):

Seung-Hwan Bae

Keyword(s):

Neural Networks ◽

Object Detection ◽

Performance Improvement ◽

Semantic Relations ◽

Detection Accuracy ◽

Semantic Features ◽

Multi Scale ◽

Object Proposals ◽

Object Region ◽

High Level

Region-based object detection infers object regions for one or more categories in an image. Due to recent advances in deep learning and region proposal methods, object detectors based on convolutional neural networks (CNNs) have been flourishing and have provided promising detection results. However, detection accuracy is often degraded by the low discriminability of object CNN features caused by occlusions and inaccurate region proposals. In this paper, we therefore propose a region decomposition and assembly detector (R-DAD) for more accurate object detection. In the proposed R-DAD, we first decompose an object region into multiple small regions. To capture the entire appearance and the part details of the object jointly, we extract CNN features within the whole object region and the decomposed regions. We then learn the semantic relations between the object and its parts by combining the multi-region features stage by stage with region assembly blocks, and use the combined high-level semantic features for object classification and localization. In addition, for more accurate region proposals, we propose a multi-scale proposal layer that can generate object proposals of various scales. We integrate the R-DAD into several feature extractors, and demonstrate distinct performance improvements on PASCAL07/12 and MSCOCO18 compared to recent convolutional detectors.

Hybrid Attention Network for Language-Based Person Search

Sensors ◽

10.3390/s20185279 ◽

2020 ◽

Vol 20 (18) ◽

pp. 5279

Author(s):

Yang Li ◽

Huahu Xu ◽

Junsheng Xiao

Keyword(s):

Image Features ◽

Attention Mechanism ◽

Feature Representation ◽

Semantic Features ◽

Retrieval Task ◽

Attention Network ◽

Fine Grained ◽

Person Search ◽

High Level ◽

Language Description

Language-based person search retrieves images of a target person using natural language description and is a challenging fine-grained cross-modal retrieval task. A novel hybrid attention network is proposed for the task. The network includes the following three aspects: First, a cubic attention mechanism for person image, which combines cross-layer spatial attention and channel attention. It can fully excavate both important midlevel details and key high-level semantics to obtain better discriminative fine-grained feature representation of a person image. Second, a text attention network for language description, which is based on bidirectional LSTM (BiLSTM) and self-attention mechanism. It can better learn the bidirectional semantic dependency and capture the key words of sentences, so as to extract the context information and key semantic features of the language description more effectively and accurately. Third, a cross-modal attention mechanism and a joint loss function for cross-modal learning, which can pay more attention to the relevant parts between text and image features. It can better exploit both the cross-modal and intra-modal correlation and can better solve the problem of cross-modal heterogeneity. Extensive experiments have been conducted on the CUHK-PEDES dataset. Our approach obtains higher performance than state-of-the-art approaches, demonstrating the advantage of the approach we propose.

Research on Multi-Channel Semantic Fusion Classification Model

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2019.p1044 ◽

2019 ◽

Vol 23 (6) ◽

pp. 1044-1051

Author(s):

Di Yang ◽

Ningjia Qiu ◽

Lin Cong ◽

Huamin Yang

Keyword(s):

Adaptive Learning ◽

Sentiment Classification ◽

Classification Model ◽

Classification Task ◽

Model Parameters ◽

Semantic Features ◽

Gradient Descent Algorithm ◽

Text Word ◽

High Level ◽

Rate Gradient

In this work, we propose a multi-channel semantic fusion convolutional neural network (SFCNN) to solve the problem of emotional ambiguity caused by changes of contextual order in the sentiment classification task. First, emotional tendency weights are evaluated on the text word vectors through an improved emotional tendency attention mechanism. Second, a multi-channel semantic fusion layer is leveraged to combine the deep semantic fusion of sentences with contextual order and generate deep semantic vectors, from which a CNN extracts high-level semantic features. Finally, an improved adaptive learning rate gradient descent algorithm is employed to optimize the model parameters and complete the sentiment classification task. Three datasets are used to evaluate the effectiveness of the proposed algorithm. The experimental results show that the SFCNN model has high steady-state precision and generalization performance.

A Real-Time Object Detector for Autonomous Vehicles Based on YOLOv4

Computational Intelligence and Neuroscience ◽

10.1155/2021/9218137 ◽

2021 ◽

Vol 2021 ◽

pp. 1-11

Author(s):

Rui Wang ◽

Ziyue Wang ◽

Zhengwei Xu ◽

Chi Wang ◽

Qiang Li ◽

...

Keyword(s):

Object Detection ◽

Real Time ◽

High Speed ◽

Feature Fusion ◽

Autonomous Driving ◽

Detection Algorithm ◽

Model Parameters ◽

Detection Accuracy ◽

Time Operation ◽

On The Road

Object detection is an important part of autonomous driving technology. To ensure the safe running of vehicles at high speed, real-time and accurate detection of all the objects on the road is required. How to balance the speed and accuracy of detection has been a hot research topic in recent years. This paper puts forward a one-stage object detection algorithm based on YOLOv4, which improves the detection accuracy and supports real-time operation. The backbone of the algorithm doubles the number of times the last residual block of CSPDarkNet53 is stacked. The neck of the algorithm replaces the SPP with the RFB structure and improves the PAN structure of the feature fusion module. The CBAM and CA attention mechanisms are added to the backbone and neck, and finally the overall width of the network is reduced to 3/4 of the original, so as to reduce the model parameters and improve the inference speed. Compared with YOLOv4, the algorithm in this paper improves the average accuracy on the KITTI dataset by 2.06% and on the BDD dataset by 2.95%. With the detection accuracy almost unchanged, the inference speed of this algorithm is increased by 9.14%, and it can detect in real time at a speed of more than 58.47 FPS.
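The effect of reducing the network width to 3/4 can be checked with simple arithmetic. A convolution's weight count is proportional to the product of its input and output channels, so scaling both by 3/4 leaves roughly 9/16 of the weights. The sketch below assumes a hypothetical 3x3 layer with 256 input and 512 output channels; these sizes are not taken from the paper.

```python
def conv_params(c_in, c_out, k=3):
    """Weight count of a k x k convolution (biases ignored)."""
    return c_in * c_out * k * k


def scale_width(channels, factor):
    """Scale a channel count by a width multiplier, truncating to an int."""
    return int(channels * factor)


# Hypothetical layer: 256 -> 512 channels, width multiplier 3/4.
base = conv_params(256, 512)
slim = conv_params(scale_width(256, 0.75), scale_width(512, 0.75))
print(base, slim)   # 1179648 663552
print(slim / base)  # 0.5625
```

The first and last layers of a real network do not shrink on both sides (the image has a fixed number of channels, and so does the prediction head), so the saving over a whole model is somewhat less than 9/16.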

Adversarial Feature Alignment: Avoid Catastrophic Forgetting in Incremental Task Lifelong Learning

Neural Computation ◽

10.1162/neco_a_01232 ◽

2019 ◽

Vol 31 (11) ◽

pp. 2266-2291 ◽

Cited By ~ 4

Author(s):

Xin Yao ◽

Tianchi Huang ◽

Chenglei Wu ◽

Rui-Xiao Zhang ◽

Lifeng Sun

Keyword(s):

Lifelong Learning ◽

Historical Memory ◽

Fine Tuning ◽

Model Parameters ◽

Semantic Features ◽

Task Sequence ◽

Knowledge Distillation ◽

And Performance ◽

High Level ◽

Feature Alignment

Humans are able to master a variety of knowledge and skills with ongoing learning. By contrast, dramatic performance degradation is observed when new tasks are added to an existing neural network model. This phenomenon, termed catastrophic forgetting, is one of the major roadblocks that prevent deep neural networks from achieving human-level artificial intelligence. Several research efforts (e.g., lifelong or continual learning algorithms) have been proposed to tackle this problem. However, they either suffer from an accumulating drop in performance as the task sequence grows longer, or require storing an excessive number of model parameters for historical memory, or cannot obtain competitive performance on the new tasks. In this letter, we focus on the incremental multitask image classification scenario. Inspired by the learning process of students, who usually decompose complex tasks into easier goals, we propose an adversarial feature alignment method to avoid catastrophic forgetting. In our design, both the low-level visual features and high-level semantic features serve as soft targets and guide the training process in multiple stages, which provides sufficient supervised information about the old tasks and helps to reduce forgetting. Due to the knowledge distillation and regularization effects, the proposed method performs even better than fine-tuning on the new tasks, which makes it stand out from other methods. Extensive experiments in several typical lifelong learning scenarios demonstrate that our method outperforms the state-of-the-art methods in both accuracy on new tasks and performance preservation on old tasks.
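The notion of soft targets can be illustrated with the classic knowledge distillation loss, in which the outputs of the old model, softened by a temperature, supervise the new model. This is a generic sketch and not the adversarial feature alignment method itself, which aligns intermediate features rather than logits; the temperature of 2.0 and the example logits are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(old_logits, new_logits, temperature=2.0):
    """KL divergence from the old model's soft targets to the new model."""
    p = softmax(old_logits, temperature)   # soft targets from the old model
    q = softmax(new_logits, temperature)   # predictions of the new model
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical logits on an old task, before and after learning a new task.
old = [2.0, 0.5, -1.0]
drifted = [0.2, 1.5, -0.3]
print(distillation_loss(old, old) < 1e-12)  # True: no drift, no penalty
print(distillation_loss(old, drifted) > 0)  # True: drift is penalised
```

Adding such a term to the loss on the new task pulls the updated model back toward its earlier behaviour on old tasks, which is the general mechanism by which soft targets reduce forgetting.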

Earthquake-Damaged Buildings Detection in Very High-Resolution Remote Sensing Images Based on Object Context and Boundary Enhanced Loss

Remote Sensing ◽

10.3390/rs13163119 ◽

2021 ◽

Vol 13 (16) ◽

pp. 3119

Author(s):

Chao Wang ◽

Xing Qiu ◽

Hai Huan ◽

Shuai Wang ◽

Yan Zhang ◽

...

Keyword(s):

Remote Sensing ◽

High Resolution ◽

Feature Representation ◽

Detection Accuracy ◽

Remote Sensing Images ◽

Convolutional Networks ◽

Fully Convolutional Networks ◽

Benchmark Datasets ◽

High Level ◽

Very High

Fully convolutional networks (FCN) such as UNet and DeepLabv3+ are highly competitive when applied to the detection of earthquake-damaged buildings in very high-resolution (VHR) remote sensing images. However, existing methods show some drawbacks, including incomplete extraction of buildings of different sizes and inaccurate boundary prediction. These drawbacks are attributed to deficient global context awareness, inaccurate correlation mining in the spatial context, and failure to consider the relative positional relationship between pixels and boundaries. Hence, a detection method for earthquake-damaged buildings based on object contextual representations (OCR) and boundary enhanced loss (BE loss) is proposed. First, the OCR module is separately embedded into the high-level feature extraction of the two networks, DeepLabv3+ and UNet, in order to enhance the feature representation. In addition, a novel loss function, BE loss, is designed according to the distance between pixels and boundaries to force the networks to pay more attention to learning the boundary pixels. Finally, two improved networks (OB-DeepLabv3+ and OB-UNet) are established according to the two strategies. To verify the performance of the proposed method, two benchmark datasets (YSH and HTI) for detecting earthquake-damaged buildings were constructed from post-earthquake images taken in China and Haiti in 2010, respectively. The experimental results show that both the embedding of the OCR module and the application of BE loss contribute to significantly increasing the detection accuracy of earthquake-damaged buildings, and that the two proposed networks are feasible and effective.

Cascaded Cross-Layer Fusion Network for Pedestrian Detection

Mathematics ◽

10.3390/math10010139 ◽

2022 ◽

Vol 10 (1) ◽

pp. 139

Author(s):

Zhifeng Ding ◽

Zichen Gu ◽

Yanpeng Sun ◽

Xinguang Xiang

Keyword(s):

Feature Fusion ◽

Pedestrian Detection ◽

Detection Performance ◽

Feature Representation ◽

Cross Layer ◽

Feature Maps ◽

Level Information ◽

High Level ◽

Center Map ◽

The Impact

Anchor-free detection methods not only reduce the training cost of object detection, but also avoid the imbalance problem caused by an excessive number of anchors. However, these methods only pay attention to the impact of the detection head on detection performance and ignore the impact of feature fusion. In this article, we take pedestrian detection as an example and propose a one-stage anchor-free network, the Cascaded Cross-layer Fusion Network (CCFNet). It consists of a Cascaded Cross-layer Fusion module (CCF) and a novel detection head. The CCF fully considers the distribution of high-level and low-level information in the feature maps at different stages of the network: first, the deep network is used to remove a large amount of noise from the shallow features, and then the high-level features are reused to obtain a more complete feature representation. For the pedestrian detection task, a novel detection head is designed, which uses the global smooth map (GSMap) to provide global information for the center map and thereby obtain a more accurate center map. Finally, we verify the feasibility of CCFNet on the Caltech and CityPersons datasets.
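How an anchor-free detector turns a center map into detections can be sketched as a peak search: a location counts as a pedestrian center if its score passes a threshold and is the largest in its 3x3 neighbourhood. This is a common decoding step for center-based detectors and is shown here only as an assumption; the abstract does not describe how CCFNet decodes its center map, and the threshold of 0.5 and the example scores are hypothetical.

```python
def find_centers(center_map, threshold=0.5):
    """Return (row, col, score) for local maxima above the threshold."""
    h, w = len(center_map), len(center_map[0])
    peaks = []
    for i in range(h):
        for j in range(w):
            score = center_map[i][j]
            if score < threshold:
                continue
            # Compare against the 3x3 neighbourhood, clipped at the borders.
            neighbours = [center_map[y][x]
                          for y in range(max(0, i - 1), min(h, i + 2))
                          for x in range(max(0, j - 1), min(w, j + 2))]
            if score >= max(neighbours):
                peaks.append((i, j, score))
    return peaks


# Hypothetical 4x4 center map with two pedestrians.
center_map = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.0],
    [0.1, 0.3, 0.2, 0.6],
    [0.0, 0.0, 0.1, 0.7],
]
print(find_centers(center_map))  # [(1, 1, 0.9), (3, 3, 0.7)]
```

The score of 0.6 at position (2, 3) passes the threshold but is suppressed by its stronger neighbour, which is why a smoother and more accurate center map leads directly to fewer duplicate detections. Exact ties between adjacent cells would both be reported by this simple version.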
