The Application of Improved YOLO V3 in Multi-Scale Target Detection

2019 ◽  
Vol 9 (18) ◽  
pp. 3775 ◽  
Author(s):  
Ju ◽  
Luo ◽  
Wang ◽  
Hui ◽  
Chang

Target detection is one of the most important research directions in computer vision, and a variety of target detection algorithms have recently been proposed. Because targets in a scene vary in size, a detector must be able to find them at different scales. To improve detection performance for targets of different sizes, this paper proposes a multi-scale target detection algorithm based on an improved YOLO (You Only Look Once) V3. The main contributions are: (1) a mathematical derivation method based on Intersection over Union (IoU) is proposed to select the number and the aspect ratios of the candidate anchor boxes for each scale of the improved YOLO V3; (2) to further improve detection performance, the detection scales of YOLO V3 are extended from 3 to 4, and a feature-fusion detection layer downsampled by 4× is added to detect small targets; (3) to avoid vanishing gradients and enhance feature reuse, the six convolutional layers in front of the output detection layer are transformed into two residual units. Experimental results on the PASCAL VOC and KITTI datasets show that the proposed method outperforms other state-of-the-art target detection algorithms.
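The abstract does not spell out the IoU-based derivation. As a rough illustration of how IoU can score a candidate anchor set, the following Python sketch (the function names are hypothetical) compares box shapes as if they shared a common center, the usual convention when anchors are matched to ground truth by width and height only. A higher mean best IoU means the anchors fit the dataset's box shapes better, which is the trade-off to weigh against the number of anchors per scale.

```python
from typing import List, Tuple

Box = Tuple[float, float]  # (width, height); positions are ignored on purpose


def shape_iou(a: Box, b: Box) -> float:
    # IoU of two boxes centred at the same point, so only width and height matter.
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union


def mean_best_iou(boxes: List[Box], anchors: List[Box]) -> float:
    # Average over ground-truth boxes of the IoU with the best-matching anchor.
    return sum(max(shape_iou(b, a) for a in anchors) for b in boxes) / len(boxes)


# Hypothetical ground-truth sizes in pixels, used only to compare two anchor sets.
gt = [(10, 20), (12, 24), (50, 50), (100, 60)]
print(mean_best_iou(gt, [(11, 22)]))
print(mean_best_iou(gt, [(11, 22), (50, 50), (100, 60)]))
```

Adding anchors can only keep or raise the mean best IoU, so the interesting question is how much each extra anchor per scale buys relative to its cost.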

2021 ◽  
Vol 2078 (1) ◽  
pp. 012008
Author(s):  
Hui Liu ◽  
Keyang Cheng

To address false and missed detections of small and occluded targets in pedestrian detection, a pedestrian detection algorithm based on improved multi-scale feature fusion is proposed. First, because PANet, the multi-scale feature fusion module of YOLOv4, does not consider the interaction between scales, PANet is improved to reduce the semantic gap between scales, and an attention mechanism is introduced to learn the importance of different layers and strengthen feature fusion. Next, dilated convolution is introduced to reduce information loss during downsampling. Finally, the K-means clustering algorithm is used to redesign the anchor boxes, and the loss function is modified to detect a single category. Experimental results on the INRIA and WiderPerson datasets under different congestion conditions show that the improved algorithm reaches an AP of 96.83% and 59.67%, respectively, an improvement of 2.41% and 1.03% over the pedestrian detection results of the YOLOv4 model. False and missed detections of small and occluded targets are significantly reduced.
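The abstract only says that K-means is used to redesign the anchors. A common choice in the YOLO family is to cluster ground-truth (width, height) pairs with the distance d = 1 − IoU instead of Euclidean distance, so that large boxes do not dominate. The Python sketch below assumes that distance and a deterministic area-sorted initialisation (both assumptions, not details confirmed by the paper); `kmeans_anchors` is a hypothetical name.

```python
from typing import List, Tuple

Box = Tuple[float, float]  # (width, height) of a ground-truth box


def shape_iou(a: Box, b: Box) -> float:
    # IoU of two boxes centred at the same point, so only width and height matter.
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes: List[Box], k: int, iters: int = 100) -> List[Box]:
    # Assumption: deterministic start with k boxes spread through the list sorted by area.
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    anchors = [ordered[i * len(ordered) // k] for i in range(k)]
    for _ in range(iters):
        # Assignment step with d = 1 - IoU: the nearest anchor has the highest IoU.
        clusters: List[List[Box]] = [[] for _ in range(k)]
        for b in boxes:
            j = min(range(k), key=lambda i: 1.0 - shape_iou(b, anchors[i]))
            clusters[j].append(b)
        # Update step: mean width and height per cluster; an empty cluster keeps its anchor.
        new = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c)) if c else anchors[i]
            for i, c in enumerate(clusters)
        ]
        if new == anchors:  # assignments no longer change
            break
        anchors = new
    return sorted(anchors, key=lambda a: a[0] * a[1])
```

For a single-category pedestrian detector, the boxes passed in would be the pedestrian boxes only, and k would match the total number of anchors across the detection scales (for example 9 for three scales with three anchors each).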


2021 ◽  
Vol 13 (2) ◽  
pp. 160
Author(s):  
Jiangqiao Yan ◽  
Liangjin Zhao ◽  
Wenhui Diao ◽  
Hongqi Wang ◽  
Xian Sun

As a precursor step for many computer vision algorithms, object detection plays an important role in a wide range of practical applications. As the objects to be detected become more complex, multi-scale object detection has attracted increasing attention, especially in remote sensing. Early convolutional neural network detectors mostly rely on manually preset anchor boxes to divide the image into regions and obtain prior positions of the targets. However, anchor boxes are difficult to set appropriately and introduce a large amount of computational redundancy, which limits the generality of a detection model obtained with fixed parameters. In the past two years, anchor-free detection algorithms have developed remarkably for natural images. However, there has been little research on handling multi-scale detection more effectively within an anchor-free framework, or on applying such detectors to remote sensing images. In this paper, we propose a specific-attention Feature Pyramid Network (FPN) module that generates a feature pyramid based on the characteristics of objects of various sizes; this pyramid is better suited to multi-scale object detection. We also propose a scale-aware detection head that contains a multi-receptive feature fusion module and a size-based feature compensation module. The resulting anchor-free detector obtains a more effective multi-scale feature representation. Experiments on challenging datasets show that our approach performs favorably against other methods in multi-scale object detection.
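The specific-attention FPN itself is not described in enough detail to reproduce. As a generic illustration of tying object size to pyramid level, the Python sketch below uses the level-assignment heuristic from the original FPN paper, k = ⌊k0 + log2(√(wh)/224)⌋, clamped to the available levels. This is a swapped-in, well-known rule, not the authors' method; the defaults k0 = 4 and levels P2 to P5 are assumptions.

```python
import math


def fpn_level(w: float, h: float, k0: int = 4, canonical: float = 224.0,
              k_min: int = 2, k_max: int = 5) -> int:
    # Standard FPN heuristic (assumed here, not taken from this paper):
    # larger objects go to coarser levels, one level per doubling of sqrt(w * h).
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    return max(k_min, min(k_max, k))


for size in (16, 112, 224, 448):
    level = fpn_level(size, size)
    print(size, "-> P%d (stride %d)" % (level, 2 ** level))
```

The clamp matters in remote sensing imagery, where many objects fall below the canonical size and would otherwise be mapped to levels the network does not have.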


Sensors ◽  
2020 ◽  
Vol 20 (4) ◽  
pp. 1237 ◽  
Author(s):  
Yuwei Lu ◽  
Lili Dong ◽  
Tong Zhang ◽  
Wenhai Xu

Infrared maritime target detection is the key technology of maritime target search systems. However, infrared images generally suffer from a low signal-to-noise ratio and low resolution, and the maritime environment is complex and changeable. Under interference from islands, waves and other disturbances, the brightness of small dim targets is easily obscured, which makes them difficult to distinguish and hard for traditional target detection algorithms to handle. To solve these problems, this paper analyzes infrared maritime images containing small dim targets under a variety of sea conditions and concludes that small targets occupy very few pixels, often lack any edge contour information, and have very low gray values and contrast, whereas backgrounds such as islands and strong sea waves occupy a large number of pixels, have obvious texture features, and often have high gray values. Based on a detailed analysis of the differences between target and background, this paper proposes a detection algorithm (SRGM) for infrared small dim targets against different maritime backgrounds. First, the algorithm introduces an efficient maritime background filter for the common backgrounds in infrared maritime images: a median filter based on sensitive-region selection extracts the image background accurately, and the background is then removed by differencing with the original image. In addition, the paper analyzes the differences in gradient features between targets and strong background interference, proposes a small dim target extraction operator with two analysis factors that closely fit the target features, and combines it with adaptive threshold segmentation to extract small dim targets accurately.
Experimental results show that, compared with currently popular small dim target detection algorithms, the proposed method performs better in various maritime environments.
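The background-suppression idea (median-filter background estimate, difference with the original, then adaptive thresholding) can be sketched in plain Python. This is a simplified sketch under stated assumptions: the median filter runs over the whole image rather than over selected sensitive regions, and the paper's gradient-based extraction operator is replaced by a plain mean + k·std threshold on the residual. The function names and the parameters `r` and `k` are hypothetical.

```python
from statistics import mean, median, pstdev
from typing import List, Tuple

Image = List[List[float]]


def median_background(img: Image, r: int = 2) -> Image:
    # Background estimate: (2r+1) x (2r+1) median filter with clamped borders.
    # Simplification: applied everywhere, not only in sensitive regions.
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                      for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
            out[y][x] = median(window)
    return out


def detect_small_targets(img: Image, r: int = 2, k: float = 3.0) -> List[Tuple[int, int]]:
    # Remove the background by differencing, then keep pixels above an adaptive
    # threshold (assumption: mean + k * std of the residual image).
    bg = median_background(img, r)
    residual = [[img[y][x] - bg[y][x] for x in range(len(img[0]))] for y in range(len(img))]
    flat = [v for row in residual for v in row]
    thresh = mean(flat) + k * pstdev(flat)
    return [(y, x) for y, row in enumerate(residual) for x, v in enumerate(row) if v > thresh]
```

A point target a few pixels wide barely changes the median, so it survives the difference step, while large bright structures such as islands are largely absorbed into the background estimate.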


2018 ◽  
Vol 10 (8) ◽  
pp. 80
Author(s):  
Lei Zhang ◽  
Xiaoli Zhi

Convolutional neural networks (CNNs) have made great progress in face detection. Most CNN-based detectors use computation-intensive networks as the backbone to obtain high precision, and they cannot reach a good detection speed without high-performance GPUs (Graphics Processing Units). This limits CNN-based face detection algorithms in real applications, especially speed-critical ones. To alleviate this problem, we propose a lightweight face detector that uses a fast residual network as its backbone and runs fast even on cheap, ordinary GPUs. To guarantee detection precision, multi-scale features and multiple contexts are exploited efficiently. Specifically, feature fusion is first used to obtain semantically strong multi-scale features. Then multi-context information, including both local and global context, is added to these multi-scale features without extra computational burden: the local context through a depthwise separable convolution-based approach, and the global context through simple global average pooling. Experimental results show that our method runs at about 110 fps on VGA (Video Graphics Array)-resolution images while maintaining precision competitive with state-of-the-art counterparts on the WIDER FACE and FDDB (Face Detection Data Set and Benchmark) datasets.
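The claim that context can be added "without extra computational burden" rests on two cheap operations. The arithmetic below compares the weight count of a standard k×k convolution with a depthwise separable one (a k×k depthwise convolution followed by a 1×1 pointwise convolution), and shows global average pooling as the global-context vector. The layer sizes (3×3 kernels, 128 channels) are hypothetical examples, not values from the paper, and biases are ignored.

```python
from typing import List


def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    # One k x k x c_in kernel per output channel (bias ignored).
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    # Depthwise: one k x k kernel per input channel; pointwise: 1 x 1 conv mixing channels.
    return k * k * c_in + c_in * c_out


def global_average_pool(fmap: List[List[List[float]]]) -> List[float]:
    # Collapse a C x H x W feature map to one value per channel (global context).
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


print(standard_conv_params(3, 128, 128))        # 147456
print(depthwise_separable_params(3, 128, 128))  # 17536
```

For this example the separable form needs roughly 8.4 times fewer weights, and global average pooling adds no weights at all.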


2013 ◽  
Vol 756-759 ◽  
pp. 3183-3188
Author(s):  
Tao Lei ◽  
Deng Ping He ◽  
Fang Tang Chen

BLAST can achieve high-speed data communication, and its signal detection directly affects the performance of the BLAST receiver. This paper introduces several signal detection algorithms: the ZF algorithm, the MMSE algorithm, the ZF-SIC algorithm and the MMSE-SIC algorithm. Simulation results show that the traditional ZF algorithm has the worst performance and that the traditional MMSE algorithm and the ZF-SIC algorithm perform similarly, although ZF-SIC outperforms MMSE as the SNR increases. MMSE-SIC has the best detection performance among these algorithms.
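The two linear detectors compared above differ only by a regularisation term: ZF computes x̂ = (HᴴH)⁻¹Hᴴy, while MMSE computes x̂ = (HᴴH + σ²I)⁻¹Hᴴy, assuming unit-energy transmitted symbols and noise variance σ². The plain-Python sketch below implements both for two transmit antennas (any number of receive antennas) with a closed-form 2×2 inverse; the helper names are hypothetical, and the SIC variants (detect one stream, subtract its contribution, repeat) are not shown.

```python
from typing import List

Mat = List[List[complex]]
Vec = List[complex]


def conj_t(a: Mat) -> Mat:
    # Hermitian (conjugate) transpose.
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]


def matmul(a: Mat, b: Mat) -> Mat:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(a: Mat, v: Vec) -> Vec:
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]


def inv2(a: Mat) -> Mat:
    # Closed-form inverse of a 2 x 2 matrix.
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def linear_detect(h: Mat, y: Vec, noise_var: float = 0.0) -> Vec:
    # ZF when noise_var == 0, MMSE otherwise (assumes unit symbol energy):
    # x_hat = (H^H H + noise_var * I)^-1 H^H y, for an n_r x 2 channel matrix H.
    hh = conj_t(h)
    gram = matmul(hh, h)
    reg = [[gram[i][j] + (noise_var if i == j else 0) for j in range(2)] for i in range(2)]
    return matvec(matmul(inv2(reg), hh), y)
```

Without noise, ZF inverts the channel exactly; with noise, the σ² term keeps MMSE from amplifying noise along weak channel directions, which is why MMSE generally beats plain ZF.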


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Yongxiang Wu ◽  
Yili Fu ◽  
Shuguo Wang

Purpose: This paper aims to use a fully convolutional network (FCN) to predict pixel-wise antipodal grasp affordances for unknown objects and to improve grasp detection performance through multi-scale feature fusion.

Design/methodology/approach: A modified FCN is used as the backbone to extract pixel-wise features from the input image; these are further fused with multi-scale context information gathered by a three-level pyramid pooling module to make more robust predictions. Based on the proposed unified feature embedding framework, two head networks are designed to implement different grasp rotation prediction strategies (regression and classification), and their performance is evaluated and compared with a defined point metric. The regression network is further extended to predict grasp rectangles, for comparison with previous methods and for real-world robotic grasping of unknown objects.

Findings: The ablation study of the pyramid pooling module shows that multi-scale information fusion significantly improves model performance. With the same feature embedding framework, the regression approach outperforms the classification approach on two datasets. The regression network achieves state-of-the-art accuracy (up to 98.9%) and speed (4 ms per image), and a high success rate in the unknown-object grasping experiment (97% for household objects, 94.4% for adversarial objects and 95.3% for objects in clutter).

Originality/value: A novel pixel-wise grasp affordance prediction network based on multi-scale feature fusion is proposed to improve grasp detection performance. Two prediction approaches are formulated and compared within the proposed framework. The proposed method achieves excellent performance on three benchmark datasets and in a real-world robotic grasping experiment.
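A pyramid pooling module gathers context by average-pooling the feature map into a few grid sizes, upsampling each pooled map back to full resolution, and stacking the results with the original features. The single-channel Python sketch below assumes bin sizes (1, 2, 4) for the three levels and nearest-neighbour upsampling; the paper's actual bin sizes, the 1×1 convolutions usually applied per level, and the fusion layers are not specified in the abstract and are omitted.

```python
from typing import List, Sequence

Grid = List[List[float]]


def adaptive_avg_pool(x: Grid, bins: int) -> Grid:
    # Average-pool an H x W map into bins x bins cells (floor start, ceil end per cell).
    h, w = len(x), len(x[0])
    out = []
    for i in range(bins):
        y0, y1 = (i * h) // bins, -(-(i + 1) * h // bins)
        row = []
        for j in range(bins):
            x0, x1 = (j * w) // bins, -(-(j + 1) * w // bins)
            cell = [x[r][c] for r in range(y0, y1) for c in range(x0, x1)]
            row.append(sum(cell) / len(cell))
        out.append(row)
    return out


def upsample_nearest(x: Grid, h: int, w: int) -> Grid:
    bh, bw = len(x), len(x[0])
    return [[x[r * bh // h][c * bw // w] for c in range(w)] for r in range(h)]


def pyramid_pool(x: Grid, levels: Sequence[int] = (1, 2, 4)) -> List[Grid]:
    # Assumption: three levels with 1x1, 2x2 and 4x4 bins; returns the input
    # followed by one pooled-and-upsampled context map per level.
    h, w = len(x), len(x[0])
    return [x] + [upsample_nearest(adaptive_avg_pool(x, b), h, w) for b in levels]
```

In a real network each returned map would be one channel group of the concatenated tensor fed to the grasp heads; the 1×1 level carries whole-image context, the finer levels carry regional context.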


Author(s):  
ZHEN-XUE CHEN ◽  
CHENG-YUN LIU ◽  
FA-LIANG CHANG

Detecting small targets in cluttered scenes with a low SNR (signal-to-noise ratio) in infrared (IR) images is an important and challenging problem. To solve it, this paper proposes a method based on feature salience for the automatic detection of targets in complex backgrounds. First, the method uses the average absolute difference maximum (AADM) as the dissimilarity measure between targets and the background region to enhance targets. Second, the minimum probability of error is used to build the feature salience model. Finally, by computing the realistic degree of the features, the method solves the problem of multi-feature fusion. Experimental results show that the proposed algorithm achieves a better probability of detection, making it an effective and valuable small target detection algorithm for complex backgrounds.
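The abstract names the average absolute difference maximum (AADM) but does not define it. The Python sketch below is one hypothetical reading, labeled as such: for each pixel, compare the (2r+1)×(2r+1) target window with its eight adjacent windows of the same size, compute the mean absolute pixel-wise difference for each neighbour, and keep the maximum as the enhancement score. The window layout, the choice of maximum over neighbours, and the name `aadm_map` are all assumptions.

```python
from typing import List

Image = List[List[float]]


def aadm_map(img: Image, r: int = 1) -> Image:
    # Hypothetical AADM: max over the 8 neighbouring windows of the mean absolute
    # pixel-wise difference from the central target window. Pixels whose
    # neighbourhood would leave the image are given a score of 0.
    h, w, s = len(img), len(img[0]), 2 * r + 1
    out = [[0.0] * w for _ in range(h)]
    offsets = [(dy * s, dx * s) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
    for y in range(s + r, h - s - r):
        for x in range(s + r, w - s - r):
            best = 0.0
            for oy, ox in offsets:
                diffs = [abs(img[y + i][x + j] - img[y + oy + i][x + ox + j])
                         for i in range(-r, r + 1) for j in range(-r, r + 1)]
                best = max(best, sum(diffs) / len(diffs))
            out[y][x] = best
    return out
```

Flat background scores zero under this reading, while a compact bright blob scores high against every neighbour; the enhanced map would then feed the feature salience model.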


2021 ◽  
Vol 18 (2) ◽  
pp. 499-516
Author(s):  
Yan Sun ◽  
Zheping Yan

The main purpose of target detection is to identify and locate targets in still images or video sequences; it is one of the key tasks in computer vision. With continuing breakthroughs in deep learning, convolutional neural network models in particular have shown a strong ability to extract image features in digital image processing. Although research on CNN-based target detection models is developing rapidly, problems remain in practical applications; for example, the large number of parameters in a detection model requires high storage and computational costs. Therefore, this paper optimizes and compresses several algorithms, drawing on both early image detection algorithms and CNN-based image detection algorithms. In a trained CNN model, forward propagation provides image feature extraction, integration and feature mapping, while back propagation lets the model optimize both its learning and the compressed algorithm. The paper then discusses the Faster R-CNN and YOLO algorithms. To address the low saliency of the candidate boxes extracted by Faster R-CNN, a target detection model based on a salient-region proposal network is proposed. The model computes weights for the feature map, which enhances feature saliency and reduces background interference. Experiments show that the image detection algorithm based on a compressed neural network is feasible to a certain extent.
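The abstract says only that the model computes weights for the feature map to raise saliency and suppress background. A minimal sketch of that idea, assuming a fixed rule instead of learned weights, is a spatial weight map equal to the sigmoid of the channel-averaged activation, applied to every channel. The weighting rule and the function names are hypothetical.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # C x H x W


def saliency_weights(fmap: FeatureMap) -> List[List[float]]:
    # Assumption: weight = sigmoid(mean activation over channels) at each location.
    c = len(fmap)
    h, w = len(fmap[0]), len(fmap[0][0])
    return [[1.0 / (1.0 + math.exp(-sum(fmap[k][y][x] for k in range(c)) / c))
             for x in range(w)] for y in range(h)]


def reweight(fmap: FeatureMap) -> FeatureMap:
    # Scale every channel by the same spatial weight map, damping weak-response background.
    wts = saliency_weights(fmap)
    return [[[v * wts[y][x] for x, v in enumerate(row)] for y, row in enumerate(ch)]
            for ch in fmap]
```

Because strongly activated locations get weights near 1 and weak ones are scaled down more, the contrast between candidate regions and background grows; in the proposed model these weights would be produced by trainable layers rather than a fixed formula.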


2021 ◽  
Vol 13 (18) ◽  
pp. 3650
Author(s):  
Ru Luo ◽  
Jin Xing ◽  
Lifu Chen ◽  
Zhouhao Pan ◽  
Xingmin Cai ◽  
...  

Although deep learning has achieved great success in aircraft detection from SAR imagery, its black-box behavior has been criticized for low comprehensibility and interpretability. Such challenges have impeded the trustworthiness and wide application of deep learning techniques in SAR image analytics. In this paper, we propose an innovative eXplainable Artificial Intelligence (XAI) framework that turns deep neural networks (DNNs) into glass boxes, using aircraft detection as a case study. The framework is composed of three parts: hybrid global attribution mapping (HGAM) for backbone network selection, a path aggregation network (PANet), and class-specific confidence scores mapping (CCSM) for visualization of the detector. HGAM integrates local and global XAI techniques to evaluate the effectiveness of DNN feature extraction; PANet provides advanced feature fusion to generate multi-scale prediction feature maps; and CCSM relies on visualization methods to examine detection performance for a given DNN and input SAR images. The framework can select the optimal backbone DNN for aircraft detection and map detection performance for a better understanding of the DNN. We verify its effectiveness with experiments on Gaofen-3 imagery. Our XAI framework offers an explainable approach to designing, developing, and deploying DNNs for SAR image analytics.
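In YOLO-style detectors, a class-specific confidence score for a grid cell is the product of the objectness score and the conditional class probability, Pr(object) · Pr(class | object). The Python sketch below assumes the detector head exposes those two grids and renders one class's scores as a coarse ASCII heat map for inspection. It only illustrates the kind of quantity CCSM visualizes; the paper's actual mapping back onto SAR images is not reproduced, and the function names are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def class_confidence_map(objectness: Grid, class_probs: List[Grid], cls: int) -> Grid:
    # Class-specific confidence per grid cell: Pr(object) * Pr(class | object).
    probs = class_probs[cls]
    return [[o * p for o, p in zip(orow, prow)] for orow, prow in zip(objectness, probs)]


def to_heatmap(conf: Grid, levels: str = " .:-=+*#%@") -> str:
    # Render scores in [0, 1] as ASCII shading; out-of-range values are clamped.
    n = len(levels) - 1
    return "\n".join("".join(levels[min(n, max(0, round(v * n)))] for v in row)
                     for row in conf)
```

Rendering one map per class makes it easy to see which cells drive an aircraft detection and whether background clutter also receives high scores.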

