Using an Improved YOLOv4 Deep Learning Network for Accurate Detection of Whitefly and Thrips on Sticky Trap Images

2021 ◽  
Vol 64 (3) ◽  
pp. 919-927
Author(s):  
Dujin Wang ◽  
Yizhong Wang ◽  
Ming Li ◽  
Xinting Yang ◽  
Jianwei Wu ◽  
...  

HighlightsThe proposed method detected thrips and whitefly more accurately than previous methods.The proposed method demonstrated good robustness to illumination reflections and different pest densities.Small pest detection was improved by adding large-scale feature maps and more residual units to a shallow network.Machine vision and deep learning created an end-to-end model to detect small pests on sticky traps in field conditions.Abstract. Pest detection is the basis of precise control in vegetable greenhouses. To improve the detection accuracy and robustness for two common small pests (whitefly and thrips) in greenhouses, this study proposes a novel small object detection approach based on the YOLOv4 model. Yellow sticky trap (YST) images at the original resolution (2560 × 1920 pixels) were collected using pest monitoring equipment in a greenhouse. The images were then cropped and labeled to create sub-images (416 × 416 pixels) to construct an experimental dataset. The labeled images used in this study (900 training, 100 validation, and 200 test) are available for comparative studies. To enhance the model’s ability to detect small pests, the feature map at the 8-fold downsampling layer in the backbone network was merged with the feature map at the 4-fold downsampling layer to generate a new layer and output a feature map with a size of 104 × 104 pixels. Furthermore, the residual units in the first two residual blocks were enlarged by four times to extract more shallow image features and the location information of target pests to withstand image degradation in the field. The experimental results showed that the mean average precision (mAP) for detection of whitefly and thrips using the proposed approach was improved by 8.2% and 3.4% compared with the YOLOv3 and YOLOv4 models, respectively. The detection performance slightly decreased as the pest densities increased in the YST image, but the mAP value was still 92.7% in the high-density dataset, which indicates that the proposed model has good robustness over a range of pest densities. Compared with previous similar studies, the proposed method has better potential to monitor whitefly and thrips using YSTs in field conditions. Keywords: Deep learning, Greenhouse pest management, Image processing, Pest detection, Small object, YOLOv4.

Author(s):  
Seokyong Shin ◽  
Hyunho Han ◽  
Sang Hun Lee

YOLOv3 is a deep learning-based real-time object detector and is mainly used in applications such as video surveillance and autonomous vehicles. In this paper, we proposed an improved YOLOv3 (You Only Look Once version 3) applied Duplex FPN, which enhanced large object detection by utilizing low-level feature information. The conventional YOLOv3 improved the small object detection performance by applying FPN (Feature Pyramid Networks) structure to YOLOv2. However, YOLOv3 with an FPN structure specialized in detecting small objects, so it is difficult to detect large objects. Therefore, this paper proposed an improved YOLOv3 applied Duplex FPN, which can utilize low-level location information in high-level feature maps instead of the existing FPN structure of YOLOv3. This improved the detection accuracy of large objects. Also, an extra detection layer was added to the top-level feature map to prevent failure of detection of parts of large objects. Further, dimension clusters of each detection layer were reassigned to learn quickly how to accurately detect objects. The proposed method was compared and analyzed in the PASCAL VOC dataset. The experimental results showed that the bounding box accuracy of large objects improved owing to the Duplex FPN and extra detection layer, and the proposed method succeeded in detecting large objects that the existing YOLOv3 did not.


Sensors ◽  
2021 ◽  
Vol 21 (16) ◽  
pp. 5312
Author(s):  
Yanni Zhang ◽  
Yiming Liu ◽  
Qiang Li ◽  
Jianzhong Wang ◽  
Miao Qi ◽  
...  

Recently, deep learning-based image deblurring and deraining have been well developed. However, most of these methods fail to distill the useful features. What is more, exploiting the detailed image features in a deep learning framework always requires a mass of parameters, which inevitably makes the network suffer from a high computational burden. We propose a lightweight fusion distillation network (LFDN) for image deblurring and deraining to solve the above problems. The proposed LFDN is designed as an encoder–decoder architecture. In the encoding stage, the image feature is reduced to various small-scale spaces for multi-scale information extraction and fusion without much information loss. Then, a feature distillation normalization block is designed at the beginning of the decoding stage, which enables the network to distill and screen valuable channel information of feature maps continuously. Besides, an information fusion strategy between distillation modules and feature channels is also carried out by the attention mechanism. By fusing different information in the proposed approach, our network can achieve state-of-the-art image deblurring and deraining results with a smaller number of parameters and outperform the existing methods in model complexity.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Maiki Higa ◽  
Shinya Tanahara ◽  
Yoshitaka Adachi ◽  
Natsumi Ishiki ◽  
Shin Nakama ◽  
...  

AbstractIn this report, we propose a deep learning technique for high-accuracy estimation of the intensity class of a typhoon from a single satellite image, by incorporating meteorological domain knowledge. By using the Visual Geometric Group’s model, VGG-16, with images preprocessed with fisheye distortion, which enhances a typhoon’s eye, eyewall, and cloud distribution, we achieved much higher classification accuracy than that of a previous study, even with sequential-split validation. Through comparison of t-distributed stochastic neighbor embedding (t-SNE) plots for the feature maps of VGG with the original satellite images, we also verified that the fisheye preprocessing facilitated cluster formation, suggesting that our model could successfully extract image features related to the typhoon intensity class. Moreover, gradient-weighted class activation mapping (Grad-CAM) was applied to highlight the eye and the cloud distributions surrounding the eye, which are important regions for intensity classification; the results suggest that our model qualitatively gained a viewpoint similar to that of domain experts. A series of analyses revealed that the data-driven approach using only deep learning has limitations, and the integration of domain knowledge could bring new breakthroughs.


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Bangtong Huang ◽  
Hongquan Zhang ◽  
Zihong Chen ◽  
Lingling Li ◽  
Lihua Shi

Deep learning algorithms are facing the limitation in virtual reality application due to the cost of memory, computation, and real-time computation problem. Models with rigorous performance might suffer from enormous parameters and large-scale structure, and it would be hard to replant them onto embedded devices. In this paper, with the inspiration of GhostNet, we proposed an efficient structure ShuffleGhost to make use of the redundancy in feature maps to alleviate the cost of computations, as well as tackling some drawbacks of GhostNet. Since GhostNet suffers from high computation of convolution in Ghost module and shortcut, the restriction of downsampling would make it more difficult to apply Ghost module and Ghost bottleneck to other backbone. This paper proposes three new kinds of ShuffleGhost structure to tackle the drawbacks of GhostNet. The ShuffleGhost module and ShuffleGhost bottlenecks are utilized by the shuffle layer and group convolution from ShuffleNet, and they are designed to redistribute the feature maps concatenated from Ghost Feature Map and Primary Feature Map. Besides, they eliminate the gap of them and extract the features. Then, SENet layer is adopted to reduce the computation cost of group convolution, as well as evaluating the importance of the feature maps which concatenated from Ghost Feature Maps and Primary Feature Maps and giving proper weights for the feature maps. This paper conducted some experiments and proved that the ShuffleGhostV3 has smaller trainable parameters and FLOPs with the ensurance of accuracy. And with proper design, it could be more efficient in both GPU and CPU side.


2020 ◽  
Author(s):  
Zongfeng Yang ◽  
Shang Gao ◽  
Feng Xiao ◽  
Ganghua Li ◽  
Yangfeng Ding ◽  
...  

Abstract Background : Identification and characterization of new traits with a sound physiological foundation is essential for crop breeding and management. Deep learning has been widely used in image data analysis to explore spatial and temporal information on crop growth and development, thus strengthening the power of the identification of physiological traits. This study aims to develop a novel trait that indicates source and sink relation in japonica rice based on deep learning.Results : We applied a deep learning approach to accurately segment leaf and panicle and subsequently developed the procedure of GvCrop to calculate the leaf to panicle ratio (LPR) of rice populations during grain filling. Images of the training dataset were captured in the field experiments, with large variations in camera shooting angle, the elevation angle and the azimuth angle of the sun, rice genotype, and plant phenological stages. Accurately labeled by manually annotating all the panicle and leaf regions, the resulting dataset were used to train FPN-Mask (Feature Pyramid Network Mask) models, consisting of a backbone network and a task-specific sub-network. The model with the highest accuracy is then selected to study the variations in LPR among 192 rice germplasms and among agronomical practices. Despite the challenging field conditions, FPN-Mask models achieved a high detection accuracy, with Pixel Accuracy being 0.99 for panicles and 0.98 for leaves. The calculated LPRs showed large spatial and temporal variations as well as genotypic differences.Conclusion : Deep learning techniques can achieve high accuracy in simultaneously detecting panicle and leaf data from complex rice field images. The proposed FPN-Mask model is applicable for detecting and quantifying crop performance under field conditions. The newly identified trait of LPR should provide a high throughput protocol for breeders to select superior rice cultivars as well as for agronomists to precisely manage field crops that have a good balance of source and sink.


2020 ◽  
Vol 2020 ◽  
pp. 1-18 ◽  
Author(s):  
Nhat-Duy Nguyen ◽  
Tien Do ◽  
Thanh Duc Ngo ◽  
Duy-Dinh Le

Small object detection is an interesting topic in computer vision. With the rapid development in deep learning, it has drawn attention of several researchers with innovations in approaches to join a race. These innovations proposed comprise region proposals, divided grid cell, multiscale feature maps, and new loss function. As a result, performance of object detection has recently had significant improvements. However, most of the state-of-the-art detectors, both in one-stage and two-stage approaches, have struggled with detecting small objects. In this study, we evaluate current state-of-the-art models based on deep learning in both approaches such as Fast RCNN, Faster RCNN, RetinaNet, and YOLOv3. We provide a profound assessment of the advantages and limitations of models. Specifically, we run models with different backbones on different datasets with multiscale objects to find out what types of objects are suitable for each model along with backbones. Extensive empirical evaluation was conducted on 2 standard datasets, namely, a small object dataset and a filtered dataset from PASCAL VOC 2007. Finally, comparative results and analyses are then presented.


Sensors ◽  
2019 ◽  
Vol 19 (5) ◽  
pp. 1089 ◽  
Author(s):  
Ye Wang ◽  
Zhenyi Liu ◽  
Weiwen Deng

Region proposal network (RPN) based object detection, such as Faster Regions with CNN (Faster R-CNN), has gained considerable attention due to its high accuracy and fast speed. However, it has room for improvements when used in special application situations, such as the on-board vehicle detection. Original RPN locates multiscale anchors uniformly on each pixel of the last feature map and classifies whether an anchor is part of the foreground or background with one pixel in the last feature map. The receptive field of each pixel in the last feature map is fixed in the original faster R-CNN and does not coincide with the anchor size. Hence, only a certain part can be seen for large vehicles and too much useless information is contained in the feature for small vehicles. This reduces detection accuracy. Furthermore, the perspective projection results in the vehicle bounding box size becoming related to the bounding box position, thereby reducing the effectiveness and accuracy of the uniform anchor generation method. This reduces both detection accuracy and computing speed. After the region proposal stage, many regions of interest (ROI) are generated. The ROI pooling layer projects an ROI to the last feature map and forms a new feature map with a fixed size for final classification and box regression. The number of feature map pixels in the projected region can also influence the detection performance but this is not accurately controlled in former works. In this paper, the original faster R-CNN is optimized, especially for the on-board vehicle detection. This paper tries to solve these above-mentioned problems. The proposed method is tested on the KITTI dataset and the result shows a significant improvement without too many tricky parameter adjustments and training skills. The proposed method can also be used on other objects with obvious foreshortening effects, such as on-board pedestrian detection. The basic idea of the proposed method does not rely on concrete implementation and thus, most deep learning based object detectors with multiscale feature maps can be optimized with it.


Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 3031
Author(s):  
Jing Lian ◽  
Yuhang Yin ◽  
Linhui Li ◽  
Zhenghao Wang ◽  
Yafu Zhou

There are many small objects in traffic scenes, but due to their low resolution and limited information, their detection is still a challenge. Small object detection is very important for the understanding of traffic scene environments. To improve the detection accuracy of small objects in traffic scenes, we propose a small object detection method in traffic scenes based on attention feature fusion. First, a multi-scale channel attention block (MS-CAB) is designed, which uses local and global scales to aggregate the effective information of the feature maps. Based on this block, an attention feature fusion block (AFFB) is proposed, which can better integrate contextual information from different layers. Finally, the AFFB is used to replace the linear fusion module in the object detection network and obtain the final network structure. The experimental results show that, compared to the benchmark model YOLOv5s, this method has achieved a higher mean Average Precison (mAP) under the premise of ensuring real-time performance. It increases the mAP of all objects by 0.9 percentage points on the validation set of the traffic scene dataset BDD100K, and at the same time, increases the mAP of small objects by 3.5%.


2021 ◽  
Vol 13 (6) ◽  
pp. 1198
Author(s):  
Bi-Yuan Liu ◽  
Huai-Xin Chen ◽  
Zhou Huang ◽  
Xing Liu ◽  
Yun-Zhi Yang

Drone-based object detection has been widely applied in ground object surveillance, urban patrol, and some other fields. However, the dramatic scale changes and complex backgrounds of drone images usually result in weak feature representation of small objects, which makes it challenging to achieve high-precision object detection. Aiming to improve small objects detection, this paper proposes a novel cross-scale knowledge distillation (CSKD) method, which enhances the features of small objects in a manner similar to image enlargement, so it is termed as ZoomInNet. First, based on an efficient feature pyramid network structure, the teacher and student network are trained with images in different scales to introduce the cross-scale feature. Then, the proposed layer adaption (LA) and feature level alignment (FA) mechanisms are applied to align the feature size of the two models. After that, the adaptive key distillation point (AKDP) algorithm is used to get the crucial positions in feature maps that need knowledge distillation. Finally, the position-aware L2 loss is used to measure the difference between feature maps from cross-scale models, realizing the cross-scale information compression in a single model. Experiments on the challenging Visdrone2018 dataset show that the proposed method draws on the advantages of the image pyramid methods, while avoids the large calculation of them and significantly improves the detection accuracy of small objects. Simultaneously, the comparison with mainstream methods proves that our method has the best performance in small object detection.


2018 ◽  
Vol 2018 ◽  
pp. 1-6 ◽  
Author(s):  
Gen-Min Lin ◽  
Mei-Juan Chen ◽  
Chia-Hung Yeh ◽  
Yu-Yang Lin ◽  
Heng-Yu Kuo ◽  
...  

Entropy images, representing the complexity of original fundus photographs, may strengthen the contrast between diabetic retinopathy (DR) lesions and unaffected areas. The aim of this study is to compare the detection performance for severe DR between original fundus photographs and entropy images by deep learning. A sample of 21,123 interpretable fundus photographs obtained from a publicly available data set was expanded to 33,000 images by rotating and flipping. All photographs were transformed into entropy images using block size 9 and downsized to a standard resolution of 100 × 100 pixels. The stages of DR are classified into 5 grades based on the International Clinical Diabetic Retinopathy Disease Severity Scale: Grade 0 (no DR), Grade 1 (mild nonproliferative DR), Grade 2 (moderate nonproliferative DR), Grade 3 (severe nonproliferative DR), and Grade 4 (proliferative DR). Of these 33,000 photographs, 30,000 images were randomly selected as the training set, and the remaining 3,000 images were used as the testing set. Both the original fundus photographs and the entropy images were used as the inputs of convolutional neural network (CNN), and the results of detecting referable DR (Grades 2–4) as the outputs from the two data sets were compared. The detection accuracy, sensitivity, and specificity of using the original fundus photographs data set were 81.80%, 68.36%, 89.87%, respectively, for the entropy images data set, and the figures significantly increased to 86.10%, 73.24%, and 93.81%, respectively (all p values <0.001). The entropy image quantifies the amount of information in the fundus photograph and efficiently accelerates the generating of feature maps in the CNN. The research results draw the conclusion that transformed entropy imaging of fundus photographs can increase the machinery detection accuracy, sensitivity, and specificity of referable DR for the deep learning-based system.


Sign in / Sign up

Export Citation Format

Share Document