Object Detection through Modified YOLO Neural Network

Scientific Programming ◽

10.1155/2020/8403262 ◽

2020 ◽

Vol 2020 ◽

pp. 1-10 ◽

Cited By ~ 1

Author(s):

Tanvir Ahmad ◽

Yinglong Ma ◽

Muhammad Yahya ◽

Belal Ahmad ◽

Shah Nazir ◽

...

Keyword(s):

Neural Network ◽

Object Detection ◽

Loss Function ◽

Convolution Kernel ◽

Human Beings ◽

Multiple Objects ◽

Fast Speed ◽

Spatial Pyramid Pooling ◽

Improved Model ◽

Spatial Pyramid

Object detection has seen tremendous success recently, yet detecting and identifying objects both accurately and quickly remains very challenging. Human beings can detect and recognize multiple objects in images or videos with ease regardless of the objects' appearance, but for computers it is challenging to identify and distinguish between objects. In this paper, a modified YOLOv1-based neural network is proposed for object detection. The new neural network model is improved in three ways. First, the loss function of the YOLOv1 network is modified: the improved model replaces the margin style with a proportion style. Compared with the old loss function, the new one is more flexible and more reasonable in optimizing the network error. Second, a spatial pyramid pooling layer is added. Third, an inception model with a 1 × 1 convolution kernel is added, which reduces the number of weight parameters of the layers. Extensive experiments on the Pascal VOC 2007/2012 datasets showed that the proposed method achieved better performance.
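
The parameter saving attributed to the 1 × 1 convolution can be checked with simple arithmetic. The sketch below is illustrative only: the channel counts (256, 64, 256) and the helper `conv_params` are assumptions, not values or code from the paper. It compares a direct 3 × 3 convolution with a 1 × 1 reduction followed by a 3 × 3 convolution.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Number of weights in a k x k convolution (bias terms ignored)."""
    return k * k * c_in * c_out


# Hypothetical channel sizes, chosen for illustration only.
C_IN, C_MID, C_OUT = 256, 64, 256

# Direct 3 x 3 convolution from C_IN to C_OUT channels.
direct = conv_params(C_IN, C_OUT, 3)

# 1 x 1 convolution reduces the channels first, then the 3 x 3 runs on fewer channels.
bottleneck = conv_params(C_IN, C_MID, 1) + conv_params(C_MID, C_OUT, 3)

print(direct, bottleneck)
```

Under these assumed sizes the reduced path needs well under a third of the weights of the direct convolution, which is the kind of saving the abstract describes.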


Deep Image Similarity Measurement Based on the Improved Triplet Network with Spatial Pyramid Pooling

Information ◽

10.3390/info10040129 ◽

2019 ◽

Vol 10 (4) ◽

pp. 129 ◽

Cited By ~ 4

Author(s):

Xinpan Yuan ◽

Qunfeng Liu ◽

Jun Long ◽

Lei Hu ◽

Yulou Wang

Keyword(s):

Neural Network ◽

Loss Function ◽

Fundamental Problem ◽

Feature Space ◽

Image Feature ◽

Image Similarity ◽

Similarity Measurement ◽

Fixed Size ◽

Spatial Pyramid Pooling ◽

Spatial Pyramid

Image similarity measurement is a fundamental problem in computer vision. It is widely used in image classification, object detection, image retrieval, and other fields, mostly through Siamese or triplet networks. These networks consist of two or three identical convolutional neural network (CNN) branches that share weights to obtain high-level image feature representations, so that similar images are mapped close to each other in the feature space and dissimilar image pairs are mapped far apart. In particular, the triplet network is known as the state-of-the-art method for image similarity measurement. However, a basic CNN can only handle fixed-size images. If a fixed-size image is obtained by cropping or scaling, image information is lost and recognition accuracy is reduced. To solve this problem, this paper proposes the triplet spatial pyramid pooling network (TSPP-Net), which combines the triplet convolutional neural network with spatial pyramid pooling. Additionally, we propose an improved triplet loss function, so that the network model can perform distance learning twice while taking only three samples as input at a time. Theoretical analysis and experiments show that the TSPP-Net model and the improved triplet loss function can improve the generalization ability and accuracy of the image similarity measurement algorithm.
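
For orientation, the standard margin-based triplet loss that such networks start from can be sketched as follows. This is the common textbook form, not the improved loss the paper proposes (the abstract does not give its formula); the function names and the margin value of 1.0 are assumptions. Some formulations use squared distances instead.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Negative already far away: no penalty.
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])

# Negative only slightly farther than the positive: positive penalty.
hard = triplet_loss([0.0, 0.0], [0.0, 1.0], [0.0, 1.5])

print(easy, hard)
```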


Vehicle Pedestrian Detection Method Based on Spatial Pyramid Pooling and Attention Mechanism

Information ◽

10.3390/info11120583 ◽

2020 ◽

Vol 11 (12) ◽

pp. 583

Author(s):

Mingtao Guo ◽

Donghui Xue ◽

Peng Li ◽

He Xu

Keyword(s):

Object Detection ◽

Clustering Algorithm ◽

Pedestrian Detection ◽

Attention Mechanism ◽

Complex Environments ◽

Global Features ◽

Feature Map ◽

Spatial Pyramid Pooling ◽

Institute Of Technology ◽

Spatial Pyramid

Object detection for vehicles and pedestrians is extremely difficult to achieve in autopilot applications for the Internet of vehicles, and it is a task that requires the ability to locate and identify smaller targets even in complex environments. This paper proposes a single-stage object detection network (YOLOv3-promote) for the detection of vehicles and pedestrians in complex environments in cities, which improves on the traditional You Only Look Once version 3 (YOLOv3). First, spatial pyramid pooling is used to fuse local and global features in an image to better enrich the expression ability of the feature map and to more effectively detect targets with large size differences in the image; second, an attention mechanism is added to the feature map to weight each channel, thereby enhancing key features and removing redundant features, which allows for strengthening the ability of the feature network to discriminate between target objects and backgrounds; lastly, the anchor box derived from the K-means clustering algorithm is fitted to the final prediction box to complete the positioning and identification of target vehicles and pedestrians. The experimental results show that the proposed method achieved 91.4 mAP (mean average precision), 83.2 F1 score, and 43.7 frames per second (FPS) on the KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) dataset, and the detection performance was superior to the conventional YOLOv3 algorithm in terms of both accuracy and speed.
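
Deriving anchor boxes by K-means clustering can be sketched as below. The abstract does not state the distance measure, so this sketch assumes the 1 - IoU distance on (width, height) pairs that is customary in the YOLO family; the function names, the naive initialisation, and the sample boxes are all hypothetical.

```python
def iou_wh(box, anchor):
    """IoU of two boxes given as (width, height), aligned at a shared corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iters=20):
    """Cluster (w, h) pairs, assigning each box to the anchor with highest IoU."""
    anchors = list(boxes[:k])  # naive deterministic initialisation (assumption)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: iou_wh(box, anchors[i]))
            clusters[best].append(box)
        new_anchors = []
        for i, members in enumerate(clusters):
            if not members:
                new_anchors.append(anchors[i])  # keep the anchor of an empty cluster
                continue
            w = sum(m[0] for m in members) / len(members)
            h = sum(m[1] for m in members) / len(members)
            new_anchors.append((w, h))
        if new_anchors == anchors:  # converged
            break
        anchors = new_anchors
    return anchors


# Hypothetical ground-truth box sizes: three small and three large.
boxes = [(10, 20), (100, 50), (12, 22), (110, 60), (11, 18), (90, 55)]
anchors = kmeans_anchors(boxes, k=2)
print(anchors)
```

Using IoU rather than Euclidean distance keeps large boxes from dominating the clustering, which is why it is the usual choice for anchor selection.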


SCNN: A General Distribution Based Statistical Convolutional Neural Network with Application to Video Object Detection

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33015321 ◽

2019 ◽

Vol 33 ◽

pp. 5321-5328 ◽

Cited By ~ 10

Author(s):

Tianchen Wang ◽

Jinjun Xiong ◽

Xiaowei Xu ◽

Yiyu Shi

Keyword(s):

Neural Network ◽

Object Detection ◽

Convolutional Neural Network ◽

Correlated Data ◽

General Distribution ◽

Video Object ◽

Human Beings ◽

Detection And Tracking ◽

General Network ◽

Multiple Frames

Various convolutional neural networks (CNNs) were developed recently that achieved accuracy comparable with that of human beings in computer vision tasks such as image recognition, object detection and tracking, etc. Most of these networks, however, process one single frame of image at a time, and may not fully utilize the temporal and contextual correlation typically present in multiple channels of the same image or adjacent frames from a video, thus limiting the achievable throughput. This limitation stems from the fact that existing CNNs operate on deterministic numbers. In this paper, we propose a novel statistical convolutional neural network (SCNN), which extends existing CNN architectures but operates directly on correlated distributions rather than deterministic numbers. By introducing a parameterized canonical model to model correlated data and defining corresponding operations as required for CNN training and inference, we show that SCNN can process multiple frames of correlated images effectively, hence achieving significant speedup over existing CNN models. We use a CNN based video object detection as an example to illustrate the usefulness of the proposed SCNN as a general network model. Experimental results show that even a nonoptimized implementation of SCNN can still achieve 178% speedup over existing CNNs with slight accuracy degradation.


A robust approach for industrial small-object detection using an improved faster regional convolutional neural network

Scientific Reports ◽

10.1038/s41598-021-02805-y ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Faisal Saeed ◽

Muhammad Jamal Ahmed ◽

Malik Junaid Gul ◽

Kim Jeong Hong ◽

Anand Paul ◽

...

Keyword(s):

Neural Network ◽

Object Detection ◽

Convolutional Neural Network ◽

Data Augmentation ◽

Industrial Sector ◽

Small Object ◽

Industrial Products ◽

Amplification Method ◽

Small Object Detection ◽

Improved Model

With the increasing pace in the industrial sector, the need for a smart environment is also increasing and the production of industrial products in terms of quality always matters. There is a strong burden on the industrial environment to continue to reduce impulsive downtime, concert deprivation, and safety risks, which needs an efficient solution to detect and improve potential obligations as soon as possible. The systems working in industrial environments for generating industrial products are very fast and generate products rapidly, sometimes leading to faulty products. Therefore, this problem needs to be solved efficiently. Considering this problem in terms of faulty small-object detection, this study proposed an improved faster regional convolutional neural network-based model to detect the faults in the product images. We introduced a novel data-augmentation method along with a bi-cubic interpolation-based feature amplification method. A center loss is also introduced in the loss function to decrease the inter-class similarity issue. The experimental results show that the proposed improved model achieved better classification accuracy for detecting our small faulty objects. The proposed model performs better than the state-of-the-art methods.
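
A center loss term, in its commonly used form, penalises the squared distance between each feature vector and the center of its class. The sketch below follows that common form and is not taken from the paper; the function name, the dictionary of centers, and the sample values are assumptions, and how the term is weighted inside the full detection loss is not specified in the abstract.

```python
def center_loss(features, labels, centers):
    """0.5 * sum of squared distances from each feature to its class center."""
    total = 0.0
    for feat, label in zip(features, labels):
        center = centers[label]
        total += sum((f - c) ** 2 for f, c in zip(feat, center))
    return 0.5 * total


# Hypothetical two-class example with 2-D features.
features = [[1.0, 1.0], [3.0, 3.0]]
labels = [0, 1]
centers = {0: [0.0, 0.0], 1: [3.0, 3.0]}

loss = center_loss(features, labels, centers)
print(loss)
```

Only the first sample contributes here, because the second already sits on its class center.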


Employing GRU to combine feature maps in DeeplabV3 for a better segmentation model

Nordic Machine Intelligence ◽

10.5617/nmi.9131 ◽

2021 ◽

Vol 1 (1) ◽

pp. 29-31

Author(s):

Mahmood Haithami ◽

Amr Ahmed ◽

Iman Yi Liao ◽

Hamid Jalab

Keyword(s):

Neural Network ◽

Recurrent Neural Network ◽

Feature Maps ◽

Feature Map ◽

Input Feature ◽

Spatial Pyramid Pooling ◽

Test Sets ◽

Public Datasets ◽

Spatial Pyramid

In this paper, we aim to enhance the segmentation capabilities of DeeplabV3 by employing Gated Recurrent Neural Network (GRU). A 1-by-1 convolution in DeeplabV3 was replaced by GRU after the Atrous Spatial Pyramid Pooling (ASPP) layer to combine the input feature maps. Both the convolution and the GRU have sharable parameters, but the latter has gates that enable or disable the contribution of each input feature map. The experiments on unseen test sets demonstrate that employing GRU instead of convolution produces better segmentation results. The datasets used are public datasets provided by the MedAI competition.


Detection of Algorithmically Generated Domain Names Using the Recurrent Convolutional Neural Network with Spatial Pyramid Pooling

Entropy ◽

10.3390/e22091058 ◽

2020 ◽

Vol 22 (9) ◽

pp. 1058

Author(s):

Zhanghui Liu ◽

Yudong Zhang ◽

Yuzhong Chen ◽

Xinwen Fan ◽

Chen Dong

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Network Traffic ◽

Contextual Information ◽

Recall Rate ◽

Domain Name ◽

Sample Distribution ◽

Domain Names ◽

Spatial Pyramid Pooling ◽

Spatial Pyramid

Domain generation algorithms (DGAs) use specific parameters as random seeds to generate a large number of random domain names to prevent malicious domain name detection. This greatly increases the difficulty of detecting and defending against botnets and malware. Traditional models for detecting algorithmically generated domain names generally rely on manually extracting statistical characteristics from the domain names or network traffic and then employing classifiers to distinguish the algorithmically generated domain names. These models always require labor-intensive manual feature engineering. In contrast, most state-of-the-art models based on deep neural networks are sensitive to imbalance in the sample distribution and cannot fully exploit the discriminative class features in domain names or network traffic, leading to decreased detection accuracy. To address these issues, we employ the borderline synthetic minority over-sampling algorithm (SMOTE) to improve sample balance. We also propose a recurrent convolutional neural network with spatial pyramid pooling (RCNN-SPP) to extract discriminative and distinctive class features. The recurrent convolutional neural network combines a convolutional neural network (CNN) and a bi-directional long short-term memory network (Bi-LSTM) to extract both the semantic and contextual information from domain names. We then employ the spatial pyramid pooling strategy to refine the contextual representation by capturing multi-scale contextual information from domain names. The experimental results from different domain name datasets demonstrate that our model can achieve 92.36% accuracy, an 89.55% recall rate, a 90.46% F1-score, and 95.39% AUC in identifying DGA and legitimate domain names, and it can achieve 92.45% accuracy, a 90.12% recall rate, a 90.86% F1-score, and 96.59% AUC in multi-classification problems. It achieves significant improvement over existing models in terms of accuracy and robustness.
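
The core of SMOTE-style over-sampling is linear interpolation between a minority-class sample and one of its minority-class neighbors. The sketch below shows only that interpolation step; the borderline variant additionally restricts which samples are used as seeds, and that selection is not shown. The function name and the sample vectors are hypothetical.

```python
import random


def smote_sample(x, neighbor, rng):
    """Return a synthetic point on the segment between x and a minority-class neighbor."""
    lam = rng.random()  # interpolation factor in [0, 1)
    return [xi + lam * (ni - xi) for xi, ni in zip(x, neighbor)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
synthetic = smote_sample([1.0, 2.0], [3.0, 6.0], rng)
print(synthetic)
```

Every synthetic point lies on the line segment joining the two real samples, so the minority class is densified without leaving the region it already occupies.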


A2SPPNet: Attentive Atrous Spatial Pyramid Pooling Network for Salient Object Detection

IEEE Transactions on Multimedia ◽

10.1109/tmm.2022.3141933 ◽

2022 ◽

pp. 1-1

Author(s):

Yu Qiu ◽

Yun Liu ◽

Yanan Chen ◽

Jianwen Zhang ◽

Jinchao Zhu ◽

...

Keyword(s):

Object Detection ◽

Salient Object Detection ◽

Salient Object ◽

Spatial Pyramid Pooling ◽

Spatial Pyramid


An optimized convolutional neural network with bottleneck and spatial pyramid pooling layers for classification of foods

Pattern Recognition Letters ◽

10.1016/j.patrec.2017.12.007 ◽

2018 ◽

Vol 105 ◽

pp. 50-58 ◽

Cited By ~ 9

Author(s):

Elnaz Jahani Heravi ◽

Hamed Habibi Aghdam ◽

Domenec Puig

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Spatial Pyramid Pooling ◽

Spatial Pyramid


Compact Spatial Pyramid Pooling Deep Convolutional Neural Network Based Hand Gestures Decoder

Applied Sciences ◽

10.3390/app10217898 ◽

2020 ◽

Vol 10 (21) ◽

pp. 7898

Author(s):

Akm Ashiquzzaman ◽

Hyunmin Lee ◽

Kwangki Kim ◽

Hye-Young Kim ◽

Jaehyung Park ◽

...

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

High Performance ◽

Fixed Number ◽

Hand Gesture ◽

Computing Power ◽

Classical Models ◽

Spatial Pyramid Pooling ◽

Gesture Input ◽

Spatial Pyramid

Current deep learning convolutional neural network (DCNN)-based hand gesture detectors with acute precision demand incredibly high-performance computing power. Although DCNN-based detectors are capable of accurate classification, the sheer computing power needed for this form of classification makes them very difficult to run with lower computational power in remote environments. Moreover, classical DCNN architectures have a fixed number of input dimensions, which forces preprocessing and thus makes them impractical for real-world applications. In this research, a practical DCNN with an optimized architecture is proposed with DCNN filter/node pruning, and spatial pyramid pooling (SPP) is introduced in order to make the model input dimension-invariant. This compact SPP-DCNN module uses 65% fewer parameters than traditional classifiers and operates almost 3× faster than classical models. Moreover, the proposed algorithm, which decodes gestures or sign language finger-spelling from videos, gave a benchmark highest accuracy with the fastest processing speed. This proposed method paves the way for various practical and applied hand gesture input-based human-computer interaction (HCI) applications.
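
How spatial pyramid pooling makes a model independent of input size can be illustrated with a single-channel sketch: the feature map is max-pooled into a fixed grid of bins at each pyramid level, so the output length depends only on the levels. The pyramid levels (1, 2, 4) and the function name `spp_max` are assumptions, not the configuration used in the paper.

```python
import math


def spp_max(feature_map, levels=(1, 2, 4)):
    """Max-pool one 2-D channel into n x n bins per level and concatenate the results."""
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Row range of bin i; floor/ceil guarantees every bin is non-empty.
            r0, r1 = math.floor(i * h / n), math.ceil((i + 1) * h / n)
            for j in range(n):
                c0, c1 = math.floor(j * w / n), math.ceil((j + 1) * w / n)
                pooled.append(max(feature_map[r][c]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return pooled


# Two feature maps of different sizes yield vectors of the same length.
small = [[r * 5 + c for c in range(5)] for r in range(5)]
large = [[r * 9 + c for c in range(9)] for r in range(13)]
print(len(spp_max(small)), len(spp_max(large)))
```

With these levels each channel always produces 1 + 4 + 16 = 21 values, so a fully connected layer after the pooling sees a fixed-length input whatever the image size.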


Manchu Word Recognition Based on Convolutional Neural Network with Spatial Pyramid Pooling

2018 11th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI) ◽

10.1109/cisp-bmei.2018.8633131 ◽

2018 ◽

Author(s):

Min Li ◽

Ruirui Zheng ◽

Shuang Xu ◽

Yu Fu ◽

Di Huang

Keyword(s):

Neural Network ◽

Word Recognition ◽

Convolutional Neural Network ◽

Spatial Pyramid Pooling ◽

Spatial Pyramid
