Deep Image Similarity Measurement Based on the Improved Triplet Network with Spatial Pyramid Pooling

Xinpan Yuan; Qunfeng Liu; Jun Long; Lei Hu; Yulou Wang

doi:10.3390/info10040129

Information ◽

10.3390/info10040129 ◽

2019 ◽

Vol 10 (4) ◽

pp. 129 ◽

Cited By ~ 4

Author(s):

Xinpan Yuan ◽

Qunfeng Liu ◽

Jun Long ◽

Lei Hu ◽

Yulou Wang

Keyword(s):

Neural Network ◽

Loss Function ◽

Fundamental Problem ◽

Feature Space ◽

Image Feature ◽

Image Similarity ◽

Similarity Measurement ◽

Fixed Size ◽

Spatial Pyramid Pooling ◽

Spatial Pyramid

Image similarity measurement is a fundamental problem in computer vision. It is widely used in image classification, object detection, image retrieval, and other fields, mostly through Siamese or triplet networks. These networks consist of two or three identical convolutional neural network (CNN) branches that share weights to obtain high-level image feature representations, so that similar images are mapped close to each other in the feature space and dissimilar images are mapped far apart. In particular, the triplet network is regarded as the state-of-the-art method for image similarity measurement. However, a basic CNN can only handle fixed-size images. If a fixed-size image is obtained by cropping or scaling, image information is lost and recognition accuracy is reduced. To solve this problem, this paper proposes the triplet spatial pyramid pooling network (TSPP-Net), which combines the triplet convolutional neural network with spatial pyramid pooling. Additionally, we propose an improved triplet loss function, so that the network model can perform distance learning twice while taking only three samples as input at a time. Theoretical analysis and experiments show that the TSPP-Net model and the improved triplet loss function improve the generalization ability and accuracy of the image similarity measurement algorithm.
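As a rough illustration of the triplet idea described in this abstract, the sketch below computes the standard triplet loss on hand-written feature vectors. It is not the improved loss function proposed in the paper, whose exact form the abstract does not give; the function names, the margin value, and the example vectors are all assumptions made for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin` (a hypothetical default of 1.0 here).
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


# Hypothetical 2-D embeddings standing in for CNN branch outputs.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]        # distance 1.0 from the anchor
far_negative = [3.0, 4.0]    # distance 5.0 from the anchor
near_negative = [0.0, 1.5]   # distance 1.5 from the anchor

loss_far = triplet_loss(anchor, positive, far_negative)    # 1 - 5 + 1 < 0, so 0.0
loss_near = triplet_loss(anchor, positive, near_negative)  # 1 - 1.5 + 1 = 0.5
```

A negative that is already well separated contributes no loss, while a negative that sits close to the positive is penalized, which is what pushes dissimilar images apart in the feature space.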

Object Detection through Modified YOLO Neural Network

Scientific Programming ◽

10.1155/2020/8403262 ◽

2020 ◽

Vol 2020 ◽

pp. 1-10 ◽

Cited By ~ 1

Author(s):

Tanvir Ahmad ◽

Yinglong Ma ◽

Muhammad Yahya ◽

Belal Ahmad ◽

Shah Nazir ◽

...

Keyword(s):

Neural Network ◽

Object Detection ◽

Loss Function ◽

Convolution Kernel ◽

Human Beings ◽

Multiple Objects ◽

Fast Speed ◽

Spatial Pyramid Pooling ◽

Improved Model ◽

Spatial Pyramid

Object detection has recently achieved tremendous success, but detecting and identifying objects both accurately and quickly remains very challenging. Human beings can detect and recognize multiple objects in images or videos with ease regardless of the object’s appearance, but for computers it is challenging to identify and distinguish between things. In this paper, a modified YOLOv1-based neural network is proposed for object detection. The new neural network model has been improved in the following ways. Firstly, the loss function of the YOLOv1 network is modified: the improved model replaces the margin style with a proportion style. Compared to the old loss function, the new one is more flexible and more reasonable in optimizing the network error. Secondly, a spatial pyramid pooling layer is added. Thirdly, an inception model with a 1 × 1 convolution kernel is added, which reduces the number of weight parameters of the layers. Extensive experiments on the Pascal VOC 2007/2012 datasets showed that the proposed method achieved better performance.

An Advanced Relevance Feedback Method to Improve Performance of CBIR using Convolutional Neural Network and Comprehensive Values

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.b2741.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 5427-5438

Keyword(s):

Neural Network ◽

Feature Extraction ◽

Image Retrieval ◽

Convolutional Neural Network ◽

Large Scale ◽

Activation Function ◽

Image Feature ◽

Similarity Measurement ◽

Query Image ◽

Image Production

Content-Based Image Retrieval (CBIR) is an extensively used technique for retrieving images from large image databases. However, users are not satisfied with conventional image retrieval techniques. In addition, with the advent of web development and transmission networks, the number of images available to users continues to increase. As a result, digital images are being produced continuously and in considerable volume in many areas. Quick access to images similar to a given query image in such an extensive collection poses great challenges and requires proficient techniques. From the query image to the retrieval of relevant images, CBIR has key phases such as feature extraction, similarity measurement, and retrieval of relevant images. Of these, extracting the features of the images is one of the most important steps. Recently, Convolutional Neural Networks (CNNs) have shown good results in the field of computer vision due to their ability to extract features from images. AlexNet is a classical deep CNN for image feature extraction. We have modified the AlexNet architecture with a few changes and proposed a novel framework to improve its ability for feature extraction and similarity measurement. The proposed approach optimizes AlexNet in its pooling layers. In particular, average pooling is replaced by max-avg pooling, and the non-linear activation function Maxout is used after every convolution layer for better feature extraction. This paper introduces a CNN for feature extraction from images in a CBIR system and also presents Euclidean distance along with the Comprehensive Values for better results. The proposed framework goes beyond image retrieval and extends to large-scale databases. The performance of the proposed work is evaluated using precision. The proposed work shows better results than existing works.
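The similarity-measurement and evaluation steps mentioned in this abstract can be sketched as a ranking by Euclidean distance followed by a precision computation. This is a minimal sketch under stated assumptions: the feature vectors, image identifiers, and function names are hypothetical, the features would in practice come from the CNN, and the paper's "Comprehensive Values" are not modelled because the abstract does not define them.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rank_by_distance(query, database, top_k=3):
    """Return the top_k (image_id, distance) pairs, nearest first."""
    scored = [(image_id, euclidean(query, feat)) for image_id, feat in database.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_k]


def precision(retrieved_ids, relevant_ids):
    """Fraction of retrieved images that are relevant to the query."""
    if not retrieved_ids:
        return 0.0
    hits = sum(1 for image_id in retrieved_ids if image_id in relevant_ids)
    return hits / len(retrieved_ids)


# Hypothetical 3-D feature vectors standing in for CNN features.
database = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [0.9, 0.1, 0.0],
    "img_c": [0.0, 1.0, 0.0],
}
query = [1.0, 0.0, 0.0]

results = rank_by_distance(query, database, top_k=2)
retrieved = [image_id for image_id, _ in results]
score = precision(retrieved, relevant_ids={"img_a", "img_c"})
```

Here `retrieved` lists the two nearest images, and `score` is the share of them that appear in the assumed set of relevant images.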

Video-Based Human Action Recognition Using Spatial Pyramid Pooling and 3D Densely Convolutional Networks

Future Internet ◽

10.3390/fi10120115 ◽

2018 ◽

Vol 10 (12) ◽

pp. 115

Author(s):

Wanli Yang ◽

Yimin Chen ◽

Chen Huang ◽

Mingke Gao

Keyword(s):

Neural Networks ◽

Network Structure ◽

Three Dimensional ◽

Human Action Recognition ◽

Human Action ◽

Fixed Size ◽

Behavior Recognition ◽

Convolutional Network ◽

Spatial Pyramid Pooling ◽

Spatial Pyramid

In recent years, the application of deep neural networks to human behavior recognition has become a hot topic. Although remarkable achievements have been made in the field of image recognition, there are still many problems to be solved in the area of video. It is well known that convolutional neural networks require a fixed-size image input, which not only limits the network structure but also affects the recognition accuracy. Although this problem has been solved for images, it remains unsolved for video. To address the fixed-size input problem for video frames in video recognition, we propose a three-dimensional (3D) densely connected convolutional network based on spatial pyramid pooling (3D-DenseNet-SPP). As the name implies, the network structure is mainly composed of three parts: 3DCNN, DenseNet, and SPPNet. Our models were evaluated separately on the KTH and UCF101 datasets. The experimental results showed that our model performs better in video-based behavior recognition than the existing models.

Employing GRU to combine feature maps in DeeplabV3 for a better segmentation model

Nordic Machine Intelligence ◽

10.5617/nmi.9131 ◽

2021 ◽

Vol 1 (1) ◽

pp. 29-31

Author(s):

Mahmood Haithami ◽

Amr Ahmed ◽

Iman Yi Liao ◽

Hamid Jalab

Keyword(s):

Neural Network ◽

Recurrent Neural Network ◽

Feature Maps ◽

Feature Map ◽

Input Feature ◽

Spatial Pyramid Pooling ◽

Test Sets ◽

Public Datasets ◽

Spatial Pyramid

In this paper, we aim to enhance the segmentation capabilities of DeeplabV3 by employing a Gated Recurrent Neural Network (GRU). A 1-by-1 convolution in DeeplabV3 was replaced by a GRU after the Atrous Spatial Pyramid Pooling (ASPP) layer to combine the input feature maps. Both the convolution and the GRU have shareable parameters, but the latter has gates that enable or disable the contribution of each input feature map. Experiments on unseen test sets demonstrate that employing a GRU instead of the convolution produces better segmentation results. The datasets used are public datasets provided by the MedAI competition.

Prognosis of Bearing and Gear Wears Using Convolutional Neural Network with Hybrid Loss Function

Sensors ◽

10.3390/s20123539 ◽

2020 ◽

Vol 20 (12) ◽

pp. 3539 ◽

Cited By ~ 6

Author(s):

Chang-Cheng Lo ◽

Ching-Hung Lee ◽

Wen-Cheng Huang

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Loss Function ◽

Feature Space ◽

Vibration Signals ◽

One Dimensional ◽

Experimental Platform ◽

Proposed Model ◽

The Status ◽

Gear Mechanism

This study proposes a prognostic method based on a one-dimensional convolutional neural network (1-D CNN) with a clustering loss, trained through classification. The 1-D CNN was trained on collected vibration signals from normal and malfunction data using a hybrid loss function (i.e., classification loss at the output and clustering loss in the feature space). Subsequently, the obtained features were used to estimate the status for prognosis. The open bearing dataset and an established gear platform were used to validate the functionality and feasibility of the proposed model. Moreover, the experimental platform was used to simulate the gear mechanism of a semiconductor robot in a practical experiment to verify the accuracy of the model estimation. The experimental results demonstrate the performance and effectiveness of the proposed method.

Detection of Algorithmically Generated Domain Names Using the Recurrent Convolutional Neural Network with Spatial Pyramid Pooling

Entropy ◽

10.3390/e22091058 ◽

2020 ◽

Vol 22 (9) ◽

pp. 1058

Author(s):

Zhanghui Liu ◽

Yudong Zhang ◽

Yuzhong Chen ◽

Xinwen Fan ◽

Chen Dong

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Network Traffic ◽

Contextual Information ◽

Recall Rate ◽

Domain Name ◽

Sample Distribution ◽

Domain Names ◽

Spatial Pyramid Pooling ◽

Spatial Pyramid

Domain generation algorithms (DGAs) use specific parameters as random seeds to generate a large number of random domain names in order to evade malicious domain name detection. This greatly increases the difficulty of detecting and defending against botnets and malware. Traditional models for detecting algorithmically generated domain names generally rely on manually extracting statistical characteristics from the domain names or network traffic and then employing classifiers to distinguish the algorithmically generated domain names. These models always require labor-intensive manual feature engineering. In contrast, most state-of-the-art models based on deep neural networks are sensitive to imbalance in the sample distribution and cannot fully exploit the discriminative class features in domain names or network traffic, leading to decreased detection accuracy. To address these issues, we employ the borderline synthetic minority over-sampling algorithm (SMOTE) to improve sample balance. We also propose a recurrent convolutional neural network with spatial pyramid pooling (RCNN-SPP) to extract discriminative and distinctive class features. The recurrent convolutional neural network combines a convolutional neural network (CNN) and a bi-directional long short-term memory network (Bi-LSTM) to extract both the semantic and contextual information from domain names. We then employ the spatial pyramid pooling strategy to refine the contextual representation by capturing multi-scale contextual information from domain names. The experimental results on different domain name datasets demonstrate that our model achieves 92.36% accuracy, an 89.55% recall rate, a 90.46% F1-score, and 95.39% AUC in distinguishing DGA from legitimate domain names, and 92.45% accuracy, a 90.12% recall rate, a 90.86% F1-score, and 96.59% AUC in multi-classification problems. It achieves significant improvement over existing models in terms of accuracy and robustness.
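The over-sampling step mentioned in this abstract rests on interpolating between a minority sample and one of its minority-class neighbours. The sketch below shows that interpolation for plain SMOTE only; it is an assumption-laden simplification and omits the borderline variant's step of restricting sampling to minority points near the class boundary. The function name, the value of `k`, the seed, and the sample points are hypothetical.

```python
import random


def smote_sample(minority, k=2, rng=None):
    """Create one synthetic minority sample by linear interpolation.

    Picks a random minority point x, one of its k nearest minority
    neighbours, and returns a point on the segment between them.
    """
    if rng is None:
        rng = random.Random(0)
    x = rng.choice(minority)
    # Candidate neighbours: every other minority point, nearest first.
    others = [p for p in minority if p is not x]
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
    neighbour = rng.choice(others[:k])
    lam = rng.random()  # interpolation weight in [0, 1)
    synthetic = [a + lam * (b - a) for a, b in zip(x, neighbour)]
    return x, neighbour, synthetic


# Hypothetical 2-D minority-class feature vectors.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [5.0, 5.0]]
x, neighbour, synthetic = smote_sample(minority, k=2, rng=random.Random(42))
```

Repeating the call until the minority class reaches the desired size gives a balanced training set; every synthetic point lies on a segment between two real minority samples.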

Siamese CNN-BiLSTM Architecture for 3D Shape Representation Learning

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/93 ◽

2018 ◽

Cited By ~ 11

Author(s):

Guoxian Dai ◽

Jin Xie ◽

Yi Fang

Keyword(s):

Neural Network ◽

Loss Function ◽

Short Term Memory ◽

Shape Representation ◽

Feature Space ◽

Representation Learning ◽

3D Shape ◽

Aggregate Information ◽

3D Shapes ◽

2D Images

Learning a 3D shape representation from a collection of its rendered 2D images has been extensively studied. However, existing view-based techniques have not yet fully exploited the information among all the projected views. In this paper, by employing a recurrent neural network to efficiently capture features across different views, we propose a siamese CNN-BiLSTM network for 3D shape representation learning. The proposed method minimizes a discriminative loss function to learn a deep nonlinear transformation, mapping 3D shapes from the original space into a nonlinear feature space. In the transformed space, the distance between 3D shapes with the same label is minimized; otherwise, the distance is maximized up to a large margin. Specifically, the 3D shapes are first projected into a group of 2D images from different views. Then a convolutional neural network (CNN) is adopted to extract features from the different view images, followed by a bidirectional long short-term memory (BiLSTM) network to aggregate information across different views. Finally, we arrange the whole CNN-BiLSTM network into a siamese structure with a contrastive loss function. Our proposed method is evaluated on two benchmarks, ModelNet40 and SHREC 2014, demonstrating superiority over the state-of-the-art methods.
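The behaviour described in this abstract, pulling same-label shapes together and pushing different-label shapes apart up to a margin, matches the usual contrastive loss. The sketch below uses one common form of that loss; the paper's exact formulation (for example, any scaling factor) is not given in the abstract, so the function name, the margin, and the example embeddings are assumptions.

```python
import math


def contrastive_loss(u, v, same_label, margin=1.0):
    """One common contrastive loss on a pair of embeddings.

    Same-label pairs are penalized by their squared distance; different-label
    pairs are penalized only while they are closer than `margin`.
    """
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    if same_label:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Hypothetical 2-D embeddings standing in for CNN-BiLSTM outputs.
origin = [0.0, 0.0]
loss_same = contrastive_loss(origin, [0.6, 0.8], same_label=True)        # distance 1.0
loss_diff_far = contrastive_loss(origin, [3.0, 4.0], same_label=False)   # distance 5.0
loss_diff_near = contrastive_loss(origin, [0.3, 0.4], same_label=False)  # distance 0.5
```

The far different-label pair contributes nothing, whereas the near one is penalized, so training stops pushing pairs apart once they clear the margin.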

An optimized convolutional neural network with bottleneck and spatial pyramid pooling layers for classification of foods

Pattern Recognition Letters ◽

10.1016/j.patrec.2017.12.007 ◽

2018 ◽

Vol 105 ◽

pp. 50-58 ◽

Cited By ~ 9

Author(s):

Elnaz Jahani Heravi ◽

Hamed Habibi Aghdam ◽

Domenec Puig

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Spatial Pyramid Pooling ◽

Spatial Pyramid

Compact Spatial Pyramid Pooling Deep Convolutional Neural Network Based Hand Gestures Decoder

Applied Sciences ◽

10.3390/app10217898 ◽

2020 ◽

Vol 10 (21) ◽

pp. 7898

Author(s):

Akm Ashiquzzaman ◽

Hyunmin Lee ◽

Kwangki Kim ◽

Hye-Young Kim ◽

Jaehyung Park ◽

...

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

High Performance ◽

Fixed Number ◽

Hand Gesture ◽

Computing Power ◽

Classical Models ◽

Spatial Pyramid Pooling ◽

Gesture Input ◽

Spatial Pyramid

Current deep convolutional neural network (DCNN)-based hand gesture detectors with high precision demand very high-performance computing power. Although DCNN-based detectors are capable of accurate classification, the sheer computing power needed for this form of classification makes them very difficult to run with lower computational power in remote environments. Moreover, classical DCNN architectures have a fixed number of input dimensions, which forces preprocessing and makes them impractical for real-world applications. In this research, a practical DCNN with an optimized architecture is proposed using DCNN filter/node pruning, and spatial pyramid pooling (SPP) is introduced to make the model input dimension-invariant. This compact SPP-DCNN module uses 65% fewer parameters than traditional classifiers and operates almost 3× faster than classical models. Moreover, the improved algorithm, which decodes gestures or sign language finger-spelling from videos, achieved the highest benchmark accuracy with the fastest processing speed. The proposed method paves the way for various practical and applied hand gesture input-based human-computer interaction (HCI) applications.
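To show why spatial pyramid pooling makes a model input dimension-invariant, the sketch below max-pools a single-channel 2D feature map into a fixed number of bins regardless of its height and width. It is a simplified illustration, not the network from the paper: the pyramid levels (1, 2, 4), the bin-boundary rule, and the helper names are assumptions, and a real network would apply this per channel on convolutional feature maps.

```python
def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2D feature map into n x n bins per level and concatenate.

    The output length is sum(n * n for n in levels), independent of the
    input height and width.
    """
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Floor for the bin start and ceiling for the bin end, so every
            # bin is non-empty and the bins together cover the whole map.
            r0 = (i * height) // n
            r1 = -(-((i + 1) * height) // n)
            for j in range(n):
                c0 = (j * width) // n
                c1 = -(-((j + 1) * width) // n)
                pooled.append(
                    max(feature_map[r][c] for r in range(r0, r1) for c in range(c0, c1))
                )
    return pooled


def make_map(height, width):
    """Hypothetical feature map whose values increase row by row."""
    return [[r * width + c for c in range(width)] for r in range(height)]


small = spatial_pyramid_pool(make_map(5, 7))
large = spatial_pyramid_pool(make_map(13, 9))
```

Both `small` and `large` have 1 + 4 + 16 = 21 entries even though the inputs differ in size, which is what lets a fixed-size fully connected layer follow the pooling stage.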

Manchu Word Recognition Based on Convolutional Neural Network with Spatial Pyramid Pooling

2018 11th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI) ◽

10.1109/cisp-bmei.2018.8633131 ◽

2018 ◽

Author(s):

Min Li ◽

Ruirui Zheng ◽

Shuang Xu ◽

Yu Fu ◽

Di Huang

Keyword(s):

Neural Network ◽

Word Recognition ◽

Convolutional Neural Network ◽

Spatial Pyramid Pooling ◽

Spatial Pyramid
