Employing GRU to combine feature maps in DeeplabV3 for a better segmentation model

Mahmood Haithami; Amr Ahmed; Iman Yi Liao; Hamid Jalab

doi:10.5617/nmi.9131

Nordic Machine Intelligence ◽

10.5617/nmi.9131 ◽

2021 ◽

Vol 1 (1) ◽

pp. 29-31

Author(s):

Mahmood Haithami ◽

Amr Ahmed ◽

Iman Yi Liao ◽

Hamid Jalab

Keyword(s):

Neural Network ◽

Recurrent Neural Network ◽

Feature Maps ◽

Feature Map ◽

Input Feature ◽

Spatial Pyramid Pooling ◽

Test Sets ◽

Public Datasets ◽

Spatial Pyramid

In this paper, we aim to enhance the segmentation capabilities of DeeplabV3 by employing a Gated Recurrent Unit (GRU). A 1-by-1 convolution in DeeplabV3 was replaced by a GRU after the Atrous Spatial Pyramid Pooling (ASPP) layer to combine the input feature maps. Both the convolution and the GRU have shareable parameters, but the latter has gates that enable or disable the contribution of each input feature map. The experiments on unseen test sets demonstrate that employing a GRU instead of the convolution produces better segmentation results. The datasets used are public datasets provided by the MedAI competition.
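As a rough illustration of the idea in this abstract, the sketch below treats the feature vectors that the pooling branches produce at a single pixel as a short sequence and folds them together with a GRU cell, so that the final hidden state stands in for the output of the 1-by-1 convolution. It uses one common GRU formulation with biases omitted; the weights, dimensions, branch count, and function names are all hypothetical, and the paper's actual architecture may differ.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def gru_step(x, h, p):
    """One GRU update (biases omitted) for input vector x and hidden state h."""
    # Update gate: how much of the new candidate replaces the old state.
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]
    # Reset gate: how much of the old state feeds the candidate.
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    # Candidate state.
    n = [math.tanh(a + b) for a, b in zip(matvec(p["Wn"], x), matvec(p["Un"], rh))]
    return [(1 - zi) * hi + zi * ni for zi, hi, ni in zip(z, h, n)]

def init_params(in_dim, hid_dim, seed=0):
    # Random placeholder weights; a trained model would learn these.
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    params = {}
    for gate in ("z", "r", "n"):
        params["W" + gate] = mat(hid_dim, in_dim)
        params["U" + gate] = mat(hid_dim, hid_dim)
    return params

def fuse_branches(branch_vectors, params, hid_dim):
    """Feed the per-branch feature vectors of one pixel through the GRU in order."""
    h = [0.0] * hid_dim
    for x in branch_vectors:
        h = gru_step(x, h, params)
    return h

# Four hypothetical branch outputs (3 channels each) at one pixel.
branches = [[0.2, -0.1, 0.4], [0.0, 0.3, -0.2], [0.5, 0.1, 0.1], [-0.3, 0.2, 0.0]]
params = init_params(in_dim=3, hid_dim=2)
fused = fuse_branches(branches, params, hid_dim=2)
```

Because the same `params` are reused at every step and every pixel, the parameter sharing resembles that of a convolution, while the gates `z` and `r` let each branch contribute more or less to the fused vector.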


Automatic Lung Segmentation on Chest X-rays Using Self-Attention Deep Neural Network

Sensors ◽

10.3390/s21020369 ◽

2021 ◽

Vol 21 (2) ◽

pp. 369

Author(s):

Minki Kim ◽

Byoung-Dai Lee

Keyword(s):

Surgical Planning ◽

Medical Image Segmentation ◽

X Rays ◽

The Novel ◽

Feature Maps ◽

Accurate Identification ◽

Feature Map ◽

Diagnosis And Prognosis ◽

Input Feature ◽

Public Datasets

Accurate identification of the boundaries of organs or abnormal objects (e.g., tumors) in medical images is important in surgical planning and in the diagnosis and prognosis of diseases. In this study, we propose a deep learning-based method to segment lung areas in chest X-rays. The novel aspect of the proposed method is the self-attention module, where the outputs of the channel and spatial attention modules are combined to generate attention maps, with each highlighting those regions of feature maps that correspond to “what” and “where” to attend in the learning process, respectively. Thereafter, the attention maps are multiplied element-wise with the input feature map, and the intermediate results are added to the input feature map again for residual learning. Using X-ray images collected from public datasets for training and evaluation, we applied the proposed attention modules to U-Net for segmentation of lung areas and conducted experiments while changing the locations of the attention modules in the baseline network. The experimental results showed that our method achieved comparable or better performance than the existing medical image segmentation networks in terms of Dice score when the proposed attention modules were placed in lower layers of both the contracting and expanding paths of U-Net.
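The element-wise multiplication and residual addition described in this abstract can be sketched as follows. The way the channel and spatial attention values are combined here (a simple product) is an assumption made for illustration, as are the function name and the nested-list tensor layout; the abstract does not specify these details.

```python
def apply_attention_residual(x, channel_att, spatial_att):
    """x: C x H x W nested lists; channel_att: length C; spatial_att: H x W."""
    out = []
    for c, plane in enumerate(x):
        new_plane = []
        for i, row in enumerate(plane):
            new_row = []
            for j, v in enumerate(row):
                # Assumed combination of the "what" and "where" attention values.
                a = channel_att[c] * spatial_att[i][j]
                # Element-wise product with the input, then the residual addition.
                new_row.append(v + v * a)
            new_plane.append(new_row)
        out.append(new_plane)
    return out

x = [[[1.0, 2.0], [3.0, 4.0]]]  # one channel, 2 x 2
y = apply_attention_residual(x, [0.5], [[1.0, 0.0], [0.0, 1.0]])
```

Where the attention value is zero, the input passes through unchanged, which is the point of the residual connection.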


Vehicle Pedestrian Detection Method Based on Spatial Pyramid Pooling and Attention Mechanism

Information ◽

10.3390/info11120583 ◽

2020 ◽

Vol 11 (12) ◽

pp. 583

Author(s):

Mingtao Guo ◽

Donghui Xue ◽

Peng Li ◽

He Xu

Keyword(s):

Object Detection ◽

Clustering Algorithm ◽

Pedestrian Detection ◽

Attention Mechanism ◽

Complex Environments ◽

Global Features ◽

Feature Map ◽

Spatial Pyramid Pooling ◽

Institute Of Technology ◽

Spatial Pyramid

Object detection for vehicles and pedestrians is extremely difficult to achieve in autopilot applications for the Internet of vehicles, and it is a task that requires the ability to locate and identify smaller targets even in complex environments. This paper proposes a single-stage object detection network (YOLOv3-promote) for the detection of vehicles and pedestrians in complex environments in cities, which improves on the traditional You Only Look Once version 3 (YOLOv3). First, spatial pyramid pooling is used to fuse local and global features in an image to better enrich the expression ability of the feature map and to more effectively detect targets with large size differences in the image; second, an attention mechanism is added to the feature map to weight each channel, thereby enhancing key features and removing redundant features, which allows for strengthening the ability of the feature network to discriminate between target objects and backgrounds; lastly, the anchor box derived from the K-means clustering algorithm is fitted to the final prediction box to complete the positioning and identification of target vehicles and pedestrians. The experimental results show that the proposed method achieved 91.4 mAP (mean average precision), 83.2 F1 score, and 43.7 frames per second (FPS) on the KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) dataset, and the detection performance was superior to the conventional YOLOv3 algorithm in terms of both accuracy and speed.
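A minimal sketch of deriving anchor boxes with K-means is shown below. It clusters (width, height) pairs using IoU as the similarity measure, which is the convention commonly used for YOLO-style anchors; whether this paper used exactly this distance, and the naive initialisation chosen here, are assumptions.

```python
def iou_wh(a, b):
    """IoU of two boxes given as (width, height), aligned at the same corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def kmeans_anchors(boxes, k, iters=50):
    centers = [boxes[i] for i in range(k)]  # naive deterministic initialisation
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            # Assign each box to the center it overlaps most.
            best = max(range(k), key=lambda c: iou_wh(b, centers[c]))
            clusters[best].append(b)
        new_centers = []
        for c, members in enumerate(clusters):
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)

# Hypothetical ground-truth box sizes in pixels.
boxes = [(10, 20), (100, 50), (12, 22), (98, 54), (11, 18), (102, 46)]
anchors = kmeans_anchors(boxes, k=2)
```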


Detection of Algorithmically Generated Domain Names Using the Recurrent Convolutional Neural Network with Spatial Pyramid Pooling

Entropy ◽

10.3390/e22091058 ◽

2020 ◽

Vol 22 (9) ◽

pp. 1058

Author(s):

Zhanghui Liu ◽

Yudong Zhang ◽

Yuzhong Chen ◽

Xinwen Fan ◽

Chen Dong

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Network Traffic ◽

Contextual Information ◽

Recall Rate ◽

Domain Name ◽

Sample Distribution ◽

Domain Names ◽

Spatial Pyramid Pooling ◽

Spatial Pyramid

Domain generation algorithms (DGAs) use specific parameters as random seeds to generate a large number of random domain names to prevent malicious domain name detection. This greatly increases the difficulty of detecting and defending against botnets and malware. Traditional models for detecting algorithmically generated domain names generally rely on manually extracting statistical characteristics from the domain names or network traffic and then employing classifiers to distinguish the algorithmically generated domain names. These models always require labor-intensive manual feature engineering. In contrast, most state-of-the-art models based on deep neural networks are sensitive to imbalance in the sample distribution and cannot fully exploit the discriminative class features in domain names or network traffic, leading to decreased detection accuracy. To address these issues, we employ the borderline synthetic minority over-sampling algorithm (SMOTE) to improve sample balance. We also propose a recurrent convolutional neural network with spatial pyramid pooling (RCNN-SPP) to extract discriminative and distinctive class features. The recurrent convolutional neural network combines a convolutional neural network (CNN) and a bi-directional long short-term memory network (Bi-LSTM) to extract both the semantic and contextual information from domain names. We then employ the spatial pyramid pooling strategy to refine the contextual representation by capturing multi-scale contextual information from domain names. The experimental results from different domain name datasets demonstrate that our model can achieve 92.36% accuracy, an 89.55% recall rate, a 90.46% F1-score, and 95.39% AUC in identifying DGA and legitimate domain names, and it can achieve a 92.45% accuracy rate, a 90.12% recall rate, a 90.86% F1-score, and 96.59% AUC in multi-classification problems. It achieves significant improvement over existing models in terms of accuracy and robustness.
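The sample-balancing step mentioned in this abstract can be sketched roughly as below. The borderline test follows the usual Borderline-SMOTE rule (a minority sample is "in danger" when at least half, but not all, of its m nearest neighbours belong to the majority class), and new points are interpolated between a minority sample and a minority neighbour. The function names, the toy 2-D data, and the choice of m are hypothetical; the paper's feature space and settings are not given here.

```python
import random

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def is_borderline(p, minority, majority, m):
    """True if at least half, but not all, of p's m nearest neighbours are majority."""
    labelled = [(sq_dist(p, q), 0) for q in minority if q is not p]
    labelled += [(sq_dist(p, q), 1) for q in majority]
    nearest = sorted(labelled)[:m]
    n_major = sum(label for _, label in nearest)
    return m / 2 <= n_major < m

def smote_sample(x, neighbor, rng):
    """Synthetic point on the segment between x and a minority-class neighbour."""
    lam = rng.random()  # interpolation factor in [0, 1)
    return [xi + lam * (ni - xi) for xi, ni in zip(x, neighbor)]

minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
majority = [[3.0, 2.0], [2.0, 3.0], [5.0, 5.0]]

danger = [p for p in minority if is_borderline(p, minority, majority, m=3)]
synthetic = smote_sample(minority[3], minority[1], random.Random(0))
```

Only the sample near the class boundary is flagged, so oversampling concentrates where the classifier is most likely to err.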


Identification of Weakly Pitch-Shifted Voice Based on Convolutional Neural Network

International Journal of Digital Multimedia Broadcasting ◽

10.1155/2020/8927031 ◽

2020 ◽

Vol 2020 ◽

pp. 1-10

Author(s):

Yongchao Ye ◽

Lingjie Lao ◽

Diqun Yan ◽

Rangding Wang

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Network Topology ◽

Activation Function ◽

Detection Methods ◽

Detection Rates ◽

Feature Map ◽

Dynamic Coefficients ◽

Input Feature ◽

High Detection

Pitch shifting is a common voice editing technique in which the original pitch of a digital voice is raised or lowered. It is likely to be abused by malicious attackers to conceal their true identity. Existing forensic detection methods are no longer effective for weakly pitch-shifted voice. In this paper, we propose a convolutional neural network (CNN) to detect not only strongly pitch-shifted voice but also weakly pitch-shifted voice, for which the shifting factor is less than ±4 semitones. Specifically, linear frequency cepstral coefficients (LFCC) computed from power spectra are considered, and their dynamic coefficients are extracted as the discriminative features. The CNN model is carefully designed with particular attention to the input feature map, the activation function, and the network topology. We evaluated the algorithm on voices from two datasets with three pitch-shifting software tools. Extensive results show that the algorithm achieves high detection rates for both binary and multiple classifications.
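Two quantities in this abstract lend themselves to a short worked sketch: the frequency ratio that corresponds to a shift in semitones (in equal temperament, a factor of 2 per 12 semitones), and dynamic (delta) coefficients computed from a sequence of cepstral frames. The delta formula below is the common regression-based one with edge frames repeated; the window size and function names are assumptions, and the paper's exact feature extraction may differ.

```python
def semitone_ratio(n):
    """Frequency ratio for a pitch shift of n semitones (equal temperament)."""
    return 2.0 ** (n / 12.0)

def delta(frames, N=2):
    """Regression-based delta of a list of coefficient vectors (edges repeated)."""
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        vec = []
        for d in range(len(frames[0])):
            num = sum(
                n * (frames[min(T - 1, t + n)][d] - frames[max(0, t - n)][d])
                for n in range(1, N + 1)
            )
            vec.append(num / denom)
        out.append(vec)
    return out

# A coefficient that rises by 1 per frame has a delta of 1 away from the edges.
ramp = [[0.0], [1.0], [2.0], [3.0], [4.0]]
ramp_delta = delta(ramp)
weak_shift = semitone_ratio(4)  # upper bound of the "weak" range in the abstract
```

A shift of 4 semitones changes frequencies by only about 26%, which gives a sense of why weak shifts are harder to detect than strong ones.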


An optimized convolutional neural network with bottleneck and spatial pyramid pooling layers for classification of foods

Pattern Recognition Letters ◽

10.1016/j.patrec.2017.12.007 ◽

2018 ◽

Vol 105 ◽

pp. 50-58 ◽

Cited By ~ 9

Author(s):

Elnaz Jahani Heravi ◽

Hamed Habibi Aghdam ◽

Domenec Puig

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Spatial Pyramid Pooling ◽

Spatial Pyramid


A Novel Image Classification Approach via Dense-MobileNet Models

Mobile Information Systems ◽

10.1155/2020/7602384 ◽

2020 ◽

Vol 2020 ◽

pp. 1-8 ◽

Cited By ~ 10

Author(s):

Wei Wang ◽

Yutao Li ◽

Ting Zou ◽

Xin Wang ◽

Jieyu You ◽

...

Keyword(s):

Neural Network ◽

Classification Accuracy ◽

Deep Neural Network ◽

Recognition Accuracy ◽

Feature Maps ◽

Computation Cost ◽

Classification Approach ◽

Designed Experiments ◽

Input Feature ◽

Small Growth

As a lightweight deep neural network, MobileNet has fewer parameters and higher classification accuracy. To further reduce the number of network parameters and improve the classification accuracy, the dense blocks proposed in DenseNets are introduced into MobileNet. In Dense-MobileNet models, convolution layers with the same input feature map size in MobileNet models are treated as dense blocks, and dense connections are made within the dense blocks. The new network structure can make full use of the output feature maps generated by the previous convolution layers in dense blocks, generating a large number of feature maps with fewer convolution kernels and reusing the features repeatedly. By setting a small growth rate, the network further reduces the parameters and the computation cost. Two Dense-MobileNet models, Dense1-MobileNet and Dense2-MobileNet, are designed. Experiments show that Dense2-MobileNet can achieve higher recognition accuracy than MobileNet with fewer parameters and a lower computation cost.
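The role of the growth rate can be made concrete with a little arithmetic. In a DenseNet-style block, each layer receives the concatenation of the block input and all earlier layer outputs, and each layer adds `growth_rate` new feature maps. The sketch below only counts channels; the numbers are illustrative and are not taken from the Dense-MobileNet paper.

```python
def dense_block_channels(k0, growth_rate, num_layers):
    """Input channels seen by each layer, plus the block's output channel count."""
    inputs = [k0 + growth_rate * i for i in range(num_layers)]
    return inputs, k0 + growth_rate * num_layers

# Hypothetical block: 32 input channels, growth rate 8, 4 layers.
per_layer_inputs, block_output = dense_block_channels(32, 8, 4)
```

With a small growth rate each layer produces only a few new maps, yet later layers still see many channels because earlier outputs are reused through concatenation.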


Compact Spatial Pyramid Pooling Deep Convolutional Neural Network Based Hand Gestures Decoder

Applied Sciences ◽

10.3390/app10217898 ◽

2020 ◽

Vol 10 (21) ◽

pp. 7898

Author(s):

Akm Ashiquzzaman ◽

Hyunmin Lee ◽

Kwangki Kim ◽

Hye-Young Kim ◽

Jaehyung Park ◽

...

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

High Performance ◽

Fixed Number ◽

Hand Gesture ◽

Computing Power ◽

Classical Models ◽

Spatial Pyramid Pooling ◽

Gesture Input ◽

Spatial Pyramid

Current deep learning convolutional neural network (DCNN)-based hand gesture detectors with acute precision demand incredibly high-performance computing power. Although DCNN-based detectors are capable of accurate classification, the sheer computing power needed for this form of classification makes it very difficult to run with lower computational power in remote environments. Moreover, classical DCNN architectures have a fixed number of input dimensions, which forces preprocessing and makes them impractical for real-world applications. In this research, a practical DCNN with an optimized architecture is proposed with DCNN filter/node pruning, and spatial pyramid pooling (SPP) is introduced in order to make the model input dimension-invariant. This compact SPP-DCNN module uses 65% fewer parameters than traditional classifiers and operates almost 3× faster than classical models. Moreover, the proposed algorithm, which decodes gestures or sign language finger-spelling from videos, gave a benchmark highest accuracy with the fastest processing speed. This proposed method paves the way for various practical and applied hand gesture input-based human-computer interaction (HCI) applications.
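To show why spatial pyramid pooling makes a model independent of the input size, the sketch below max-pools a single-channel feature map into a fixed grid of bins at each pyramid level and concatenates the results. The pyramid levels and bin-boundary rule are assumptions chosen for illustration, not the configuration used in the paper.

```python
def spp(feature_map, levels=(1, 2)):
    """Max-pool an H x W map into n x n bins for each level and concatenate."""
    H, W = len(feature_map), len(feature_map[0])
    out = []
    for n in levels:
        for i in range(n):
            # Bin rows: floor for the start, ceiling for the end, so no bin is empty.
            r0, r1 = (i * H) // n, ((i + 1) * H + n - 1) // n
            for j in range(n):
                c0, c1 = (j * W) // n, ((j + 1) * W + n - 1) // n
                out.append(max(feature_map[r][c]
                               for r in range(r0, r1)
                               for c in range(c0, c1)))
    return out

small = [[1, 2, 3],
         [4, 5, 6]]                                         # 2 x 3 map
large = [[r * 4 + c for c in range(4)] for r in range(4)]   # 4 x 4 map

small_vec = spp(small)
large_vec = spp(large)
```

Both maps yield a vector of length 1 + 4 = 5, so a fully connected layer placed after the pooling can accept inputs of different spatial sizes.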


Manchu Word Recognition Based on Convolutional Neural Network with Spatial Pyramid Pooling

2018 11th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI) ◽

10.1109/cisp-bmei.2018.8633131 ◽

2018 ◽

Author(s):

Min Li ◽

Ruirui Zheng ◽

Shuang Xu ◽

Yu Fu ◽

Di Huang

Keyword(s):

Neural Network ◽

Word Recognition ◽

Convolutional Neural Network ◽

Spatial Pyramid Pooling ◽

Spatial Pyramid


Model and Learning Algorithm of a Small-Object Detection System for Compact Unmanned Aerial Vehicles

RADIOELECTRONIC AND COMPUTER SYSTEMS ◽

10.32620/reks.2018.4.04 ◽

2018 ◽

pp. 41-52

Author(s):

В’ячеслав Васильович Москаленко ◽

Альона Сергіївна Москаленко ◽

Артем Геннадійович Коробов ◽

Микола Олександрович Зарецький ◽

Віктор Анатолійович Семашко

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Learning Algorithm ◽

Detection System ◽

Deep Convolutional Neural Network ◽

Fine Tuning ◽

Small Object ◽

Feature Maps ◽

Feature Map ◽

High Level

An efficient model and learning algorithm are developed for a small-object detection system for compact aerial vehicles under conditions of restricted computing resources and a limited volume of labeled training data. A four-stage learning algorithm for the object detector is proposed. In the first stage, the type of deep convolutional neural network and the number of low-level layers pretrained on the ImageNet dataset to be reused are selected. The second stage involves unsupervised learning of high-level convolutional sparse coding layers using a modification of growing neural gas to automatically determine the required number of neurons and provide an optimal distribution of the neurons over the data. This makes it possible to use unlabeled datasets to adapt the high-level feature description to the application domain. In the third stage, the output feature map is formed by concatenating feature maps from different levels of the deep convolutional neural network. The output feature map is then reduced using principal component analysis, after which decision rules are built. To classify the output feature map, an information-extreme classifier trained on the principles of boosting is proposed. In addition, an orthogonal incremental extreme learning machine is used to build the regression model that predicts the bounding box of the detected small object. The last stage involves fine-tuning the high-level layers of the deep network using the simulated annealing metaheuristic algorithm in order to approximate the global optimum of the complex criterion of learning efficiency of the detection model. With the proposed approach, 96% of objects in the images of the open test dataset were correctly detected, which indicates the suitability of the model and learning algorithm for practical use. The learning dataset used to construct the model consisted of 500 unlabeled and 200 labeled samples.


Multi-scale Hierarchical Residual Network for Dense Captioning

Journal of Artificial Intelligence Research ◽

10.1613/jair.1.11338 ◽

2019 ◽

Vol 64 ◽

pp. 181-196 ◽

Cited By ~ 4

Author(s):

Yan Tian ◽

Xun Wang ◽

Jiachen Wu ◽

Ruili Wang ◽

Bailin Yang

Keyword(s):

Neural Network ◽

Recurrent Neural Network ◽

Feature Space ◽

Image Feature ◽

Feature Maps ◽

Residual Network ◽

Multi Scale ◽

Residual Learning ◽

Novel Approach ◽

Great Progress

Recent research on dense captioning based on recurrent and convolutional neural networks has made great progress. However, mapping from an image feature space to a description space is a nonlinear and multimodal task, which makes it difficult for current methods to get accurate results. In this paper, we put forward a novel approach for dense captioning based on hourglass-structured residual learning. Discriminant feature maps are obtained by incorporating densely connected networks and residual learning in our model. Finally, the performance of the approach on the Visual Genome V1.0 dataset and the region-labelled MS-COCO (Microsoft Common Objects in Context) dataset is demonstrated. The experimental results show that our approach outperforms most current methods.
