Bridge Crack Detection Based on SSENets

2020 ◽ Vol 10 (12) ◽ pp. 4230
Author(s): Haotian Li ◽ Hongyan Xu ◽ Xiaodong Tian ◽ Yi Wang ◽ Huaiyu Cai ◽ ...

Bridge crack detection is essential to prevent transportation accidents. However, the surrounding environment strongly interferes with crack detection, making it difficult to ensure detection accuracy. To detect bridge cracks accurately, we propose an end-to-end model named Skip-Squeeze-and-Excitation Networks (SSENets). It is mainly composed of a Skip-Squeeze-Excitation (SSE) module and an Atrous Spatial Pyramid Pooling (ASPP) module. The SSE module uses a skip-connection strategy to enhance the gradient correlation between shallow and deeper layers, alleviating the vanishing gradients caused by network deepening. The ASPP module extracts multi-scale contextual information from images, while depthwise separable convolution reduces computational complexity. To avoid destroying the topology of cracks, we use atrous convolution instead of pooling layers. The proposed SSENets achieved a detection accuracy of 97.77%, outperforming the models it was compared with. The SSE module, built on the skip-connection strategy, can be embedded in other convolutional neural networks (CNNs) to improve their performance.
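The trade-off the abstract describes, replacing pooling with atrous (dilated) convolution so the field of view grows without discarding spatial detail, can be sketched in NumPy. This is a generic illustration of atrous convolution, not the authors' implementation:

```python
import numpy as np

def effective_fov(kernel, rate):
    """Effective field of view of a kernel dilated at the given rate."""
    return (kernel - 1) * rate + 1

def atrous_conv1d(x, w, rate):
    """Valid-mode 1-D convolution with dilation `rate`: the kernel taps
    are spaced `rate` positions apart, enlarging the field of view
    without any pooling (so no spatial detail is discarded)."""
    k = len(w)
    span = effective_fov(k, rate)
    return np.array([
        sum(w[j] * x[i + j * rate] for j in range(k))
        for i in range(len(x) - span + 1)
    ])
```

A 3-tap kernel at rate 4 thus covers 9 input positions, matching the receptive-field growth that pooling would otherwise provide.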

Author(s): Y. Ding ◽ M. Wu ◽ Y. Xu ◽ S. Duan

Abstract. Automatic extraction of buildings from high-resolution remote sensing imagery is very useful in many applications, such as city management, mapping, urban planning, and geographic information updating. Although extensively studied in past years, high-precision building segmentation from high-resolution remote sensing images is still a challenging task, owing to the general texture of buildings and the complexity of image backgrounds. The repeated pooling and striding operations used in CNNs reduce feature resolution and cause the loss of detail information. To solve this problem, we propose a deep learning model with a spatial pyramid pooling module based on LinkNet. The proposed model, called P-LinkNet, takes advantage of a spatial pyramid pooling module to capture and aggregate multi-scale contextual information. We tested it on the Inria building dataset. Experimental results show that the proposed P-LinkNet is superior to LinkNet.
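The pyramid pooling idea the abstract attributes to P-LinkNet can be illustrated with a minimal NumPy sketch. The grid sizes and the nearest-neighbour upsampling below are our assumptions for illustration, not details taken from the paper:

```python
import numpy as np

def pyramid_pool(feat, grids=(1, 2, 4)):
    """Context aggregation via spatial pyramid pooling: average-pool the
    C x H x W map into g x g bins per level, upsample each pooled map back
    to H x W, and concatenate along channels with the input."""
    c, h, w = feat.shape
    outs = [feat]
    for g in grids:
        pooled = np.zeros((c, g, g))
        for i in range(g):
            for j in range(g):
                hs, he = i * h // g, (i + 1) * h // g
                ws, we = j * w // g, (j + 1) * w // g
                pooled[:, i, j] = feat[:, hs:he, ws:we].mean(axis=(1, 2))
        # nearest-neighbour upsample back to H x W
        up = pooled[:, (np.arange(h) * g) // h][:, :, (np.arange(w) * g) // w]
        outs.append(up)
    return np.concatenate(outs, axis=0)
```

Each pyramid level summarizes the scene at a coarser scale, so the concatenated output carries both local detail and global context.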


Information ◽ 2021 ◽ Vol 12 (7) ◽ pp. 278
Author(s): Sanlong Jiang ◽ Shaobo Li ◽ Qiang Bai ◽ Jing Yang ◽ Yanming Miao ◽ ...

A reasonable grasping strategy is a prerequisite for successfully grasping a target, and it is also a basic condition for the wide application of robots. Presently, mainstream grippers on the market are divided into two-finger and three-finger grippers. According to human grasping experience, the stability of three-finger grippers is much better than that of two-finger grippers. Therefore, this paper focuses on a three-finger grasping strategy generation method based on the DeepLab V3+ algorithm. DeepLab V3+ uses the atrous convolution kernel and the atrous spatial pyramid pooling (ASPP) architecture based on atrous convolution. The atrous convolution kernel can adjust the field of view of the filter layer by changing the convolution rate. In addition, ASPP can effectively capture multi-scale information through the parallel connection of atrous convolutional layers with multiple convolution rates, so that the model performs better on multi-scale objects. The article innovatively uses the DeepLab V3+ algorithm to generate the grasp strategy for a target and optimizes the atrous convolution parameter values of ASPP. This study used the Cornell Grasp dataset to train and verify the model. At the same time, a smaller and more complex dataset of 60 samples was produced according to the actual situation. Upon testing, good experimental results were obtained.
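The rate/field-of-view relationship that the paper tunes can be made concrete: a k × k kernel dilated at rate r covers (k − 1)·r + 1 pixels per axis, so parallel ASPP branches observe the scene at several scales at once. The rates below are the common DeepLab defaults, used here only as an example, not the values the paper optimizes:

```python
def aspp_fields_of_view(kernel=3, rates=(6, 12, 18)):
    """Per-branch field of view of parallel atrous convolutions:
    a k x k kernel dilated at rate r spans (k - 1) * r + 1 pixels
    per axis; the branches' outputs are concatenated in ASPP."""
    return {r: (kernel - 1) * r + 1 for r in rates}
```

Changing the rate set directly changes which object scales the parallel branches cover, which is why these values are worth optimizing per task.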


Author(s): Seung-Hwan Bae

Region-based object detection infers object regions for one or more categories in an image. Thanks to recent advances in deep learning and region proposal methods, object detectors based on convolutional neural networks (CNNs) have flourished and provided promising detection results. However, detection accuracy is often degraded because of the low discriminability of object CNN features caused by occlusions and inaccurate region proposals. In this paper, we therefore propose a region decomposition and assembly detector (R-DAD) for more accurate object detection. In the proposed R-DAD, we first decompose an object region into multiple small regions. To jointly capture the entire appearance and part details of the object, we extract CNN features within both the whole object region and the decomposed regions. We then learn the semantic relations between the object and its parts by combining the multi-region features stage by stage with region assembly blocks, and use the combined high-level semantic features for object classification and localization. In addition, for more accurate region proposals, we propose a multi-scale proposal layer that can generate object proposals at various scales. We integrate the R-DAD into several feature extractors and demonstrate distinct performance improvements on PASCAL07/12 and MSCOCO18 compared with recent convolutional detectors.
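The first step, decomposing an object region into part regions, might look like the following sketch. The half-region layout is a plausible guess for illustration; the paper's actual decomposition scheme may differ:

```python
def decompose_region(box):
    """Split an object box (x1, y1, x2, y2) into four half regions
    (left/right/top/bottom); part features from these regions would
    later be re-assembled with the whole-object features."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    return {
        "left":   (x1, y1, cx, y2),
        "right":  (cx, y1, x2, y2),
        "top":    (x1, y1, x2, cy),
        "bottom": (x1, cy, x2, y2),
    }
```

Pooling features from each half separately lets the detector stay discriminative when one half of the object is occluded.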


2018 ◽ Vol 10 (12) ◽ pp. 115
Author(s): Wanli Yang ◽ Yimin Chen ◽ Chen Huang ◽ Mingke Gao

In recent years, the application of deep neural networks to human behavior recognition has become a hot topic. Although remarkable achievements have been made in the field of image recognition, many problems remain to be solved for video. It is well known that convolutional neural networks require a fixed-size image input, which not only limits the network structure but also affects recognition accuracy. Although this problem has been solved for images, it has not yet been overcome for video. To address the problem of fixed-size video frame input in video recognition, we propose a three-dimensional (3D) densely connected convolutional network based on spatial pyramid pooling (3D-DenseNet-SPP). As the name implies, the network structure is mainly composed of three parts: 3DCNN, DenseNet, and SPPNet. Our models were evaluated on the KTH and UCF101 datasets separately. The experimental results show that our model performs better in video-based behavior recognition than the existing models.
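Why SPP removes the fixed-input-size constraint is easy to show: pooling into a fixed number of bins per pyramid level yields a vector whose length depends only on the channel count, never on the frame size. Below is a NumPy sketch of 2-D SPP (the 3-D version in the paper adds a temporal axis; the pyramid levels here are illustrative):

```python
import numpy as np

def spp_vector(feat, levels=(1, 2, 4)):
    """Max-pool a C x H x W map into g x g bins per pyramid level and
    flatten. Output length is C * sum(g*g over levels), independent of
    H and W, so variable-size frames yield a fixed-length vector."""
    c, h, w = feat.shape
    parts = []
    for g in levels:
        for i in range(g):
            for j in range(g):
                hs = i * h // g
                he = max((i + 1) * h // g, hs + 1)  # guarantee a non-empty bin
                ws = j * w // g
                we = max((j + 1) * w // g, ws + 1)
                parts.append(feat[:, hs:he, ws:we].max(axis=(1, 2)))
    return np.concatenate(parts)
```

With levels (1, 2, 4) and C channels, the vector always has C × (1 + 4 + 16) entries, so the fully connected layers after it never see a varying size.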


2020 ◽ Vol 2020 (10) ◽ pp. 27-1-27-7
Author(s): Congcong Wang ◽ Faouzi Alaya Cheikh ◽ Azeddine Beghdadi ◽ Ole Jakob Elle

Object sizes in images are diverse; therefore, capturing multi-scale context information is essential for semantic segmentation. Existing context aggregation methods, such as the pyramid pooling module (PPM) and atrous spatial pyramid pooling (ASPP), employ different pooling sizes or atrous rates so that multi-scale information is captured. However, the pooling sizes and atrous rates are chosen empirically. Rethinking ASPP leads to our observation that learnable sampling locations of the convolution operation can endow the network with a learnable field-of-view, and thus the ability to capture object context information adaptively. Following this observation, in this paper we propose an adaptive context encoding (ACE) module based on the deformable convolution operation, where the sampling locations of the convolution operation are learnable. Our ACE module can easily be embedded into other convolutional neural networks (CNNs) for context aggregation. The effectiveness of the proposed module is demonstrated on the Pascal-Context and ADE20K datasets. Although our proposed ACE consists of only three deformable convolution blocks, it outperforms PPM and ASPP in terms of mean Intersection over Union (mIoU) on both datasets. All the experimental studies confirm that our proposed module is effective compared with state-of-the-art methods.
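The mechanism ACE builds on, sampling at learned fractional offsets instead of a fixed grid, can be sketched with bilinear interpolation in NumPy. This is a generic deformable-convolution sampling step, not the authors' code:

```python
import numpy as np

def bilinear(feat, y, x):
    """Bilinearly sample a single-channel feat[H, W] at fractional (y, x)."""
    h, w = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    dy, dx = y - y0, x - x0
    y0, y1 = np.clip(y0, 0, h - 1), np.clip(y0 + 1, 0, h - 1)
    x0, x1 = np.clip(x0, 0, w - 1), np.clip(x0 + 1, 0, w - 1)
    return (feat[y0, x0] * (1 - dy) * (1 - dx) + feat[y0, x1] * (1 - dy) * dx
            + feat[y1, x0] * dy * (1 - dx) + feat[y1, x1] * dy * dx)

def deformable_sample(feat, center, offsets):
    """Core of deformable convolution: instead of the fixed 3x3 grid,
    sample at grid + learned fractional offsets (one (dy, dx) per tap),
    giving the kernel a learnable field of view."""
    cy, cx = center
    grid = [(gy, gx) for gy in (-1, 0, 1) for gx in (-1, 0, 1)]
    return np.array([bilinear(feat, cy + gy + oy, cx + gx + ox)
                     for (gy, gx), (oy, ox) in zip(grid, offsets)])
```

With all offsets at zero this reduces to an ordinary 3 × 3 sampling pattern; training the offsets lets each tap drift toward the context that matters.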


Entropy ◽ 2020 ◽ Vol 22 (9) ◽ pp. 1058
Author(s): Zhanghui Liu ◽ Yudong Zhang ◽ Yuzhong Chen ◽ Xinwen Fan ◽ Chen Dong

Domain generation algorithms (DGAs) use specific parameters as random seeds to generate a large number of random domain names to evade malicious domain name detection. This greatly increases the difficulty of detecting and defending against botnets and malware. Traditional models for detecting algorithmically generated domain names generally rely on manually extracted statistical characteristics from the domain names or network traffic and then employ classifiers to distinguish the algorithmically generated domain names. These models always require labor-intensive manual feature engineering. In contrast, most state-of-the-art models based on deep neural networks are sensitive to imbalance in the sample distribution and cannot fully exploit the discriminative class features in domain names or network traffic, leading to decreased detection accuracy. To address these issues, we employ the borderline synthetic minority over-sampling algorithm (borderline-SMOTE) to improve sample balance. We also propose a recurrent convolutional neural network with spatial pyramid pooling (RCNN-SPP) to extract discriminative and distinctive class features. The recurrent convolutional neural network combines a convolutional neural network (CNN) and a bi-directional long short-term memory network (Bi-LSTM) to extract both semantic and contextual information from domain names. We then employ the spatial pyramid pooling strategy to refine the contextual representation by capturing multi-scale contextual information from domain names. Experimental results on different domain name datasets demonstrate that our model achieves 92.36% accuracy, an 89.55% recall rate, a 90.46% F1-score, and a 95.39% AUC in identifying DGA and legitimate domain names, and 92.45% accuracy, a 90.12% recall rate, a 90.86% F1-score, and a 96.59% AUC in multi-classification problems. It achieves significant improvements over existing models in terms of accuracy and robustness.
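The oversampling step works by interpolating new minority samples between existing ones. The sketch below shows plain SMOTE interpolation; the borderline variant used in the paper additionally restricts seed samples to those near the class boundary, which is omitted here:

```python
import numpy as np

def smote_interpolate(minority, n_new, k=3, rng=None):
    """Synthesise n_new minority samples: pick a random minority sample,
    pick one of its k nearest minority neighbours, and place a new point
    at a random position on the segment between them."""
    rng = np.random.default_rng(rng)
    new = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        d = np.linalg.norm(minority - minority[i], axis=1)
        nbrs = np.argsort(d)[1:k + 1]   # skip index 0: the sample itself
        j = rng.choice(nbrs)
        lam = rng.random()              # position on the segment
        new.append(minority[i] + lam * (minority[j] - minority[i]))
    return np.array(new)
```

Because every synthetic point lies between two real minority samples, the new points stay inside the minority class's local region of feature space.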


2020 ◽ Vol 10 (9) ◽ pp. 3135
Author(s): Ling Luo ◽ Dingyu Xue ◽ Xinglong Feng

In recent years, benefiting from deep convolutional neural networks (DCNNs), face parsing has developed rapidly. However, it still has the following problems: (1) existing state-of-the-art frameworks usually do not satisfy real-time requirements while pursuing performance; (2) similar appearances cause incorrect pixel label assignments, especially at boundaries; (3) to promote multi-scale prediction, deep and shallow features are fused without considering the semantic gap between them. To overcome these drawbacks, we propose an effective and efficient hierarchical aggregation network called EHANet for fast and accurate face parsing. More specifically, we first propose a stage contextual attention mechanism (SCAM), which uses higher-level contextual information to re-encode channels according to their importance. Secondly, a semantic gap compensation block (SGCB) is presented to ensure the effective aggregation of hierarchical information. Thirdly, a weighted boundary-aware loss effectively compensates for the ambiguity of boundary semantics. Without any bells and whistles, combined with a lightweight backbone, we achieve outstanding results on both the CelebAMask-HQ (78.19% mIoU) and Helen (90.7% F1-score) datasets. Furthermore, our model achieves 55 FPS on a single GTX 1080Ti card with 640 × 640 input and reaches over 300 FPS at a resolution of 256 × 256, which is suitable for real-world applications.
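The channel re-encoding idea behind SCAM resembles squeeze-and-excitation gating. The NumPy sketch below is a minimal stand-in: the real module learns the gating with convolution/FC layers, which are replaced here by a plain sigmoid for illustration:

```python
import numpy as np

def channel_reweight(feat, context):
    """Squeeze higher-level context into per-channel importance weights
    (global average pool + sigmoid gate), then rescale the channels of
    the lower-level feature map accordingly."""
    squeezed = context.mean(axis=(1, 2))      # global average pool -> (C,)
    gate = 1.0 / (1.0 + np.exp(-squeezed))    # sigmoid importance in (0, 1)
    return feat * gate[:, None, None]
```

The gate suppresses channels the higher-level context marks as unimportant while leaving important ones nearly unchanged.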

