DEANet: Dual Encoder with Attention Network for Semantic Segmentation of Remote Sensing Imagery

Haoran Wei; Xiangyang Xu; Ni Ou; Xinru Zhang; Yaping Dai

doi:10.3390/rs13193900

Remote Sensing ◽

10.3390/rs13193900 ◽

2021 ◽

Vol 13 (19) ◽

pp. 3900

Author(s):

Haoran Wei ◽

Xiangyang Xu ◽

Ni Ou ◽

Xinru Zhang ◽

Yaping Dai

Keyword(s):

Remote Sensing ◽

Deep Learning ◽

Semantic Segmentation ◽

Feature Maps ◽

Attention Network ◽

Remote Sensing Imagery ◽

Training Strategy ◽

Sensing Technology ◽

Deep Learning Network ◽

Public Datasets

Remote sensing is now widely used in various fields, and research on automatic land-cover segmentation methods for remote sensing imagery is significant to the development of remote sensing technology. Deep learning methods, which are developing rapidly in the field of semantic segmentation, have been widely applied to remote sensing imagery segmentation. In this work, a novel deep learning network, the Dual Encoder with Attention Network (DEANet), is proposed. To improve the encoding ability of the network, a dual-branch encoder structure is proposed, whose first branch generates a rough guidance feature map that serves as area attention to help re-encode feature maps in the next branch. An improved pyramid partial decoder (PPD), based on the parallel partial decoder, is put forward to make fuller use of the features from the encoder along with the receptive field block (RFB). In addition, an edge attention module using the transfer learning method is introduced to explicitly advance the segmentation performance in edge areas. Beyond the structure, a loss function composed of the weighted Cross Entropy (CE) loss and the weighted Union subtract Intersection (UsI) loss is designed for training, where the UsI loss is a new region-aware loss that replaces the IoU loss to adapt to multi-classification tasks. Furthermore, a detailed training strategy for the network is introduced as well. Extensive experiments on three public datasets verify the effectiveness of each proposed module in our framework and demonstrate that our method achieves better performance than some state-of-the-art methods.
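
The abstract names a loss that combines a weighted CE term with a weighted UsI term but does not give the formula. The following is only a minimal sketch of how such a combination could look, under loud assumptions: the UsI term is taken here to be the soft union minus the soft intersection per class, normalized by the pixel count, and the names `combined_loss`, `ce_weight`, and `usi_weight` are hypothetical. It should not be read as the authors' definition.

```python
import math


def combined_loss(probs, labels, class_weights, ce_weight=1.0, usi_weight=1.0):
    """Illustrative weighted CE + 'union minus intersection' loss.

    probs:  list of per-pixel class-probability lists (each sums to 1)
    labels: list of integer class ids, one per pixel
    class_weights: one weight per class (assumed to be given)
    """
    num_classes = len(class_weights)
    n = len(labels)
    eps = 1e-7  # avoids log(0)

    # Weighted cross entropy, averaged over pixels.
    ce = 0.0
    for p, y in zip(probs, labels):
        ce -= class_weights[y] * math.log(p[y] + eps)
    ce /= n

    # Assumed region term: soft union minus soft intersection per class.
    usi = 0.0
    for c in range(num_classes):
        inter = sum(p[c] for p, y in zip(probs, labels) if y == c)
        pred_area = sum(p[c] for p in probs)
        true_area = sum(1 for y in labels if y == c)
        union = pred_area + true_area - inter
        usi += class_weights[c] * (union - inter) / n
    usi /= num_classes

    return ce_weight * ce + usi_weight * usi


perfect = combined_loss([[1.0, 0.0], [0.0, 1.0]], [0, 1], [1.0, 1.0])
wrong = combined_loss([[0.0, 1.0], [1.0, 0.0]], [0, 1], [1.0, 1.0])
```

Under this assumed form the region term is zero when the prediction matches the labels exactly and grows as the predicted and true regions diverge, which is the behaviour one would expect from a region-based loss.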

Attentively Learning Edge Distributions for Semantic Segmentation of Remote Sensing Imagery

Remote Sensing ◽

10.3390/rs14010102 ◽

2021 ◽

Vol 14 (1) ◽

pp. 102

Author(s):

Xin Li ◽

Tao Li ◽

Ziqi Chen ◽

Kaiwen Zhang ◽

Runliang Xia

Keyword(s):

Remote Sensing ◽

Contextual Information ◽

Semantic Segmentation ◽

Matrix Analysis ◽

Natural Image ◽

Feature Maps ◽

Remote Sensing Imagery ◽

Edge Distribution ◽

Non Local ◽

Ablation Study

Semantic segmentation has been a fundamental task in interpreting remote sensing imagery (RSI) for various downstream applications. Due to the high intra-class variance and inter-class similarity, inflexibly transferring natural image-specific networks to RSI is inadvisable. To enhance the distinguishability of learnt representations, attention modules were developed and applied to RSI, resulting in satisfactory improvements. However, these designs capture contextual information by handling all pixels equally, regardless of whether they are around edges. Therefore, blurry boundaries are generated, raising high uncertainties in classifying the many adjacent pixels. Hereby, we propose an edge distribution attention module (EDA) to highlight the edge distributions of learnt feature maps in a self-attentive fashion. In this module, we first formulate and model column-wise and row-wise edge attention maps based on covariance matrix analysis. Furthermore, a hybrid attention module (HAM) that emphasizes the edge distributions and position-wise dependencies is devised by combining EDA with a non-local block. Consequently, a conceptually end-to-end neural network, termed EDENet, is proposed to integrate HAM hierarchically for the detailed strengthening of multi-level representations. EDENet implicitly learns representative and discriminative features, providing available and reasonable cues for dense prediction. The experimental results on the ISPRS Vaihingen, Potsdam and DeepGlobe datasets demonstrate its efficacy and superiority over state-of-the-art methods in overall accuracy (OA) and mean intersection over union (mIoU). In addition, the ablation study further validates the effects of EDA.
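
For readers unfamiliar with the two metrics named above, the sketch below shows how OA and mIoU are commonly computed from a confusion matrix. The function names are illustrative, and the choice to skip classes that appear in neither the ground truth nor the prediction is an assumption; individual benchmarks may handle such classes differently.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows index the true class, columns the predicted class."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def overall_accuracy(cm):
    """Fraction of pixels whose predicted class equals the true class."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def mean_iou(cm):
    """Average of per-class TP / (TP + FP + FN)."""
    ious = []
    for c in range(len(cm)):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(row[c] for row in cm) - tp
        denom = tp + fp + fn
        if denom > 0:  # assumption: ignore classes absent from truth and prediction
            ious.append(tp / denom)
    return sum(ious) / len(ious)


cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
oa = overall_accuracy(cm)
miou = mean_iou(cm)
```

In this toy case three of four pixels are correct, so OA is 0.75, while the per-class IoUs are 1/2 and 2/3, which shows how mIoU penalizes errors on a class more strongly than OA does.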

HQ-ISNet: High-Quality Instance Segmentation for Remote Sensing Imagery

Remote Sensing ◽

10.3390/rs12060989 ◽

2020 ◽

Vol 12 (6) ◽

pp. 989 ◽

Cited By ~ 1

Author(s):

Hao Su ◽

Shunjun Wei ◽

Shan Liu ◽

Jiadian Liang ◽

Chen Wang ◽

...

Keyword(s):

Remote Sensing ◽

High Resolution ◽

Object Detection ◽

Prediction Accuracy ◽

Semantic Segmentation ◽

Remote Sensing Images ◽

Feature Maps ◽

High Quality ◽

Remote Sensing Imagery ◽

Instance Segmentation

Instance segmentation in high-resolution (HR) remote sensing imagery is one of the most challenging tasks and is more difficult than object detection and semantic segmentation. It aims to predict class labels and pixel-wise instance masks to locate instances in an image. However, few methods are currently suitable for instance segmentation in HR remote sensing images. Meanwhile, the complex background of remote sensing images makes instance segmentation more difficult to implement. In this article, a novel instance segmentation approach for HR remote sensing imagery based on Cascade Mask R-CNN is proposed, which is called the high-quality instance segmentation network (HQ-ISNet). In this scheme, the HQ-ISNet exploits an HR feature pyramid network (HRFPN) to fully utilize multi-level feature maps and maintain HR feature maps for instance segmentation of remote sensing images. Next, to refine mask information flow between mask branches, the instance segmentation network version 2 (ISNetV2) is proposed to promote further improvements in mask prediction accuracy. Then, we construct a new, more challenging dataset based on the synthetic aperture radar (SAR) ship detection dataset (SSDD) and the Northwestern Polytechnical University very-high-resolution 10-class geospatial object detection dataset (NWPU VHR-10) for instance segmentation of remote sensing images, which can be used as a benchmark for evaluating instance segmentation algorithms on high-resolution remote sensing images. Finally, extensive experimental analyses and comparisons on the SSDD and the NWPU VHR-10 dataset show that (1) the HRFPN makes the predicted instance masks more accurate, which can effectively enhance the instance segmentation performance on high-resolution remote sensing imagery; (2) the ISNetV2 is effective and promotes further improvements in mask prediction accuracy; (3) our proposed framework HQ-ISNet is effective and more accurate for instance segmentation in remote sensing imagery than the existing algorithms.

Farmland Recognition of High Resolution Multispectral Remote Sensing Imagery using Deep Learning Semantic Segmentation Method

Proceedings of the 2019 the International Conference on Pattern Recognition and Artificial Intelligence - PRAI '19 ◽

10.1145/3357777.3357788 ◽

2019 ◽

Cited By ~ 1

Author(s):

Zheng Shuangpeng ◽

Fang Tao ◽

Huo Hong

Keyword(s):

Remote Sensing ◽

Deep Learning ◽

High Resolution ◽

Semantic Segmentation ◽

Segmentation Method ◽

Remote Sensing Imagery ◽

Multispectral Remote Sensing

Algorithms for semantic segmentation of multispectral remote sensing imagery using deep learning

ISPRS Journal of Photogrammetry and Remote Sensing ◽

10.1016/j.isprsjprs.2018.04.014 ◽

2018 ◽

Vol 145 ◽

pp. 60-77 ◽

Cited By ~ 94

Author(s):

Ronald Kemker ◽

Carl Salvaggio ◽

Christopher Kanan

Keyword(s):

Remote Sensing ◽

Deep Learning ◽

Semantic Segmentation ◽

Remote Sensing Imagery ◽

Multispectral Remote Sensing

Multi-Stage Fusion and Multi-Source Attention Network for Multi-Modal Remote Sensing Image Segmentation

ACM Transactions on Intelligent Systems and Technology ◽

10.1145/3484440 ◽

2021 ◽

Vol 12 (6) ◽

pp. 1-20

Author(s):

Jiaqi Zhao ◽

Yong Zhou ◽

Boyu Shi ◽

Jingsong Yang ◽

Di Zhang ◽

...

Keyword(s):

Remote Sensing ◽

Rapid Development ◽

Remote Sensing Data ◽

Semantic Segmentation ◽

Sensor Technology ◽

Feature Maps ◽

Attention Network ◽

Modal Data ◽

Sensing Data ◽

Multi Stage

With the rapid development of sensor technology, large amounts of remote sensing data have been collected. Extracting feature maps from multi-modal remote sensing images can yield good semantic segmentation performance, since the extra modal data provide more information. However, making full use of multi-modal remote sensing data for semantic segmentation is challenging. Toward this end, we propose a new network called Multi-Stage Fusion and Multi-Source Attention Network ((MS)²-Net) for multi-modal remote sensing data segmentation. The multi-stage fusion module fuses complementary information after calibrating the deviation information by filtering the noise from the multi-modal data. In addition, similar feature points are aggregated by the proposed multi-source attention to enhance the discriminability of features from different modalities. The proposed model is evaluated on publicly available multi-modal remote sensing data sets, and the results demonstrate the effectiveness of the proposed method.

RADet: Refine Feature Pyramid Network and Multi-Layer Attention Network for Arbitrary-Oriented Object Detection of Remote Sensing Images

Remote Sensing ◽

10.3390/rs12030389 ◽

2020 ◽

Vol 12 (3) ◽

pp. 389 ◽

Cited By ~ 6

Author(s):

Yangyang Li ◽

Qin Huang ◽

Xuan Pei ◽

Licheng Jiao ◽

Ronghua Shang

Keyword(s):

Remote Sensing ◽

Object Detection ◽

Small Object ◽

Remote Sensing Images ◽

Feature Maps ◽

Attention Network ◽

Complex Background ◽

Bounding Box ◽

Feature Pyramid ◽

Public Datasets

Object detection has made significant progress in many real-world scenes. Despite this remarkable progress, the common use case of detection in remote sensing images remains challenging even for leading object detectors, due to the complex background, objects with arbitrary orientation, and large differences in object scale. In this paper, we propose a novel rotation detector for remote sensing images, mainly inspired by Mask R-CNN, namely RADet. RADet obtains the rotated bounding box of an object from the shape mask predicted by the mask branch, which is a novel, simple and effective way to get rotated bounding boxes. Specifically, a refined feature pyramid network is devised with an improved building block constructing top-down feature maps, to solve the problem of large differences in scale. Meanwhile, the position attention network and the channel attention network are jointly explored by modeling the spatial position dependence between global pixels and highlighting the object features, for detecting small objects surrounded by complex background. Extensive experiments on two public remote sensing datasets, DOTA and NWPU VHR-10, show that our method outperforms existing leading object detectors in the remote sensing field.

A review of deep learning methods for semantic segmentation of remote sensing imagery

Expert Systems with Applications ◽

10.1016/j.eswa.2020.114417 ◽

2021 ◽

Vol 169 ◽

pp. 114417

Author(s):

Xiaohui Yuan ◽

Jianfang Shi ◽

Lichuan Gu

Keyword(s):

Remote Sensing ◽

Deep Learning ◽

Semantic Segmentation ◽

Learning Methods ◽

Remote Sensing Imagery

Semantic Segmentation of Urban Buildings from VHR Remote Sensing Imagery Using a Deep Convolutional Neural Network

Remote Sensing ◽

10.3390/rs11151774 ◽

2019 ◽

Vol 11 (15) ◽

pp. 1774 ◽

Cited By ~ 21

Author(s):

Yaning Yi ◽

Zhijie Zhang ◽

Wanchang Zhang ◽

Chuanrong Zhang ◽

Weidong Li ◽

...

Keyword(s):

Neural Network ◽

Remote Sensing ◽

Convolutional Neural Network ◽

Semantic Segmentation ◽

Deep Convolutional Neural Network ◽

The Other ◽

Feature Maps ◽

Remote Sensing Imagery ◽

Urban Buildings ◽

Sampling Network

Urban building segmentation is a prevalent research domain for very high resolution (VHR) remote sensing; however, the varied appearances and complicated backgrounds of VHR remote sensing imagery make accurate semantic segmentation of urban buildings a challenge in relevant applications. Following the basic architecture of U-Net, an end-to-end deep convolutional neural network (denoted as DeepResUnet) was proposed, which can effectively perform urban building segmentation at pixel scale from VHR imagery and generate accurate segmentation results. The method contains two sub-networks: one is a cascade down-sampling network for extracting feature maps of buildings from the VHR image, and the other is an up-sampling network for reconstructing those extracted feature maps back to the same size as the input VHR image. The deep residual learning approach was adopted to facilitate training and to alleviate the degradation problem that often occurs in the model training process. The proposed DeepResUnet was tested on aerial images with a spatial resolution of 0.075 m and was compared under exactly the same conditions with six other state-of-the-art networks: FCN-8s, SegNet, DeconvNet, U-Net, ResUNet and DeepUNet. Results of extensive experiments indicated that the proposed DeepResUnet outperformed the other six existing networks in semantic segmentation of urban buildings in terms of visual and quantitative evaluation, especially in labeling irregular-shape and small-size buildings with higher accuracy and entirety. Compared with the U-Net, the F1 score, Kappa coefficient and overall accuracy of DeepResUnet were improved by 3.52%, 4.67% and 1.72%, respectively. Moreover, the proposed DeepResUnet required far fewer parameters than the U-Net, highlighting its significant improvement among U-Net applications. Nevertheless, the inference time of DeepResUnet is slightly longer than that of the U-Net, which is subject to further improvement.
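
The three scores compared above (F1, Kappa coefficient, overall accuracy) can all be derived from the four counts of a binary building/background confusion matrix. The sketch below uses the standard textbook definitions; the function name and the example counts are illustrative and are not taken from the paper, and the code assumes non-degenerate counts (no zero denominators).

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (F1, Cohen's kappa, overall accuracy) for a binary segmentation.

    tp, fp, fn, tn are pixel counts with 'building' as the positive class.
    """
    total = tp + fp + fn + tn
    oa = (tp + tn) / total  # observed agreement

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)

    # Agreement expected by chance, from the row and column marginals.
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = (oa - pe) / (1 - pe)

    return f1, kappa, oa


f1, kappa, oa = binary_metrics(tp=40, fp=10, fn=10, tn=40)
```

With these made-up counts, F1 and OA are both 0.8 while Kappa is 0.6, because Kappa discounts the 0.5 agreement that would be expected by chance on a balanced scene.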

Region-Enhancing Network for Semantic Segmentation of Remote-Sensing Imagery

Sensors ◽

10.3390/s21217316 ◽

2021 ◽

Vol 21 (21) ◽

pp. 7316

Author(s):

Bo Zhong ◽

Jiang Du ◽

Minghao Liu ◽

Aixia Yang ◽

Junjun Wu

Keyword(s):

Remote Sensing ◽

State Of The Art ◽

Semantic Segmentation ◽

The State ◽

Learning Ability ◽

Remote Sensing Imagery ◽

Multi Scale ◽

Context Learning ◽

Learning Procedure ◽

Public Datasets

Semantic segmentation for high-resolution remote-sensing imagery (HRRSI) has become increasingly popular in machine vision in recent years. Most state-of-the-art methods for semantic segmentation of HRRSI emphasize the strong learning ability of deep convolutional neural networks to model the contextual relationships in the image, which gives too much consideration to every pixel and subsequently causes the problem of overlearning. Annotation errors and easily confused features can also lead to confusion when pixel-based methods are used. Therefore, we propose a new semantic segmentation network, the region-enhancing network (RE-Net), which emphasizes regional information instead of pixels to solve the above problems. RE-Net introduces regional information into the base network to enhance the regional integrity of images and thus reduce misclassification. Specifically, the regional context learning procedure (RCLP) learns the context relationship from the perspective of regions. The region correcting procedure (RCP) uses the pixel aggregation feature to recalibrate the pixel features in each region. In addition, a simple intra-network multi-scale attention module is introduced to select features at different scales according to the size of the region. A large number of comparative experiments on four different public datasets demonstrate that the proposed RE-Net performs better than most state-of-the-art methods.

Automated Extraction of Antarctic Glacier and Ice Shelf Fronts from Sentinel-1 Imagery Using Deep Learning

Remote Sensing ◽

10.3390/rs11212529 ◽

2019 ◽

Vol 11 (21) ◽

pp. 2529 ◽

Cited By ~ 8

Author(s):

Celia A. Baumhoer ◽

Andreas J. Dietz ◽

C. Kneisel ◽

C. Kuenzer

Keyword(s):

Remote Sensing ◽

Time Series ◽

Deep Learning ◽

Semantic Segmentation ◽

Ice Shelf ◽

Low Contrast ◽

Remote Sensing Imagery ◽

Front Position ◽

Elevation Model ◽

The Antarctic

Sea level rise contribution from the Antarctic ice sheet is influenced by changes in glacier and ice shelf front position. Still, little is known about seasonal glacier and ice shelf front fluctuations, as the manual delineation of calving fronts from remote sensing imagery is very time-consuming. The major challenge of automatic calving front extraction is the low contrast between floating glacier and ice shelf fronts and the surrounding sea ice. Additionally, in previous decades, remote sensing imagery over the often cloud-covered Antarctic coastline was limited. Nowadays, an abundance of Sentinel-1 imagery over the Antarctic coastline exists and could be used for tracking glacier and ice shelf front movement. To exploit the available Sentinel-1 data, we developed a processing chain allowing automatic extraction of the Antarctic coastline from Sentinel-1 imagery and the creation of dense time series to assess calving front change. The core of the proposed workflow is a modified version of the deep learning architecture U-Net. This convolutional neural network (CNN) performs a semantic segmentation on dual-pol Sentinel-1 data and the Antarctic TanDEM-X digital elevation model (DEM). The proposed method is tested for four training and test areas along the Antarctic coastline. The automatically extracted fronts deviate on average by 78 m in training areas and 108 m in test areas. Spatial and temporal transferability is demonstrated on an automatically extracted 15-month time series along the Getz Ice Shelf. Between May 2017 and July 2018, the fronts along the Getz Ice Shelf show a mostly advancing tendency, with the fastest-moving front, that of DeVicq Glacier, advancing at 726 ± 20 m/yr.
