Hybridizing Cross-Level Contextual and Attentive Representations for Remote Sensing Imagery Semantic Segmentation

2021 ◽  
Vol 13 (15) ◽  
pp. 2986
Author(s):  
Xin Li ◽  
Feng Xu ◽  
Runliang Xia ◽  
Xin Lyu ◽  
Hongmin Gao ◽  
...  

Semantic segmentation of remote sensing imagery is a fundamental task in intelligent interpretation. Since deep convolutional neural networks (DCNNs) have shown considerable capability in learning implicit representations from data, numerous works in recent years have transferred DCNN-based models to remote sensing data analysis. However, wide observation areas, complex and diverse objects, and varying illumination and imaging angles make pixels easily confused, leading to undesirable results. Therefore, a remote sensing imagery semantic segmentation neural network, named HCANet, is proposed to generate representative and discriminative representations for dense prediction. HCANet hybridizes cross-level contextual and attentive representations to emphasize the distinguishability of learned features. First, a cross-level contextual representation module (CCRM) is devised to exploit and harness superpixel contextual information. Second, a hybrid representation enhancement module (HREM) is designed to fuse cross-level contextual and self-attentive representations flexibly. Furthermore, the decoder incorporates the DUpsampling operation to boost efficiency losslessly. Extensive experiments are conducted on the Vaihingen and Potsdam benchmarks. The results indicate that HCANet achieves excellent performance in overall accuracy and mean intersection over union, and the ablation study further verifies the superiority of CCRM.
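HCANet is evaluated with overall accuracy (OA) and mean intersection over union (mIoU). As a minimal sketch, assuming integer label maps and no ignored pixels, both metrics can be computed from a per-class confusion matrix. The function names are illustrative, and the benchmarks' official evaluation protocol may differ from this plain version (for example in which classes or boundary pixels are ignored).

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels per (reference class, predicted class) pair."""
    idx = num_classes * y_true.ravel() + y_pred.ravel()
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)  # rows: reference, cols: prediction


def overall_accuracy(cm):
    # Share of all pixels whose predicted label equals the reference label.
    return np.trace(cm) / cm.sum()


def mean_iou(cm):
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp  # predicted as the class, labelled as another
    fn = cm.sum(axis=1) - tp  # labelled as the class, predicted as another
    denom = tp + fp + fn
    # Classes absent from both reference and prediction get NaN and are skipped.
    iou = np.divide(tp, denom, out=np.full_like(tp, np.nan), where=denom > 0)
    return np.nanmean(iou)


y_true = np.array([[0, 0, 1], [1, 2, 2]])
y_pred = np.array([[0, 1, 1], [1, 2, 0]])
cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(overall_accuracy(cm))  # 4 of 6 pixels correct
print(mean_iou(cm))          # mean of the per-class IoUs 1/3, 2/3 and 1/2
```

Because both metrics come from the same confusion matrix, accumulating that matrix over all test tiles and computing the scores once at the end avoids averaging per-tile scores.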

2021 ◽  
Vol 14 (1) ◽  
pp. 102
Author(s):  
Xin Li ◽  
Tao Li ◽  
Ziqi Chen ◽  
Kaiwen Zhang ◽  
Runliang Xia

Semantic segmentation has been a fundamental task in interpreting remote sensing imagery (RSI) for various downstream applications. Due to high intra-class variance and inter-class similarity, inflexibly transferring networks designed for natural images to RSI is inadvisable. To enhance the distinguishability of learnt representations, attention modules were developed and applied to RSI, resulting in satisfactory improvements. However, these designs capture contextual information by handling all pixels equally, regardless of whether they lie around edges. As a result, blurry boundaries are generated, raising high uncertainty in classifying the many adjacent pixels. Hence, we propose an edge distribution attention module (EDA) to highlight the edge distributions of learnt feature maps in a self-attentive fashion. In this module, we first formulate and model column-wise and row-wise edge attention maps based on covariance matrix analysis. Furthermore, a hybrid attention module (HAM) that emphasizes both the edge distributions and position-wise dependencies is devised in combination with a non-local block. Consequently, a conceptually end-to-end neural network, termed EDENet, is proposed to integrate HAM hierarchically for the detailed strengthening of multi-level representations. EDENet implicitly learns representative and discriminative features, providing useful and reasonable cues for dense prediction. The experimental results on the ISPRS Vaihingen, Potsdam and DeepGlobe datasets show its efficacy and superiority over state-of-the-art methods in overall accuracy (OA) and mean intersection over union (mIoU). In addition, the ablation study further validates the effects of EDA.
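The abstract does not give the covariance-based formulation of EDA, so only the position-wise part of HAM is sketched here: an embedded-Gaussian non-local operation in the style of Wang et al.'s non-local neural networks, with the 1x1 convolutions written as channel-wise matrix products. This is a generic illustration under those assumptions, not EDENet's HAM; the weight matrices are random placeholders.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def non_local(feat, w_theta, w_phi, w_g, w_out):
    """Embedded-Gaussian non-local operation on a (C, H, W) feature map."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)               # (C, N), one column per position
    theta = w_theta @ x                      # (C', N) query embeddings
    phi = w_phi @ x                          # (C', N) key embeddings
    g = w_g @ x                              # (C', N) value embeddings
    attn = softmax(theta.T @ phi, axis=-1)   # (N, N), row i weights all positions for position i
    y = g @ attn.T                           # (C', N), attention-weighted sum of values
    z = x + w_out @ y                        # project back to C channels, add residual
    return z.reshape(c, h, w), attn


rng = np.random.default_rng(0)
C, C_EMB, H, W = 8, 4, 5, 6
feat = rng.standard_normal((C, H, W))
weights = [rng.standard_normal((C_EMB, C)) for _ in range(3)]
w_out = rng.standard_normal((C, C_EMB))
z, attn = non_local(feat, *weights, w_out)
print(z.shape, attn.shape)  # (8, 5, 6) (30, 30)
```

Every position attends to every other position, which is why the attention matrix is N x N; this treats edge and interior pixels alike, the behaviour the abstract argues EDA should complement.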


Sensors ◽  
2019 ◽  
Vol 19 (12) ◽  
pp. 2792 ◽  
Author(s):  
Xuedong Yao ◽  
Hui Yang ◽  
Yanlan Wu ◽  
Penghai Wu ◽  
Biao Wang ◽  
...  

Land use classification is a fundamental task of information extraction from remote sensing imagery. Semantic segmentation based on deep convolutional neural networks (DCNNs) has shown outstanding performance in this task. However, these methods are still affected by the loss of spatial features. In this study, we propose a new network, called the dense-coordconv network (DCCN), to reduce the loss of spatial features and strengthen object boundaries. In this network, the coordconv module is introduced into an improved DenseNet architecture to enhance spatial information by adding coordinate information to feature maps. The proposed DCCN achieved strong performance on the public ISPRS (International Society for Photogrammetry and Remote Sensing) 2D semantic labeling benchmark dataset. Compared with other deep convolutional neural networks (U-Net, SegNet, DeepLab-V3), DCCN improved the results considerably, with the OA (overall accuracy) and mean F1 score reaching 89.48% and 86.89%, respectively. This indicates that DCCN can effectively reduce the loss of spatial features and improve the accuracy of semantic segmentation in high-resolution remote sensing imagery.
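The coordconv idea is to append coordinate channels to a feature map so that a following convolution can see where each pixel lies. A minimal sketch, assuming a (C, H, W) layout and coordinates normalized to [-1, 1]; how DCCN places this inside its improved DenseNet is not described in the abstract, and the helper name add_coord_channels is hypothetical.

```python
import numpy as np


def add_coord_channels(feat):
    """Append normalized x and y coordinate channels to a (C, H, W) feature map."""
    c, h, w = feat.shape
    ys = np.linspace(-1.0, 1.0, h)                 # row coordinate, top to bottom
    xs = np.linspace(-1.0, 1.0, w)                 # column coordinate, left to right
    yy, xx = np.meshgrid(ys, xs, indexing="ij")    # both (H, W)
    return np.concatenate([feat, xx[None], yy[None]], axis=0)


feat = np.random.default_rng(0).standard_normal((3, 4, 5))
out = add_coord_channels(feat)
print(out.shape)  # (5, 4, 5)
```

A regular convolution applied to the result then operates on C + 2 input channels, so it can learn position-dependent filters without any change to the convolution itself.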


Sensors ◽  
2021 ◽  
Vol 21 (11) ◽  
pp. 3848
Author(s):  
Wei Cui ◽  
Meng Yao ◽  
Yuanjie Hao ◽  
Ziwei Wang ◽  
Xin He ◽  
...  

Pixel-based semantic segmentation models fail to effectively express geographic objects and their topological relationships. Therefore, in semantic segmentation of remote sensing images, these models cannot avoid salt-and-pepper effects and fail to achieve high accuracy. To solve these problems, object-based models such as graph neural networks (GNNs) are considered. However, traditional GNNs aggregate node information directly from the similarity or spatial correlations between nodes, relying too heavily on the contextual information of the sample. This sample context is often distorted, which reduces node classification accuracy. To solve this problem, a knowledge and geo-object-based graph convolutional network (KGGCN) is proposed. KGGCN uses superpixel blocks as the nodes of the graph network and combines prior knowledge with spatial correlations during information aggregation. By incorporating prior knowledge obtained from all samples of the study area, the receptive field of each node is extended from its sample context to the whole study area, which effectively overcomes the distortion of the sample context. Experiments demonstrate that our model improves by 3.7% over the baseline model, Cluster GCN, and by 4.1% over U-Net.
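For a graph of superpixel nodes, the spatial-correlation aggregation of a conventional graph convolutional network can be sketched with the widely used symmetric normalization, ReLU(D^-1/2 (A + I) D^-1/2 H W). This assumes two superpixels are connected when they share a border. KGGCN's prior-knowledge term is not specified in the abstract and is deliberately left out, so this is not KGGCN itself; gcn_layer is a hypothetical helper.

```python
import numpy as np


def gcn_layer(adj, h, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])              # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))   # degrees are >= 1 thanks to self-loops
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ h @ weight, 0.0)       # aggregate neighbours, project, ReLU


rng = np.random.default_rng(0)
# Superpixels 0-1-2 form a chain of shared borders; superpixel 3 touches none.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 0],
                [0, 0, 0, 0]], dtype=float)
h = rng.standard_normal((4, 6))       # per-superpixel features, e.g. mean band values
weight = rng.standard_normal((6, 3))  # learnable projection (random here)
out = gcn_layer(adj, h, weight)
print(out.shape)  # (4, 3)
```

Each node only mixes features from its direct neighbours, which illustrates the abstract's point: the result depends entirely on the local sample context encoded in the adjacency.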


2020 ◽  
Author(s):  
Matheus B. Pereira ◽  
Jefersson Alex Dos Santos

High-resolution aerial images are usually not accessible or affordable. On the other hand, low-resolution remote sensing data is easily found in public open repositories. The problem is that low-resolution representations can compromise pattern recognition algorithms, especially semantic segmentation. In this M.Sc. dissertation, we design two frameworks to evaluate the effectiveness of super-resolution in the semantic segmentation of low-resolution remote sensing images. We carried out an extensive set of experiments on different remote sensing datasets. The results show that super-resolution is effective in improving semantic segmentation performance on low-resolution aerial imagery, outperforming unsupervised interpolation and achieving semantic segmentation results comparable to those obtained on high-resolution data.
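The dissertation compares super-resolution against unsupervised interpolation, whose specific method the abstract does not name. As an illustration, bilinear interpolation is one such baseline; the sketch below assumes an integer scale factor, the half-pixel-centre coordinate convention and clamping at the image border.

```python
import numpy as np


def bilinear_upscale(img, factor):
    """Upscale an (H, W) or (H, W, C) image by an integer factor with bilinear interpolation."""
    h, w = img.shape[:2]
    # Map each output pixel centre back to (fractional) input coordinates.
    ys = np.clip((np.arange(h * factor) + 0.5) / factor - 0.5, 0, h - 1)
    xs = np.clip((np.arange(w * factor) + 0.5) / factor - 0.5, 0, w - 1)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]   # vertical blend weights, (H', 1)
    wx = (xs - x0)[None, :]   # horizontal blend weights, (1, W')
    if img.ndim == 3:         # broadcast the weights over the band axis
        wy, wx = wy[..., None], wx[..., None]
    img = img.astype(float)
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


img = np.array([[0.0, 1.0], [2.0, 3.0]])
up = bilinear_upscale(img, 2)
print(up.shape)   # (4, 4)
print(up[1, 1])   # 0.75
```

Interpolation like this only redistributes existing pixel values; it cannot recover detail absent from the low-resolution input, which is the gap a learned super-resolution model tries to fill.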


2021 ◽  
Vol 13 (22) ◽  
pp. 4518
Author(s):  
Xin Zhao ◽  
Jiayi Guo ◽  
Yueting Zhang ◽  
Yirong Wu

The semantic segmentation of remote sensing images requires distinguishing local regions of different classes and exploiting a uniform global representation of same-class instances. Such requirements make it necessary for segmentation methods to extract discriminative local features between different classes and to explore representative features for all instances of a given class. While common deep convolutional neural networks (DCNNs) can effectively focus on local features, their limited receptive fields prevent them from obtaining consistent global information. In this paper, we propose a memory-augmented transformer (MAT) to effectively model both local and global information. The feature extraction pipeline of the MAT is split into a memory-based global relationship guidance module and a local feature extraction module. The local feature extraction module mainly consists of a transformer, which is used to extract features from the input images. The global relationship guidance module maintains a memory bank for the consistent encoding of global information. Global guidance is performed by memory interaction. Bidirectional information flow between the global and local branches is conducted by a memory-query module and a memory-update module, respectively. Experimental results on the ISPRS Potsdam and ISPRS Vaihingen datasets demonstrate that our method performs competitively with state-of-the-art methods.
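The memory-query and memory-update modules are only named in the abstract. As a hypothetical sketch, both directions of the bidirectional flow can be written as scaled dot-product attention: local tokens attend to memory slots to read global guidance, and memory slots attend to local tokens to absorb new information. The momentum blend in the update, the slot count and all function names are assumptions, not the MAT design.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(queries, keys_values):
    """Scaled dot-product attention where keys and values share one tensor."""
    d = queries.shape[-1]
    weights = softmax(queries @ keys_values.T / np.sqrt(d), axis=-1)
    return weights @ keys_values


def memory_query(local_feats, memory):
    # Hypothetical read: local tokens (N, D) gather global context from memory slots (M, D).
    return local_feats + attend(local_feats, memory)


def memory_update(memory, local_feats, momentum=0.9):
    # Hypothetical write: slots summarise the current tokens; the momentum blend is assumed.
    return momentum * memory + (1 - momentum) * attend(memory, local_feats)


rng = np.random.default_rng(0)
tokens = rng.standard_normal((64, 16))  # e.g. an 8x8 grid of 16-dim local features
memory = rng.standard_normal((10, 16))  # 10 memory slots (arbitrary choice)
guided = memory_query(tokens, memory)
memory = memory_update(memory, tokens)
print(guided.shape, memory.shape)  # (64, 16) (10, 16)
```

Because the memory persists across inputs while the tokens change per image, reading from it gives every image access to the same global encoding, which is the consistency the abstract attributes to the memory bank.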


2021 ◽  
Vol 15 (02) ◽  
Author(s):  
Annus Zulfiqar ◽  
Muhammad M. Ghaffar ◽  
Muhammad Shahzad ◽  
Christian Weis ◽  
Muhammad I. Malik ◽  
...  
