Semantic Labeling in Remote Sensing Corpora Using Feature Fusion-Based Enhanced Global Convolutional Network with High-Resolution Representations and Depthwise Atrous Convolution

2020 ◽  
Vol 12 (8) ◽  
pp. 1233 ◽  
Author(s):  
Teerapong Panboonyuen ◽  
Kulsawasd Jitkajornwanich ◽  
Siam Lawawirojwong ◽  
Panu Srestasathiern ◽  
Peerapon Vateekul

One of the fundamental tasks in remote sensing is semantic segmentation of aerial and satellite images. It plays a vital role in applications such as agriculture planning, map updates, route optimization, and navigation. The state-of-the-art model is the Enhanced Global Convolutional Network (GCN152-TL-A) from our previous work. It comprises two main components: (i) the backbone network to extract features and (ii) the segmentation network to annotate labels. However, the accuracy can be further improved, since the deep learning network is not designed for recovering low-level features (e.g., river, low vegetation). In this paper, we aim to improve the semantic segmentation network in three aspects, designed explicitly for the remotely sensed domain. First, we propose to employ a modern backbone network called "High-Resolution Representation (HR)" to extract features with higher quality. It repeatedly fuses the representations generated by the high-to-low subnetworks with the restoration of the low-resolution representations to the same depth and level. Second, "Feature Fusion (FF)" is added to our network to capture low-level features (e.g., lines, dots, or gradient orientation). It fuses the features from the backbone and the segmentation models, which helps to prevent the loss of these low-level features. Finally, "Depthwise Atrous Convolution (DA)" is introduced to refine the extracted features by using four multi-resolution layers in collaboration with a dilated convolution strategy. The experiment was conducted on three data sets: two private corpora from the Landsat-8 satellite and one public benchmark from the "ISPRS Vaihingen" challenge. There are two baseline models: the Deep Encoder-Decoder Network (DCED) and our previous model. The results show that the proposed model significantly outperforms all baselines.
It is the winner on all data sets, exceeding an F1 of 90% on each: 0.9114 and 0.9362 on the two Landsat-8 corpora and 0.9111 on the ISPRS Vaihingen data set. Furthermore, it achieves an accuracy beyond 90% on almost all classes.
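The "Depthwise Atrous Convolution" idea in this abstract can be illustrated with a minimal NumPy sketch: each channel is convolved independently (depthwise) with a kernel whose taps are spaced by a dilation rate, enlarging the receptive field without adding weights. This is an illustrative toy, not the paper's actual DA module (which stacks four multi-resolution layers inside a deep network); the function name and shapes are assumptions.

```python
import numpy as np

def depthwise_atrous_conv2d(x, kernels, dilation=2):
    """Depthwise 2D convolution with a dilation (atrous) rate.

    x       : (C, H, W) feature map
    kernels : (C, k, k) one kernel per channel (depthwise)
    dilation: spacing between kernel taps; the effective kernel size
              grows to k + (k - 1) * (dilation - 1) with no extra weights.
    """
    C, H, W = x.shape
    k = kernels.shape[1]
    eff = k + (k - 1) * (dilation - 1)          # effective receptive field
    pad = eff // 2                              # "same" padding
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x, dtype=float)
    for c in range(C):                          # each channel convolved independently
        for i in range(k):
            for j in range(k):
                di, dj = i * dilation, j * dilation
                out[c] += kernels[c, i, j] * xp[c, di:di + H, dj:dj + W]
    return out
```

With a 3x3 kernel and dilation 2 the taps cover a 5x5 neighborhood, which is why atrous convolution helps recover context around thin classes such as rivers without downsampling.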

2021 ◽  
Vol 13 (18) ◽  
pp. 3715
Author(s):  
Hao Shi ◽  
Jiahe Fan ◽  
Yupei Wang ◽  
Liang Chen

Land cover classification of high-resolution remote sensing images aims to obtain pixel-level land cover understanding, which is often modeled as semantic segmentation of remote sensing images. In recent years, convolutional neural network (CNN)-based land cover classification methods have achieved great advances. However, previous methods fail to generate fine segmentation results, especially for object boundary pixels. In order to obtain boundary-preserving predictions, we first propose to incorporate spatially adapting contextual cues. In this way, objects with similar appearance can be effectively distinguished with the extracted global contextual cues, which are very helpful for identifying pixels near object boundaries. On this basis, low-level spatial details and high-level semantic cues are effectively fused with the help of our proposed dual attention mechanism. Concretely, when fusing multi-level features, we utilize a dual attention feature fusion module based on both spatial and channel attention mechanisms to mitigate the influence of the large semantic gap between levels, and further improve the segmentation accuracy of pixels near object boundaries. Extensive experiments were carried out on the ISPRS 2D Semantic Labeling Vaihingen data and GaoFen-2 data to demonstrate the effectiveness of our proposed method. Our method achieves better performance compared with other state-of-the-art methods.
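The dual-attention fusion described above can be sketched in a few lines: a channel gate derived from the global context of the high-level map re-weights low-level channels, and a spatial gate derived from the low-level map re-weights locations. This is a simplified stand-in for the authors' module, assuming both maps are already at the same resolution; the gating functions here are assumptions, not the paper's exact design.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dual_attention_fuse(low, high):
    """Toy dual-attention feature fusion step.

    low, high : (C, H, W) feature maps at the same resolution
                (high-level features assumed already upsampled).
    """
    # Channel attention: global average pool over the high-level map.
    ch_gate = sigmoid(high.mean(axis=(1, 2)))[:, None, None]   # (C, 1, 1)
    # Spatial attention: mean over channels of the low-level map.
    sp_gate = sigmoid(low.mean(axis=0))[None, :, :]            # (1, H, W)
    # Low-level detail passes through both gates before being added.
    return high + ch_gate * sp_gate * low
```

The point of the double gating is that low-level detail near boundaries is admitted only where both the semantic context (channel gate) and the local evidence (spatial gate) support it.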


2019 ◽  
Vol 11 (13) ◽  
pp. 1617 ◽  
Author(s):  
Jicheng Wang ◽  
Li Shen ◽  
Wenfan Qiao ◽  
Yanshuai Dai ◽  
Zhilin Li

The classification of very-high-resolution (VHR) remote sensing images is essential in many applications. However, high intraclass and low interclass variations in these kinds of images pose serious challenges. Fully convolutional network (FCN) models, which benefit from a powerful feature learning ability, have shown impressive performance and great potential. Nevertheless, only classification results with coarse resolution can be obtained from the original FCN method. Deep feature fusion is often employed to improve the resolution of outputs. Existing strategies for such fusion are not capable of properly utilizing the low-level features or of considering the importance of features at different scales. This paper proposes a novel, end-to-end, fully convolutional network to integrate a multiconnection ResNet model and a class-specific attention model into a unified framework to overcome these problems. The former fuses multilevel deep features without introducing any redundant information from low-level features. The latter can learn the contributions from different features of each geo-object at each scale. Extensive experiments on two open datasets indicate that the proposed method achieves class-specific, scale-adaptive classification results and outperforms other state-of-the-art methods. The results were submitted to the International Society for Photogrammetry and Remote Sensing (ISPRS) online contest for comparison with more than 50 other methods. The results indicate that the proposed method (ID: SWJ_2) ranks #1 in terms of overall accuracy, even though the additional digital surface model (DSM) data offered by ISPRS were not used and no postprocessing was applied.
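The class-specific, scale-adaptive idea can be illustrated with a small sketch: given class score maps produced at several scales, a learned per-class logit over scales is softmax-normalized so that each class gets its own mixing weights across scales. The authors' attention model is learned end-to-end inside the FCN; this fragment only illustrates the weighting arithmetic, and all names and shapes are assumptions.

```python
import numpy as np

def scale_adaptive_scores(score_maps, scale_logits):
    """Class-specific scale-adaptive fusion (illustrative sketch).

    score_maps  : (S, K, H, W) class score maps from S scales
                  (assumed resized to a common resolution).
    scale_logits: (S, K) learned per-class scale preferences.
    """
    # Softmax over the scale axis: each class gets its own weights,
    # so e.g. 'car' can favour fine scales while 'building' favours coarse.
    w = np.exp(scale_logits - scale_logits.max(axis=0))
    w = w / w.sum(axis=0)                                   # (S, K), columns sum to 1
    return (w[:, :, None, None] * score_maps).sum(axis=0)   # (K, H, W)
```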


Author(s):  
Teerapong Panboonyuen ◽  
Kulsawasd Jitkajornwanich ◽  
Siam Lawawirojwong ◽  
Panu Srestasathiern ◽  
Peerapon Vateekul

In the remote sensing domain, it is crucial to automatically annotate semantics, e.g., river, building, forest, etc., on raster images. The Deep Convolutional Encoder-Decoder (DCED) network is the state-of-the-art semantic segmentation method for remotely sensed images. However, the accuracy is still limited, since the network is not designed for remotely sensed images and the training data in this domain is deficient. In this paper, we aim to propose a novel CNN for semantic segmentation, particularly for remote sensing corpora, with three main contributions. First, we propose to apply a recent CNN called the "Global Convolutional Network (GCN)", since it can capture different resolutions by extracting multi-scale features from different stages of the network. We further enhance the network by improving its backbone using larger numbers of layers, which is suitable for medium resolution remotely sensed images. Second, "Channel Attention" is incorporated into our network in order to select the most discriminative filters (features). Third, "Domain Specific Transfer Learning" is introduced to alleviate the scarcity issue by utilizing other remotely sensed corpora with different resolutions as pre-trained data. The experiment was then conducted on two given data sets: (i) medium resolution data collected from the Landsat-8 satellite and (ii) very high resolution data called the "ISPRS Vaihingen Challenge Data Set". The results show that our networks outperformed DCED in terms of F1 by 17.48% and 2.49% on the medium and very high resolution corpora, respectively.
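The Global Convolutional Network referenced here is commonly associated with a separable large-kernel trick: a dense k x k kernel is approximated by parallel (k x 1 then 1 x k) and (1 x k then k x 1) branches, using 4k instead of k^2 weights per channel so a large valid receptive field stays affordable. The sketch below shows that decomposition on a single 2-D map; it is a minimal illustration, not the abstract's actual network.

```python
import numpy as np

def conv1d_along(x, k, axis):
    """'Same'-padded 1-D convolution of a 2-D map along one axis."""
    r = len(k) // 2
    pad = [(0, 0), (0, 0)]
    pad[axis] = (r, r)
    xp = np.pad(x, pad)
    out = np.zeros_like(x, dtype=float)
    for t, w in enumerate(k):
        sl = [slice(None), slice(None)]
        sl[axis] = slice(t, t + x.shape[axis])
        out += w * xp[tuple(sl)]
    return out

def gcn_large_kernel(x, k_row, k_col):
    """GCN-style separable large kernel: two parallel branches of
    stacked 1-D convolutions, summed. Approximates a dense k x k
    kernel while keeping the parameter count linear in k."""
    return (conv1d_along(conv1d_along(x, k_col, 0), k_row, 1)
            + conv1d_along(conv1d_along(x, k_row, 1), k_col, 0))
```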


2021 ◽  
Vol 13 (20) ◽  
pp. 4159
Author(s):  
Yuhang Gu ◽  
Jie Hao ◽  
Bing Chen ◽  
Hai Deng

In recent years, high-resolution remote sensing semantic segmentation based on data fusion has gradually become a research focus in the field of land classification, an indispensable task for smart cities. However, the existing feature fusion methods with bottom-up structures achieve only limited fusion results. Alternatively, various auxiliary fusion modules significantly increase the complexity of the models and make the training process intolerably expensive. In this paper, we propose a new lightweight model called the top-down pyramid fusion network (TdPFNet), comprising a multi-source feature extractor, a top-down pyramid fusion module, and a decoder. It can deeply fuse features from different sources in a top-down structure, using high-level semantic knowledge to guide the fusion of low-level texture information. Digital surface model (DSM) data and OpenStreetMap (OSM) data are used as auxiliary inputs to the Potsdam dataset for the proposed model evaluation. Experimental results show that the network proposed in this paper not only notably improves the segmentation accuracy, but also reduces the complexity of the multi-source semantic segmentation model.
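The top-down structure described above can be sketched as a coarse-to-fine pass: the coarsest (most semantic) level is repeatedly upsampled and added into each finer level, so high-level context guides the low-level texture it is merged with. This toy assumes a simple pyramid where each level doubles the resolution of the previous one; it is not TdPFNet itself, and the nearest-neighbour upsampling is an assumption.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def top_down_fuse(pyramid):
    """Top-down pyramid fusion (illustrative sketch).

    pyramid: list of (C, H, W) maps ordered coarse (semantic) to fine
             (texture), each level twice the resolution of the last.
    """
    fused = pyramid[0]
    outputs = [fused]
    for finer in pyramid[1:]:
        # Coarse semantics are upsampled and injected into finer detail.
        fused = finer + upsample2x(fused)
        outputs.append(fused)
    return outputs
```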


2021 ◽  
Vol 10 (10) ◽  
pp. 672
Author(s):  
Suting Chen ◽  
Chaoqun Wu ◽  
Mithun Mukherjee ◽  
Yujie Zheng

Semantic segmentation of remote sensing images (RSI) plays a significant role in urban management and land cover classification. Due to the rich spatial information in RSI, existing convolutional neural network (CNN)-based methods cannot segment images accurately and lose some edge information of objects. In addition, recent studies have shown that leveraging additional 3D geometric data alongside 2D appearance is beneficial for distinguishing pixel categories. However, most such methods require height maps as additional inputs, which severely limits their applications. To alleviate the above issues, we propose a height-aware multi-path parallel network (HA-MPPNet). Our proposed HA-MPPNet first obtains multi-level semantic features while maintaining the spatial resolution in each path, preserving detailed image information. Afterward, gated high-low level feature fusion is utilized to complement the lack of low-level semantics. Then, we design a height feature decoding branch that learns height features under the supervision of digital surface model (DSM) images and uses the learned embeddings to improve semantic context through height-feature-guided propagation. Note that our model is end-to-end and does not need a DSM image as an additional input after training. Our method outperformed other state-of-the-art methods for semantic segmentation on publicly available remote sensing image datasets.
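Gated high-low level fusion, as named above, typically means a learned sigmoid gate that decides, per channel and location, how much low-level detail versus high-level semantics to pass through. The fragment below shows only that gating arithmetic; in practice the gate logits would be predicted by a small convolution from the features themselves, and all names here are assumptions rather than HA-MPPNet's actual layers.

```python
import numpy as np

def gated_fusion(low, high, gate_logits):
    """Gated high/low-level feature fusion (illustrative sketch).

    low, high  : (C, H, W) feature maps at the same resolution.
    gate_logits: (C, H, W) logits; assumed to come from a learned
                 conv layer over the concatenated features.
    """
    g = 1.0 / (1.0 + np.exp(-gate_logits))   # sigmoid gate in (0, 1)
    # Convex combination: g -> 1 keeps low-level detail,
    # g -> 0 keeps high-level semantics.
    return g * low + (1.0 - g) * high
```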


2019 ◽  
Vol 11 (1) ◽  
pp. 83 ◽  
Author(s):  
Teerapong Panboonyuen ◽  
Kulsawasd Jitkajornwanich ◽  
Siam Lawawirojwong ◽  
Panu Srestasathiern ◽  
Peerapon Vateekul

In the remote sensing domain, it is crucial to complete semantic segmentation of raster images, annotating classes such as river, building, and forest. A deep convolutional encoder-decoder (DCED) network is the state-of-the-art semantic segmentation method for remotely sensed images. However, the accuracy is still limited, since the network is not designed for remotely sensed images and the training data in this domain is deficient. In this paper, we aim to propose a novel CNN for semantic segmentation, particularly for remote sensing corpora, with three main contributions. First, we propose applying a recent CNN called a global convolutional network (GCN), since it can capture different resolutions by extracting multi-scale features from different stages of the network. Additionally, we further enhance the network by improving its backbone using larger numbers of layers, which is suitable for medium resolution remotely sensed images. Second, "channel attention" is presented in our network in order to select the most discriminative filters (features). Third, "domain-specific transfer learning" is introduced to alleviate the scarcity issue by utilizing other remotely sensed corpora with different resolutions as pre-trained data. The experiment was then conducted on two given datasets: (i) medium resolution data collected from the Landsat-8 satellite and (ii) very high resolution data called the ISPRS Vaihingen Challenge Dataset. The results show that our networks outperformed DCED in terms of F1 by 17.48% and 2.49% on the medium and very high resolution corpora, respectively.
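Several abstracts in this listing report per-class F1 scores. For reference, a minimal sketch of how per-class F1 is computed from predicted and reference label maps (flattened to 1-D); this is the standard metric definition, not code from any of the papers.

```python
import numpy as np

def per_class_f1(pred, target, num_classes):
    """Per-class F1 from flat integer label arrays.

    F1 = 2 * precision * recall / (precision + recall)
       = 2*TP / (2*TP + FP + FN)
    """
    f1 = np.zeros(num_classes)
    for c in range(num_classes):
        tp = np.sum((pred == c) & (target == c))   # true positives for class c
        fp = np.sum((pred == c) & (target != c))   # false positives
        fn = np.sum((pred != c) & (target == c))   # false negatives
        denom = 2 * tp + fp + fn
        f1[c] = 2 * tp / denom if denom else 0.0
    return f1
```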

