Top-Down Pyramid Fusion Network for High-Resolution Remote Sensing Semantic Segmentation

Yuhang Gu; Jie Hao; Bing Chen; Hai Deng

doi:10.3390/rs13204159

Remote Sensing ◽

10.3390/rs13204159 ◽

2021 ◽

Vol 13 (20) ◽

pp. 4159

Author(s):

Yuhang Gu ◽

Jie Hao ◽

Bing Chen ◽

Hai Deng

Keyword(s):

Remote Sensing ◽

High Resolution ◽

Feature Fusion ◽

Semantic Segmentation ◽

Semantic Knowledge ◽

Surface Model ◽

Top Down ◽

Segmentation Accuracy ◽

Fusion Methods ◽

High Level

In recent years, high-resolution remote sensing semantic segmentation based on data fusion has gradually become a research focus in land classification, an indispensable task for smart cities. However, existing feature fusion methods with bottom-up structures achieve only limited fusion results. Alternative approaches that add auxiliary fusion modules significantly increase model complexity and make training intolerably expensive. In this paper, we propose a new lightweight model called the top-down pyramid fusion network (TdPFNet), which comprises a multi-source feature extractor, a top-down pyramid fusion module, and a decoder. It deeply fuses features from different sources in a top-down structure, using high-level semantic knowledge to guide the fusion of low-level texture information. Digital surface model (DSM) data and OpenStreetMap (OSM) data are used as auxiliary inputs to the Potsdam dataset to evaluate the proposed model. Experimental results show that the proposed network not only notably improves segmentation accuracy but also reduces the complexity of the multi-source semantic segmentation model.
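As a rough illustration of the top-down idea in this abstract (coarse, high-level features guiding finer, low-level ones), the sketch below fuses a toy 1D feature pyramid from coarsest to finest. It is not the TdPFNet architecture: the names `upsample_nearest` and `top_down_fuse`, the 1D lists, the nearest-neighbour upsampling, and the element-wise addition are all assumptions chosen for brevity.

```python
def upsample_nearest(features, factor=2):
    """Repeat every value `factor` times (nearest-neighbour upsampling in 1D)."""
    return [value for value in features for _ in range(factor)]


def top_down_fuse(pyramid):
    """Fuse a feature pyramid from coarsest to finest.

    `pyramid` is ordered fine -> coarse, and each level is assumed to be
    twice as long as the next one (a hypothetical layout for this sketch).
    Returns the fused levels in the same fine -> coarse order.
    """
    fused = [list(pyramid[-1])]           # the coarsest level is kept as-is
    for level in reversed(pyramid[:-1]):  # walk towards finer levels
        guidance = upsample_nearest(fused[0])
        if len(guidance) != len(level):
            raise ValueError("each level must be twice as long as the next")
        # High-level guidance is added to the low-level features.
        fused.insert(0, [low + high for low, high in zip(level, guidance)])
    return fused


pyramid = [
    [1.0, 1.0, 1.0, 1.0],  # fine, low-level features
    [2.0, 3.0],            # mid-level features
    [10.0],                # coarse, high-level features
]
fused = top_down_fuse(pyramid)
```

In this toy run the coarsest value propagates all the way down, so the finest fused level carries information from every level above it. A real network would replace the addition with learned layers.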

CCT: Conditional Co-Training for Truly Unsupervised Remote Sensing Image Segmentation in Coastal Areas

Remote Sensing ◽

10.3390/rs13173521 ◽

2021 ◽

Vol 13 (17) ◽

pp. 3521

Author(s):

Bo Fang ◽

Gang Chen ◽

Jifa Chen ◽

Guichong Ouyang ◽

Rong Kou ◽

...

Keyword(s):

Remote Sensing ◽

Image Segmentation ◽

Semantic Segmentation ◽

Remote Sensing Image ◽

Semantic Knowledge ◽

Input Image ◽

Coastal Areas ◽

Learning Technology ◽

Model Framework ◽

High Level

As the fastest growing trend in big data analysis, deep learning technology has proven to be both an unprecedented breakthrough and a powerful tool in many fields, particularly for image segmentation tasks. Nevertheless, most achievements depend on high-quality pre-labeled training samples, which are labor-intensive and time-consuming to produce. Furthermore, unlike conventional natural images, coastal remote sensing images generally carry far more complicated and abundant land cover information, making it difficult to produce pre-labeled references for supervised image segmentation. Motivated by this observation, we conduct an in-depth investigation of neural networks for unsupervised learning and propose a novel method, conditional co-training (CCT), specifically for truly unsupervised remote sensing image segmentation in coastal areas. We propose a multi-model framework consisting of two parallel data streams, superpixel-based over-segmentation and pixel-level semantic segmentation, that simultaneously perform pixel-level classification. The former processes the input image into multiple over-segments, providing self-constrained guidance for model training. Meanwhile, with this guidance, the latter continuously processes the input image into multi-channel response maps until the model converges. Incentivized by multiple conditional constraints, our framework learns to extract high-level semantic knowledge and produce full-resolution segmentation maps without pre-labeled ground truths. Compared to the black-box solutions of conventional supervised learning, this method offers stronger explainability and transparency owing to its specific architecture and mechanism. The experimental results on two representative real-world coastal remote sensing datasets and the comparison with other state-of-the-art truly unsupervised methods validate the plausible performance and excellent efficiency of the proposed CCT.
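One common way to let over-segments guide pixel-level predictions is to make every pixel in a superpixel adopt the segment's most frequent label. The abstract does not spell out CCT's conditional constraints, so the sketch below is a generic majority-vote refinement rather than the authors' method; the name `refine_with_superpixels` and the flat list layout are hypothetical.

```python
from collections import Counter, defaultdict


def refine_with_superpixels(pixel_labels, superpixel_ids):
    """Give every pixel the most frequent predicted label of its superpixel.

    Both arguments are flat, equally long lists with one entry per pixel.
    """
    if len(pixel_labels) != len(superpixel_ids):
        raise ValueError("inputs must have one entry per pixel")

    # Count predicted labels separately for every superpixel.
    votes = defaultdict(Counter)
    for label, segment in zip(pixel_labels, superpixel_ids):
        votes[segment][label] += 1

    # Pick the winning label per superpixel, then broadcast it to its pixels.
    winners = {seg: counts.most_common(1)[0][0] for seg, counts in votes.items()}
    return [winners[segment] for segment in superpixel_ids]


pixel_labels = [0, 0, 1, 2, 2, 2, 1, 0]
superpixel_ids = [0, 0, 0, 1, 1, 1, 1, 2]
refined = refine_with_superpixels(pixel_labels, superpixel_ids)
```

In an iterative training loop, labels refined this way could serve as targets for the pixel-level stream, which is one plausible reading of "self-constrained guidance".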

Dual Attention Feature Fusion and Adaptive Context for Accurate Segmentation of Very High-Resolution Remote Sensing Images

Remote Sensing ◽

10.3390/rs13183715 ◽

2021 ◽

Vol 13 (18) ◽

pp. 3715

Author(s):

Hao Shi ◽

Jiahe Fan ◽

Yupei Wang ◽

Liang Chen

Keyword(s):

Remote Sensing ◽

High Resolution ◽

Land Cover ◽

Feature Fusion ◽

Semantic Segmentation ◽

Land Cover Classification ◽

Contextual Cues ◽

Remote Sensing Images ◽

Object Boundary ◽

Convolutional Network

Land cover classification of high-resolution remote sensing images aims to obtain pixel-level land cover understanding and is often modeled as semantic segmentation of remote sensing images. In recent years, convolutional neural network (CNN)-based land cover classification methods have advanced greatly. However, previous methods fail to generate fine segmentation results, especially for object boundary pixels. To obtain boundary-preserving predictions, we first propose incorporating spatially adaptive contextual cues. In this way, objects with similar appearance can be effectively distinguished using the extracted global contextual cues, which are very helpful for identifying pixels near object boundaries. On this basis, low-level spatial details and high-level semantic cues are effectively fused with the help of our proposed dual attention mechanism. Concretely, when fusing multi-level features, we use a dual attention feature fusion module based on both spatial and channel attention mechanisms to reduce the influence of the large gap between feature levels and further improve the segmentation accuracy of pixels near object boundaries. Extensive experiments on the ISPRS 2D Semantic Labeling Vaihingen data and GaoFen-2 data demonstrate the effectiveness of the proposed method, which achieves better performance than other state-of-the-art methods.
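To make the phrase "both spatial and channel attention" concrete, here is a minimal sketch of the two gating patterns on toy features stored as `[channel][position]` lists. It has no learned weights, and the decision to apply channel attention to the high-level features and spatial attention to the low-level ones is an assumption for illustration; the abstract does not specify the module's internals, and all function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(features):
    """Scale each channel by a gate computed from its global average."""
    gates = [sigmoid(sum(channel) / len(channel)) for channel in features]
    return [[gate * v for v in channel] for gate, channel in zip(gates, features)]


def spatial_attention(features):
    """Scale each position by a gate computed from its mean across channels."""
    n_channels = len(features)
    n_positions = len(features[0])
    gates = [
        sigmoid(sum(channel[i] for channel in features) / n_channels)
        for i in range(n_positions)
    ]
    return [[gates[i] * channel[i] for i in range(n_positions)] for channel in features]


def dual_attention_fuse(low_level, high_level):
    """Add channel-gated high-level features to spatially gated low-level ones."""
    high = channel_attention(high_level)
    low = spatial_attention(low_level)
    return [[l + h for l, h in zip(lc, hc)] for lc, hc in zip(low, high)]


low_level = [[2.0, -2.0], [-2.0, 2.0]]
high_level = [[1.0, -1.0], [4.0, 4.0]]
fused = dual_attention_fuse(low_level, high_level)
```

Both inputs must share the same shape here; a real module would first align resolutions and channel counts and would learn the gates from data.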

HA-MPPNet: Height Aware-Multi Path Parallel Network for High Spatial Resolution Remote Sensing Image Semantic Segmentation

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10100672 ◽

2021 ◽

Vol 10 (10) ◽

pp. 672

Author(s):

Suting Chen ◽

Chaoqun Wu ◽

Mithun Mukherjee ◽

Yujie Zheng

Keyword(s):

Remote Sensing ◽

Spatial Resolution ◽

Spatial Information ◽

Feature Fusion ◽

Semantic Segmentation ◽

Remote Sensing Image ◽

Surface Model ◽

Semantic Features ◽

Low Level ◽

Parallel Network

Semantic segmentation of remote sensing images (RSIs) plays a significant role in urban management and land cover classification. Because RSIs contain richer spatial information, existing convolutional neural network (CNN)-based methods cannot segment them accurately and lose some object edge information. In addition, recent studies have shown that combining additional 3D geometric data with 2D appearance helps to distinguish pixel categories. However, most of these methods require height maps as additional inputs, which severely limits their applications. To alleviate these issues, we propose a height aware-multi path parallel network (HA-MPPNet). The proposed MPPNet first obtains multi-level semantic features while maintaining the spatial resolution in each path to preserve detailed image information. Afterward, gated high-low level feature fusion is used to compensate for the lack of low-level semantics. Then, we design a height feature decode branch that learns height features under the supervision of digital surface model (DSM) images, and we use the learned embeddings to improve semantic context through height feature guide propagation. Note that our module is end-to-end and does not need a DSM image as additional input after training. Our method outperforms other state-of-the-art methods for semantic segmentation on publicly available remote sensing image datasets.

Semantic Segmentation of Remote Sensing Image Based on Convolutional Neural Network and Mask Generation

Mathematical Problems in Engineering ◽

10.1155/2021/2472726 ◽

2021 ◽

Vol 2021 ◽

pp. 1-13

Author(s):

Binglin Niu

Keyword(s):

Neural Network ◽

Remote Sensing ◽

High Resolution ◽

Convolutional Neural Network ◽

Semantic Segmentation ◽

Layer By Layer ◽

Foreground Object ◽

Remote Sensing Images ◽

Training Time ◽

High Level

High-resolution remote sensing images usually contain complex semantic information and confusing targets, so their semantic segmentation is an important and challenging task. To resolve the problem of inadequate utilization of multilayer features by existing methods, a semantic segmentation method for remote sensing images based on convolutional neural network and mask generation is proposed. In this method, the boundary box is used as the initial foreground segmentation profile, and the edge information of the foreground object is obtained by using the multilayer feature of the convolutional neural network. In order to obtain the rough object segmentation mask, the general shape and position of the foreground object are estimated by using the high-level features in the process of layer-by-layer iteration. Then, based on the obtained rough mask, the mask is updated layer by layer using the neural network characteristics to obtain a more accurate mask. In order to solve the difficulty of deep neural network training and the problem of degeneration after convergence, a framework based on residual learning was adopted, which can simplify the training of those very deep networks and improve the accuracy of the network. For comparison with other advanced algorithms, the proposed algorithm was tested on the Potsdam and Vaihingen datasets. Experimental results show that, compared with other algorithms, the algorithm in this article can effectively improve the overall precision of semantic segmentation of high-resolution remote sensing images and shorten the overall training time and segmentation time.

Semantic Labeling in Remote Sensing Corpora Using Feature Fusion-Based Enhanced Global Convolutional Network with High-Resolution Representations and Depthwise Atrous Convolution

Remote Sensing ◽

10.3390/rs12081233 ◽

2020 ◽

Vol 12 (8) ◽

pp. 1233 ◽

Cited By ~ 2

Author(s):

Teerapong Panboonyuen ◽

Kulsawasd Jitkajornwanich ◽

Siam Lawawirojwong ◽

Panu Srestasathiern ◽

Peerapon Vateekul

Keyword(s):

Remote Sensing ◽

High Resolution ◽

Feature Fusion ◽

Semantic Segmentation ◽

Route Optimization ◽

Landsat 8 ◽

Data Sets ◽

Convolutional Network ◽

Low Level ◽

Backbone Network

One of the fundamental tasks in remote sensing is the semantic segmentation of aerial and satellite images. It plays a vital role in applications such as agriculture planning, map updates, route optimization, and navigation. The state-of-the-art model is the Enhanced Global Convolutional Network (GCN152-TL-A) from our previous work. It comprises two main components: (i) the backbone network to extract features and (ii) the segmentation network to annotate labels. However, the accuracy can be further improved, since the deep learning network is not designed for recovering low-level features (e.g., river, low vegetation). In this paper, we aim to improve the semantic segmentation network in three aspects, designed explicitly for the remotely sensed domain. First, we propose to employ a modern backbone network called “High-Resolution Representation (HR)” to extract features with higher quality. It repeatedly fuses the representations generated by the high-to-low subnetworks with the restoration of the low-resolution representations to the same depth and level. Second, “Feature Fusion (FF)” is added to our network to capture low-level features (e.g., lines, dots, or gradient orientation). It fuses the features from the backbone and the segmentation models, which helps to prevent the loss of these low-level features. Finally, “Depthwise Atrous Convolution (DA)” is introduced to refine the extracted features by using four multi-resolution layers in collaboration with a dilated convolution strategy. The experiment was conducted on three data sets: two private corpora from the Landsat-8 satellite and one public benchmark from the “ISPRS Vaihingen” challenge. There are two baseline models: the Deep Encoder-Decoder Network (DCED) and our previous model. The results show that the proposed model significantly outperforms all baselines. It is the winner on all data sets and exceeds an F1 of 90%: 0.9114, 0.9362, and 0.9111 on the two Landsat-8 and the ISPRS Vaihingen data sets, respectively. Furthermore, it achieves an accuracy beyond 90% on almost all classes.
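The two ingredients in the name "Depthwise Atrous Convolution" can be shown in a few lines: "atrous" (dilated) means the kernel taps are spaced apart, and "depthwise" means each channel is filtered by its own kernel with no mixing across channels. The sketch below is a plain 1D illustration under those general definitions, not the paper's DA module; the function names and toy inputs are hypothetical.

```python
def atrous_conv1d(signal, kernel, dilation=1):
    """Valid 1D cross-correlation with `dilation` steps between kernel taps."""
    span = (len(kernel) - 1) * dilation  # distance covered by the kernel
    return [
        sum(w * signal[start + k * dilation] for k, w in enumerate(kernel))
        for start in range(len(signal) - span)
    ]


def depthwise_atrous_conv1d(channels, kernels, dilation=1):
    """Apply one kernel per channel, with no mixing across channels."""
    if len(channels) != len(kernels):
        raise ValueError("need exactly one kernel per channel")
    return [atrous_conv1d(c, k, dilation) for c, k in zip(channels, kernels)]


channels = [
    [1, 2, 3, 4, 5, 6],
    [0, 1, 0, 1, 0, 1],
]
kernels = [
    [1, 0, -1],  # difference filter for the first channel
    [1, 1, 1],   # summing filter for the second channel
]
dense = depthwise_atrous_conv1d(channels, kernels, dilation=1)
dilated = depthwise_atrous_conv1d(channels, kernels, dilation=2)
```

With `dilation=2` the same three-tap kernel covers five input positions instead of three, which is how dilation enlarges the receptive field without adding weights.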

Dual Lightweight Network with Attention and Feature Fusion for Semantic Segmentation of High-Resolution Remote Sensing Images

10.1109/igarss47720.2021.9553680 ◽

2021 ◽

Author(s):

Yijie Zhang ◽

Yulan Chen ◽

Qijun Ma ◽

Changtao He ◽

Jian Cheng

Keyword(s):

Remote Sensing ◽

High Resolution ◽

Feature Fusion ◽

Semantic Segmentation ◽

Remote Sensing Images

Conditional Generative Adversarial Network-Based Training Sample Set Improvement Model for the Semantic Segmentation of High-Resolution Remote Sensing Images

IEEE Transactions on Geoscience and Remote Sensing ◽

10.1109/tgrs.2020.3033816 ◽

2020 ◽

pp. 1-17

Author(s):

Xin Pan ◽

Jian Zhao ◽

Jun Xu

Keyword(s):

Remote Sensing ◽

High Resolution ◽

Semantic Segmentation ◽

Training Sample ◽

Remote Sensing Images ◽

Generative Adversarial Network ◽

Adversarial Network ◽

Sample Set

Semantic Segmentation of High Resolution Remote Sensing Images with Extra Context Attention Mechanism

2020 IEEE 20th International Conference on Communication Technology (ICCT) ◽

10.1109/icct50939.2020.9295814 ◽

2020 ◽

Author(s):

Weifu Fu ◽

Qing Peng ◽

Yanxiang Gong ◽

Mei Xie ◽

Shicheng Wang ◽

...

Keyword(s):

Remote Sensing ◽

High Resolution ◽

Semantic Segmentation ◽

Attention Mechanism ◽

Remote Sensing Images

HRCNet: High-Resolution Context Extraction Network for Semantic Segmentation of Remote Sensing Images

Remote Sensing ◽

10.3390/rs13010071 ◽

2020 ◽

Vol 13 (1) ◽

pp. 71

Author(s):

Zhiyong Xu ◽

Weicun Zhang ◽

Tianxiang Zhang ◽

Jiangyun Li

Keyword(s):

Remote Sensing ◽

Feature Extraction ◽

High Resolution ◽

Spatial Information ◽

Semantic Segmentation ◽

Context Information ◽

Remote Sensing Images ◽

Global Context ◽

Boundary Information ◽

Extraction Stage

Semantic segmentation is a significant method in remote sensing image (RSI) processing and has been widely used in various applications. Conventional convolutional neural network (CNN)-based semantic segmentation methods are likely to lose spatial information in the feature extraction stage and usually pay little attention to global context information. Moreover, RSIs exhibit both an imbalance of category scale and uncertain boundary information, which further challenges the semantic segmentation task. To overcome these problems, a high-resolution context extraction network (HRCNet) based on a high-resolution network (HRNet) is proposed in this paper. In this approach, the HRNet structure is adopted to keep the spatial information. Moreover, the light-weight dual attention (LDA) module is designed to obtain global context information in the feature extraction stage, and the feature enhancement feature pyramid (FEFP) structure is promoted and employed to fuse contextual information of different scales. In addition, to capture boundary information, we design the boundary aware (BA) module combined with the boundary aware loss (BAloss) function. The experimental results on the Potsdam and Vaihingen datasets show that the proposed approach significantly improves boundary and segmentation performance, reaching overall accuracy scores of up to 92.0% and 92.3%, respectively. Consequently, it is envisaged that the proposed HRCNet model will be advantageous for remote sensing image segmentation.

DFFAN: Dual Function Feature Aggregation Network for Semantic Segmentation of Land Cover

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10030125 ◽

2021 ◽

Vol 10 (3) ◽

pp. 125

Author(s):

Junqing Huang ◽

Liguo Weng ◽

Bingyu Chen ◽

Min Xia

Keyword(s):

Remote Sensing ◽

Land Cover ◽

Spatial Information ◽

Feature Fusion ◽

Semantic Segmentation ◽

Dual Function ◽

Context Information ◽

Remote Sensing Images ◽

Feature Aggregation ◽

Image Context

Analyzing land cover using remote sensing images has broad prospects, and the precise segmentation of land cover is the key to applying this technology. Nowadays, the convolutional neural network (CNN) is widely used in many image semantic segmentation tasks. However, existing CNN models often exhibit poor generalization ability and low segmentation accuracy when dealing with land cover segmentation tasks. To solve this problem, this paper proposes the Dual Function Feature Aggregation Network (DFFAN). This method combines image context information, gathers image spatial information, and extracts and fuses features. DFFAN uses residual neural networks as the backbone to obtain feature information of remote sensing images at different dimensions through multiple downsampling steps. This work designs an Affinity Matrix Module (AMM) to obtain the context of each feature map and proposes a Boundary Feature Fusion Module (BFF) to fuse the context information and spatial information of an image to determine the location distribution of each image’s category. Compared with existing methods, the proposed method significantly improves accuracy. Its mean intersection over union (MIoU) on the LandCover dataset reaches 84.81%.
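For readers unfamiliar with the reported metric, the sketch below computes mean intersection over union from a confusion matrix using the standard per-class definition IoU = TP / (TP + FP + FN). The function name `mean_iou`, the choice to skip classes that never occur, and the toy matrix are illustrative assumptions and have no connection to the paper's evaluation code or data.

```python
def mean_iou(confusion):
    """Mean intersection over union from a square confusion matrix.

    confusion[t][p] counts pixels of true class t predicted as class p.
    Classes absent from both labels and predictions are skipped.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # missed pixels of class c
        fp = sum(confusion[r][c] for r in range(n)) - tp  # pixels wrongly given class c
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    if not ious:
        raise ValueError("confusion matrix contains no pixels")
    return sum(ious) / len(ious)


# Toy two-class example: rows are true classes, columns are predictions.
confusion = [
    [8, 2],
    [1, 9],
]
score = mean_iou(confusion)  # (8/11 + 9/12) / 2
```

Because every class contributes equally to the average, MIoU penalises poor performance on rare classes more than overall pixel accuracy does.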
