HA-MPPNet: Height Aware-Multi Path Parallel Network for High Spatial Resolution Remote Sensing Image Semantic Seg-Mentation

Suting Chen; Chaoqun Wu; Mithun Mukherjee; Yujie Zheng

doi:10.3390/ijgi10100672

HA-MPPNet: Height Aware-Multi Path Parallel Network for High Spatial Resolution Remote Sensing Image Semantic Seg-Mentation

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10100672 ◽

2021 ◽

Vol 10 (10) ◽

pp. 672

Author(s):

Suting Chen ◽

Chaoqun Wu ◽

Mithun Mukherjee ◽

Yujie Zheng

Keyword(s):

Remote Sensing ◽

Spatial Resolution ◽

Spatial Information ◽

Feature Fusion ◽

Semantic Segmentation ◽

Remote Sensing Image ◽

Surface Model ◽

Semantic Features ◽

Low Level ◽

Parallel Network

Semantic segmentation of remote sensing images (RSI) plays a significant role in urban management and land cover classification. Due to the richer spatial information in the RSI, existing convolutional neural network (CNN)-based methods cannot segment images accurately and lose some edge information of objects. In addition, recent studies have shown that leveraging additional 3D geometric data with 2D appearance is beneficial to distinguish the pixels’ category. However, most of them require height maps as additional inputs, which severely limits their applications. To alleviate the above issues, we propose a height aware-multi path parallel network (HA-MPPNet). Our proposed MPPNet first obtains multi-level semantic features while maintaining the spatial resolution in each path for preserving detailed image information. Afterward, gated high-low level feature fusion is utilized to complement the lack of low-level semantics. Then, we designed the height feature decode branch to learn the height features under the supervision of digital surface model (DSM) images and used the learned embeddings to improve semantic context by height feature guide propagation. Note that our module does not need a DSM image as additional input after training and is end-to-end. Our method outperformed other state-of-the-art methods for semantic segmentation on publicly available remote sensing image datasets.

Download Full-text

DFFAN: Dual Function Feature Aggregation Network for Semantic Segmentation of Land Cover

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10030125 ◽

2021 ◽

Vol 10 (3) ◽

pp. 125

Author(s):

Junqing Huang ◽

Liguo Weng ◽

Bingyu Chen ◽

Min Xia

Keyword(s):

Remote Sensing ◽

Land Cover ◽

Spatial Information ◽

Feature Fusion ◽

Semantic Segmentation ◽

Dual Function ◽

Context Information ◽

Remote Sensing Images ◽

Feature Aggregation ◽

Image Context

Analyzing land cover using remote sensing images has broad prospects, the precise segmentation of land cover is the key to the application of this technology. Nowadays, the Convolution Neural Network (CNN) is widely used in many image semantic segmentation tasks. However, existing CNN models often exhibit poor generalization ability and low segmentation accuracy when dealing with land cover segmentation tasks. To solve this problem, this paper proposes Dual Function Feature Aggregation Network (DFFAN). This method combines image context information, gathers image spatial information, and extracts and fuses features. DFFAN uses residual neural networks as backbone to obtain different dimensional feature information of remote sensing images through multiple downsamplings. This work designs Affinity Matrix Module (AMM) to obtain the context of each feature map and proposes Boundary Feature Fusion Module (BFF) to fuse the context information and spatial information of an image to determine the location distribution of each image’s category. Compared with existing methods, the proposed method is significantly improved in accuracy. Its mean intersection over union (MIoU) on the LandCover dataset reaches 84.81%.

Download Full-text

Multi-scale Adaptive Feature Fusion Network for Semantic Segmentation in Remote Sensing Images

Remote Sensing ◽

10.3390/rs12050872 ◽

2020 ◽

Vol 12 (5) ◽

pp. 872 ◽

Cited By ~ 3

Author(s):

Ronghua Shang ◽

Jiyu Zhang ◽

Licheng Jiao ◽

Yangyang Li ◽

Naresh Marturi ◽

...

Keyword(s):

Remote Sensing ◽

Multiple Scales ◽

Feature Fusion ◽

Semantic Segmentation ◽

Semantic Features ◽

Remote Sensing Images ◽

Global Features ◽

Global Average ◽

Multi Scale ◽

Context Extraction

Semantic segmentation of high-resolution remote sensing images is highly challenging due to the presence of a complicated background, irregular target shapes, and similarities in the appearance of multiple target categories. Most of the existing segmentation methods that rely only on simple fusion of the extracted multi-scale features often fail to provide satisfactory results when there is a large difference in the target sizes. Handling this problem through multi-scale context extraction and efficient fusion of multi-scale features, in this paper we present an end-to-end multi-scale adaptive feature fusion network (MANet) for semantic segmentation in remote sensing images. It is a coding and decoding structure that includes a multi-scale context extraction module (MCM) and an adaptive fusion module (AFM). The MCM employs two layers of atrous convolutions with different dilatation rates and global average pooling to extract context information at multiple scales in parallel. MANet embeds the channel attention mechanism to fuse semantic features. The high- and low-level semantic information are concatenated to generate global features via global average pooling. These global features are used as channel weights to acquire adaptive weight information of each channel by the fully connected layer. To accomplish an efficient fusion, these tuned weights are applied to the fused features. Performance of the proposed method has been evaluated by comparing it with six other state-of-the-art networks: fully convolutional networks (FCN), U-net, UZ1, Light-weight RefineNet, DeepLabv3+, and APPD. Experiments performed using the publicly available Potsdam and Vaihingen datasets show that the proposed MANet significantly outperforms the other existing networks, with overall accuracy reaching 89.4% and 88.2%, respectively and with average of F1 reaching 90.4% and 86.7% respectively.

Download Full-text

Top-Down Pyramid Fusion Network for High-Resolution Remote Sensing Semantic Segmentation

Remote Sensing ◽

10.3390/rs13204159 ◽

2021 ◽

Vol 13 (20) ◽

pp. 4159

Author(s):

Yuhang Gu ◽

Jie Hao ◽

Bing Chen ◽

Hai Deng

Keyword(s):

Remote Sensing ◽

High Resolution ◽

Feature Fusion ◽

Semantic Segmentation ◽

Semantic Knowledge ◽

Surface Model ◽

Top Down ◽

Segmentation Accuracy ◽

Fusion Methods ◽

High Level

In recent years, high-resolution remote sensing semantic segmentation based on data fusion has gradually become a research focus in the field of land classification, which is an indispensable task of a smart city. However, the existing feature fusion methods with bottom-up structures can achieve limited fusion results. Alternatively, various auxiliary fusion modules significantly increase the complexity of the models and make the training process intolerably expensive. In this paper, we propose a new lightweight model called top-down pyramid fusion network (TdPFNet) including a multi-source feature extractor, a top-down pyramid fusion module and a decoder. It can deeply fuse features from different sources in a top-down structure using high-level semantic knowledge guiding the fusion of low-level texture information. Digital surface model (DSM) data and open street map (OSM) data are used as auxiliary inputs to the Potsdam dataset for the proposed model evaluation. Experimental results show that the network proposed in this paper not only notably improves the segmentation accuracy, but also reduces the complexity of the multi-source semantic segmentation model.

Download Full-text

Full Convolutional Neural Network Based on Multi-Scale Feature Fusion for the Class Imbalance Remote Sensing Image Classification

Remote Sensing ◽

10.3390/rs12213547 ◽

2020 ◽

Vol 12 (21) ◽

pp. 3547 ◽

Cited By ~ 2

Author(s):

Yuanyuan Ren ◽

Xianfeng Zhang ◽

Yongjian Ma ◽

Qiyuan Yang ◽

Chuanjian Wang ◽

...

Keyword(s):

Neural Network ◽

Remote Sensing ◽

High Resolution ◽

Spatial Information ◽

Feature Fusion ◽

Remote Sensing Image ◽

Machine Learning Algorithms ◽

Image Feature ◽

Remote Sensing Images ◽

Land Covers

Remote sensing image segmentation with samples imbalance is always one of the most important issues. Typically, a high-resolution remote sensing image has the characteristics of high spatial resolution and low spectral resolution, complex large-scale land covers, small class differences for some land covers, vague foreground, and imbalanced distribution of samples. However, traditional machine learning algorithms have limitations in deep image feature extraction and dealing with sample imbalance issue. In the paper, we proposed an improved full-convolution neural network, called DeepLab V3+, with loss function based solution of samples imbalance. In addition, we select Sentinel-2 remote sensing images covering the Yuli County, Bayingolin Mongol Autonomous Prefecture, Xinjiang Uygur Autonomous Region, China as data sources, then a typical region image dataset is built by data augmentation. The experimental results show that the improved DeepLab V3+ model can not only utilize the spectral information of high-resolution remote sensing images, but also consider its rich spatial information. The classification accuracy of the proposed method on the test dataset reaches 97.97%. The mean Intersection-over-Union reaches 87.74%, and the Kappa coefficient 0.9587. The work provides methodological guidance to sample imbalance correction, and the established data resource can be a reference to further study in the future.

Download Full-text

Semantic Labeling in Remote Sensing Corpora Using Feature Fusion-Based Enhanced Global Convolutional Network with High-Resolution Representations and Depthwise Atrous Convolution

Remote Sensing ◽

10.3390/rs12081233 ◽

2020 ◽

Vol 12 (8) ◽

pp. 1233 ◽

Cited By ~ 2

Author(s):

Teerapong Panboonyuen ◽

Kulsawasd Jitkajornwanich ◽

Siam Lawawirojwong ◽

Panu Srestasathiern ◽

Peerapon Vateekul

Keyword(s):

Remote Sensing ◽

High Resolution ◽

Feature Fusion ◽

Semantic Segmentation ◽

Route Optimization ◽

Landsat 8 ◽

Data Sets ◽

Convolutional Network ◽

Low Level ◽

Backbone Network

One of the fundamental tasks in remote sensing is the semantic segmentation on the aerial and satellite images. It plays a vital role in applications, such as agriculture planning, map updates, route optimization, and navigation. The state-of-the-art model is the Enhanced Global Convolutional Network (GCN152-TL-A) from our previous work. It composes two main components: (i) the backbone network to extract features and ( i i ) the segmentation network to annotate labels. However, the accuracy can be further improved, since the deep learning network is not designed for recovering low-level features (e.g., river, low vegetation). In this paper, we aim to improve the semantic segmentation network in three aspects, designed explicitly for the remotely sensed domain. First, we propose to employ a modern backbone network called “High-Resolution Representation (HR)” to extract features with higher quality. It repeatedly fuses the representations generated by the high-to-low subnetworks with the restoration of the low-resolution representations to the same depth and level. Second, “Feature Fusion (FF)” is added to our network to capture low-level features (e.g., lines, dots, or gradient orientation). It fuses between the features from the backbone and the segmentation models, which helps to prevent the loss of these low-level features. Finally, “Depthwise Atrous Convolution (DA)” is introduced to refine the extracted features by using four multi-resolution layers in collaboration with a dilated convolution strategy. The experiment was conducted on three data sets: two private corpora from Landsat-8 satellite and one public benchmark from the “ISPRS Vaihingen” challenge. There are two baseline models: the Deep Encoder-Decoder Network (DCED) and our previous model. The results show that the proposed model significantly outperforms all baselines. It is the winner in all data sets and exceeds more than 90% of F 1 : 0.9114, 0.9362, and 0.9111 in two Landsat-8 and ISPRS Vaihingen data sets, respectively. Furthermore, it achieves an accuracy beyond 90% on almost all classes.

Download Full-text

Encoder- and Decoder-Based Networks Using Multiscale Feature Fusion and Nonlocal Block for Remote Sensing Image Semantic Segmentation

IEEE Geoscience and Remote Sensing Letters ◽

10.1109/lgrs.2020.2998680 ◽

2020 ◽

pp. 1-5

Author(s):

Yang Wang ◽

Zhaochen Sun ◽

Wei Zhao

Keyword(s):

Remote Sensing ◽

Feature Fusion ◽

Semantic Segmentation ◽

Remote Sensing Image

Download Full-text

SiameseDenseU-Net-based Semantic Segmentation of Urban Remote Sensing Images

Mathematical Problems in Engineering ◽

10.1155/2020/1515630 ◽

2020 ◽

Vol 2020 ◽

pp. 1-14

Author(s):

Rongsheng Dong ◽

Lulu Bai ◽

Fengying Li

Keyword(s):

Remote Sensing ◽

Semantic Segmentation ◽

Image Features ◽

Surface Model ◽

Median Frequency ◽

Semantic Features ◽

Small Target ◽

Remote Sensing Images ◽

Contrast Model ◽

Urban Remote Sensing

Boundary pixel blur and category imbalance are common problems that occur during semantic segmentation of urban remote sensing images. Inspired by DenseU-Net, this paper proposes a new end-to-end network—SiameseDenseU-Net. First, the network simultaneously uses both true orthophoto (TOP) images and their corresponding normalized digital surface model (nDSM) as the input of the network structure. The deep image features are extracted in parallel by downsampling blocks. Information such as shallow textures and high-level abstract semantic features are fused throughout the connected channels. The features extracted by the two parallel processing chains are then fused. Finally, a softmax layer is used to perform prediction to generate dense label maps. Experiments on the Vaihingen dataset show that SiameseDenseU-Net improves the F1-score by 8.2% and 7.63% compared with the Hourglass-ShapeNetwork (HSN) model and with the U-Net model. Regarding the boundary pixels, when using the same focus loss function based on median frequency balance weighting, compared with the original DenseU-Net, the small-target “car” category F1-score of SiameseDenseU-Net improved by 0.92%. The overall accuracy and the average F1-score also improved to varying degrees. The proposed SiameseDenseU-Net is better at identifying small-target categories and boundary pixels, and it is numerically and visually superior to the contrast model.

Download Full-text

Real-Time Dense Semantic Labeling with Dual-Path Framework for High-Resolution Remote Sensing Image

Remote Sensing ◽

10.3390/rs11243020 ◽

2019 ◽

Vol 11 (24) ◽

pp. 3020 ◽

Cited By ~ 2

Author(s):

Yuhao Wang ◽

Chen Chen ◽

Meng Ding ◽

Jiangyun Li

Keyword(s):

Remote Sensing ◽

High Resolution ◽

Spatial Information ◽

Feature Fusion ◽

Remote Sensing Image ◽

Light Weight ◽

Fiber Network ◽

Recent Success ◽

Semantic Labeling ◽

Learning Procedure

Dense semantic labeling plays a pivotal role in high-resolution remote sensing image research. It provides pixel-level classification which is crucial in land cover mapping and urban planning. With the recent success of the convolutional neural network (CNN), accuracy has been greatly improved by previous works. However, most networks boost performance by involving too many parameters and computational overheads, which results in more inference time and hardware resources, while some attempts with light-weight networks do not achieve satisfactory results due to the insufficient feature extraction ability. In this work, we propose an efficient light-weight CNN based on dual-path architecture to address this issue. Our model utilizes three convolution layers as the spatial path to enhance the extraction of spatial information. Meanwhile, we develop the context path with the multi-fiber network (MFNet) followed by the pyramid pooling module (PPM) to obtain a sufficient receptive field. On top of these two paths, we adopt the channel attention block to refine the features from the context path and apply a feature fusion module to combine spatial information with context information. Moreover, a weighted cascade loss function is employed to enhance the learning procedure. With all these components, the performance can be significantly improved. Experiments on the Potsdam and Vaihingen datasets demonstrate that our network performs better than other light-weight networks, even some classic networks. Compared to the state-of-the-art U-Net, our model achieves higher accuracy on the two datasets with 2.5 times less network parameters and 22 times less computational floating point operations (FLOPs).

Download Full-text