Attention-Based Context Aware Network for Semantic Comprehension of Aerial Scenery

Weipeng Shi; Wenhu Qin; Zhonghua Yun; Peng Ping; Kaiyang Wu; Yuke Qu

doi:10.3390/s21061983

Sensors ◽

10.3390/s21061983 ◽

2021 ◽

Vol 21 (6) ◽

pp. 1983

Author(s):

Weipeng Shi ◽

Wenhu Qin ◽

Zhonghua Yun ◽

Peng Ping ◽

Kaiyang Wu ◽

...

Keyword(s):

High Resolution ◽

Semantic Segmentation ◽

Aerial Images ◽

Aerial Image ◽

Convolutional Network ◽

Convolutional Networks ◽

Fully Convolutional Networks ◽

Semantic Labeling ◽

Autonomous Cars ◽

High Resolution Images

Proper interpretation of remote sensing images (RSIs) and precise semantic labeling of their component parts are essential for researchers. Although deep convolutional architectures in the style of Fully Convolutional Networks (FCNs) have been widely applied in the perception of autonomous cars, two challenges remain in the semantic segmentation of RSIs. The first is to identify details in high-resolution images with complex scenes and to resolve class-mismatch issues; the second is to capture object edges finely without being confused by the surroundings. HRNet maintains a high-resolution representation by fusing feature information across parallel multi-resolution convolution branches. We adopt HRNet as a backbone and incorporate a Class-Oriented Region Attention Module (CRAM) and a Class-Oriented Context Fusion Module (CCFM) to analyze the relationships between classes and patch regions and between classes and local or global pixels, respectively. This enhances the model's ability to perceive fine details in aerial images. We leverage these modules to develop an end-to-end semantic segmentation model for aerial images and validate it on the ISPRS Potsdam and Vaihingen datasets. The experimental results show that our model improves on the baseline accuracy and outperforms some commonly used CNN architectures.
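The abstract does not specify how CRAM relates classes to patch regions. As a rough illustration of the general idea only, the sketch below uses generic scaled dot-product attention between one class embedding and a set of region features. It is not the authors' module; the function name `class_region_attention` and the toy vectors are assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_region_attention(class_vec, region_feats):
    """Weight each region by its similarity to a class embedding.

    Generic scaled dot-product attention (an assumption, not CRAM itself):
    returns the attention weights and the weighted sum of region features.
    """
    scale = math.sqrt(len(class_vec))
    scores = [
        sum(c * r for c, r in zip(class_vec, feat)) / scale
        for feat in region_feats
    ]
    weights = softmax(scores)
    context = [
        sum(w * feat[i] for w, feat in zip(weights, region_feats))
        for i in range(len(class_vec))
    ]
    return weights, context


# Hypothetical usage: one class embedding, two patch-region features.
weights, context = class_region_attention([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0]])
```

The region whose feature vector points in the same direction as the class embedding receives the larger weight, which is the sense in which such a module can focus a class on the regions most relevant to it.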

Semantic labeling of high-resolution aerial images using an ensemble of fully convolutional networks

Journal of Applied Remote Sensing ◽

10.1117/1.jrs.11.042617 ◽

2017 ◽

Vol 11 (04) ◽

pp. 1 ◽

Cited By ~ 7

Author(s):

Xiaofeng Sun ◽

Shuhan Shen ◽

Xiangguo Lin ◽

Zhanyi Hu

Keyword(s):

High Resolution ◽

Aerial Images ◽

Convolutional Networks ◽

Fully Convolutional Networks ◽

Semantic Labeling

Fully Convolutional Networks for Semantic Segmentation of Very High Resolution Remotely Sensed Images Combined With DSM

IEEE Geoscience and Remote Sensing Letters ◽

10.1109/lgrs.2018.2795531 ◽

2018 ◽

Vol 15 (3) ◽

pp. 474-478 ◽

Cited By ~ 51

Author(s):

Weiwei Sun ◽

Ruisheng Wang

Keyword(s):

High Resolution ◽

Semantic Segmentation ◽

Remotely Sensed ◽

Convolutional Networks ◽

Fully Convolutional Networks ◽

Remotely Sensed Images ◽

Very High

A Dual-Path and Lightweight Convolutional Neural Network for High-Resolution Aerial Image Segmentation

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi8120582 ◽

2019 ◽

Vol 8 (12) ◽

pp. 582 ◽

Cited By ~ 6

Author(s):

Gang Zhang ◽

Tao Lei ◽

Yi Cui ◽

Ping Jiang

Keyword(s):

Neural Network ◽

Image Segmentation ◽

High Resolution ◽

Convolutional Neural Network ◽

Feature Learning ◽

Semantic Segmentation ◽

Aerial Images ◽

Aerial Image ◽

Sensing Applications ◽

Edge Path

Semantic segmentation of high-resolution aerial images plays a significant role in many remote sensing applications. Although the Deep Convolutional Neural Network (DCNN) has shown great performance in this task, it still faces two challenges: intra-class heterogeneity and inter-class homogeneity. To overcome these two problems, a novel dual-path DCNN, which contains a spatial path and an edge path, is proposed for high-resolution aerial image segmentation. The spatial path, which combines multi-level and global context features to encode local and global information, addresses the intra-class heterogeneity challenge. For the inter-class homogeneity problem, a Holistically-nested Edge Detection (HED)-like edge path is employed to detect semantic boundaries that guide feature learning. Furthermore, we improve the computational efficiency of the network by employing MobileNetV2 as the backbone. We enhance the performance of MobileNetV2 with two modifications: (1) replacing the standard convolution in the last four Bottleneck Residual Blocks (BRBs) with atrous convolution; and (2) removing the convolution stride of 2 in the first layer of BRBs 4 and 6. Experimental results on the ISPRS Vaihingen and Potsdam 2D labeling datasets show that the proposed DCNN achieves real-time inference speed on a single GPU card with better performance than the state-of-the-art baselines.
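To illustrate why atrous (dilated) convolution lets a network drop a stride of 2 without shrinking its field of view, here is a minimal 1D sketch in plain Python. It is a toy, not the paper's MobileNetV2 modification; the function names are chosen for this example.

```python
def effective_kernel_span(kernel_size, dilation):
    """Number of input positions covered by one dilated kernel application."""
    return (kernel_size - 1) * dilation + 1


def atrous_conv1d(signal, kernel, dilation=1):
    """'Valid' 1D cross-correlation with gaps of `dilation` between taps."""
    span = effective_kernel_span(len(kernel), dilation)
    outputs = []
    for start in range(len(signal) - span + 1):
        outputs.append(
            sum(k * signal[start + i * dilation] for i, k in enumerate(kernel))
        )
    return outputs


# Same 3-tap kernel, two dilation rates: the kernel has the same number of
# weights, but with dilation=2 each output looks at 5 input positions.
dense = atrous_conv1d([1, 2, 3, 4, 5], [1, 0, -1], dilation=1)
dilated = atrous_conv1d([1, 2, 3, 4, 5], [1, 0, -1], dilation=2)
```

With a stride of 1 and a larger dilation rate, the layer keeps the spatial resolution of its input while covering a wider context, which is the usual motivation for swapping strided convolutions for atrous ones in segmentation backbones.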

Deep Residual Autoencoder with Multiscaling for Semantic Segmentation of Land-Use Images

Remote Sensing ◽

10.3390/rs11182142 ◽

2019 ◽

Vol 11 (18) ◽

pp. 2142 ◽

Cited By ~ 5

Author(s):

Lianfa Li

Keyword(s):

Land Use ◽

Deep Learning ◽

Semantic Segmentation ◽

Remotely Sensed ◽

Convolutional Network ◽

Convolutional Networks ◽

Residual Learning ◽

Fully Convolutional Networks ◽

Remotely Sensed Images ◽

Real World Datasets

Semantic segmentation is a fundamental means of extracting information from remotely sensed images at the pixel level. Deep learning has enabled considerable improvements in the efficiency and accuracy of semantic segmentation of general images. Typical models range from benchmarks such as fully convolutional networks, U-Net, Micro-Net, and dilated residual networks to the more recently developed DeepLab 3+. However, many of these models were originally developed for segmentation of general or medical images and videos, and are not directly relevant to remotely sensed images. Studies of deep learning for semantic segmentation of remotely sensed images remain limited. This paper presents a novel, flexible autoencoder-based deep learning architecture that makes extensive use of residual learning and multiscaling for robust semantic segmentation of remotely sensed land-use images. In this architecture, a deep residual autoencoder is generalized to a fully convolutional network in which residual connections are implemented within and between all encoding and decoding layers. Compared with the concatenated shortcuts in U-Net, these residual connections reduce the number of trainable parameters and improve the learning efficiency by enabling extensive backpropagation of errors. In addition, resizing or atrous spatial pyramid pooling (ASPP) can be leveraged to capture multiscale information from the input images and so enhance robustness to scale variations. The residual learning and multiscaling strategies improve the trained model's generalizability, as demonstrated in the semantic segmentation of land-use types in two real-world datasets of remotely sensed images. Compared with U-Net, the proposed method improves the Jaccard index (JI) or the mean intersection over union (MIoU) by 4-11% in the training phase and by 3-9% in the validation and testing phases. With its flexible deep learning architecture, the proposed approach can be easily applied and transferred to semantic segmentation of land-use variables and other surface variables of remotely sensed images.
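For readers unfamiliar with the reported metric, the Jaccard index of a class is the intersection over union of its predicted and reference pixels, and MIoU averages it over classes. A minimal sketch follows, assuming flat lists of integer class labels; the function names are illustrative, and classes absent from both lists are skipped.

```python
def per_class_iou(y_true, y_pred, num_classes):
    """Intersection over union (Jaccard index) for each class that occurs."""
    ious = {}
    for c in range(num_classes):
        intersection = sum(
            1 for t, p in zip(y_true, y_pred) if t == c and p == c
        )
        union = sum(1 for t, p in zip(y_true, y_pred) if t == c or p == c)
        if union > 0:  # skip classes absent from both reference and prediction
            ious[c] = intersection / union
    return ious


def mean_iou(y_true, y_pred, num_classes):
    """Mean IoU over the classes that occur; assumes at least one does."""
    ious = per_class_iou(y_true, y_pred, num_classes)
    return sum(ious.values()) / len(ious)


# Hypothetical four-pixel example with two classes.
reference = [0, 0, 1, 1]
predicted = [0, 1, 1, 1]
scores = per_class_iou(reference, predicted, num_classes=2)
```

In this toy case class 0 has one correct pixel out of a union of two, and class 1 has two correct pixels out of a union of three, so the mean lies between the two per-class values.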

High-Resolution Aerial Imagery Semantic Labeling with Dense Pyramid Network

Sensors ◽

10.3390/s18113774 ◽

2018 ◽

Vol 18 (11) ◽

pp. 3774 ◽

Cited By ~ 9

Author(s):

Xuran Pan ◽

Lianru Gao ◽

Bing Zhang ◽

Fan Yang ◽

Wenzhi Liao

Keyword(s):

High Resolution ◽

Class Imbalance ◽

Semantic Segmentation ◽

Aerial Imagery ◽

Aerial Images ◽

Sensor Data ◽

Median Frequency ◽

Feature Maps ◽

Class Imbalance Problem ◽

Semantic Labeling

Semantic segmentation of high-resolution aerial images is of great importance in certain fields, but increasing spatial resolution brings large intra-class variance and small inter-class differences that can lead to classification ambiguities. Because it exploits high-level contextual features, the deep convolutional neural network (DCNN) is an effective method for semantic segmentation of high-resolution aerial imagery. In this work, a novel dense pyramid network (DPN) is proposed for semantic segmentation. The network starts with group convolutions that process multi-sensor data channel-wise, extracting feature maps from each channel separately so that more information from each channel is preserved. This is followed by a channel shuffle operation to enhance the representation ability of the network. Then, four densely connected convolutional blocks are used to extract features and take full advantage of them. A pyramid pooling module combined with two convolutional layers fuses multi-resolution and multi-sensor features by means of an effective global scene prior, producing the probability map for each class. Moreover, a median frequency balanced focal loss is proposed to replace the standard cross-entropy loss in the training phase to deal with the class imbalance problem. We evaluate the dense pyramid network on the International Society for Photogrammetry and Remote Sensing (ISPRS) Vaihingen and Potsdam 2D semantic labeling datasets, and the results demonstrate that the proposed framework performs better than the state-of-the-art baseline.
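The abstract names a median frequency balanced focal loss without giving its formula. The sketch below combines the two standard ingredients under stated assumptions: class weights equal to the median class frequency divided by each class's frequency, and the focal term (1 - p_t)^gamma applied to the cross-entropy. Frequencies are computed over all labels for simplicity, and the default `gamma=2.0` is a common choice rather than a value taken from the paper.

```python
import math
from collections import Counter
from statistics import median


def median_frequency_weights(labels):
    """Weight per class: median class frequency / this class's frequency."""
    counts = Counter(labels)
    total = len(labels)
    freqs = {c: n / total for c, n in counts.items()}
    med = median(freqs.values())
    return {c: med / f for c, f in freqs.items()}


def balanced_focal_loss(probs, labels, weights, gamma=2.0):
    """Mean of -w_y * (1 - p_y)^gamma * log(p_y) over all samples.

    `probs[i]` is the predicted class distribution for sample i and
    `labels[i]` its reference class (assumed layout for this sketch).
    """
    losses = []
    for dist, y in zip(probs, labels):
        p_t = max(dist[y], 1e-12)  # guard against log(0)
        losses.append(-weights[y] * (1.0 - p_t) ** gamma * math.log(p_t))
    return sum(losses) / len(losses)


# Hypothetical imbalanced label set: rare classes receive larger weights.
class_weights = median_frequency_weights([0, 0, 0, 0, 1, 1, 2])
```

Rare classes get weights above 1 and frequent classes weights below 1, while the focal term shrinks the contribution of pixels the model already classifies confidently.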

A MULTI-RESOLUTION FUSION MODEL INCORPORATING COLOR AND ELEVATION FOR SEMANTIC SEGMENTATION

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xlii-1-w1-513-2017 ◽

2017 ◽

Vol XLII-1/W1 ◽

pp. 513-517 ◽

Cited By ~ 1

Author(s):

W. Zhang ◽

H. Huang ◽

M. Schmitz ◽

X. Sun ◽

H. Wang ◽

...

Keyword(s):

Remote Sensing Data ◽

Semantic Segmentation ◽

Heterogeneous Data ◽

Fusion Model ◽

Convolutional Networks ◽

Fully Convolutional Networks ◽

Semantic Labeling ◽

Depth Study ◽

Comprehensive Evaluations ◽

The Given

In recent years, developments in Fully Convolutional Networks (FCN) have led to great improvements in semantic segmentation for various applications, including fused remote sensing data. There is, however, a lack of in-depth study of the internals of FCN models that would lead to an understanding of the contribution of individual layers to specific classes and their sensitivity to different types of input data. In this paper, we address this problem and propose a fusion model incorporating infrared imagery and Digital Surface Models (DSM) for semantic segmentation. The goal is to utilize heterogeneous data more accurately and effectively in a single model instead of assembling multiple models. First, the contribution and sensitivity of layers concerning the given classes are quantified by means of their recall in FCN. The contribution of different modalities to the pixel-wise prediction is then analyzed based on visualization. Finally, an optimized scheme for the fusion of layers with color and elevation information into a single FCN model is derived based on the analysis. Experiments are performed on the ISPRS Vaihingen 2D Semantic Labeling dataset. Comprehensive evaluations demonstrate the potential of the proposed approach.

Semantic segmentation of very high resolution remote sensing images with residual logic deep fully convolutional networks

MIPPR 2019: Remote Sensing Image Processing, Geographic Information Systems, and Other Applications ◽

10.1117/12.2541818 ◽

2020 ◽

Author(s):

Sheng He ◽

Jin Liu

Keyword(s):

Remote Sensing ◽

High Resolution ◽

Semantic Segmentation ◽

Remote Sensing Images ◽

Convolutional Networks ◽

Fully Convolutional Networks ◽

Very High

Relation Matters: Relational Context-Aware Fully Convolutional Network for Semantic Segmentation of High-Resolution Aerial Images

IEEE Transactions on Geoscience and Remote Sensing ◽

10.1109/tgrs.2020.2979552 ◽

2020 ◽

Vol 58 (11) ◽

pp. 7557-7569 ◽

Cited By ~ 1

Author(s):

Lichao Mou ◽

Yuansheng Hua ◽

Xiao Xiang Zhu

Keyword(s):

High Resolution ◽

Semantic Segmentation ◽

Aerial Images ◽

Context Aware ◽

Convolutional Network ◽

Relational Context ◽

Fully Convolutional Network

Densely Based Multi-Scale and Multi-Modal Fully Convolutional Networks for High-Resolution Remote-Sensing Image Semantic Segmentation

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing ◽

10.1109/jstars.2019.2906387 ◽

2019 ◽

Vol 12 (8) ◽

pp. 2612-2626 ◽

Cited By ~ 10

Author(s):

Cheng Peng ◽

Yangyang Li ◽

Licheng Jiao ◽

Yanqiao Chen ◽

Ronghua Shang

Keyword(s):

Remote Sensing ◽

High Resolution ◽

Semantic Segmentation ◽

Remote Sensing Image ◽

Convolutional Networks ◽

Multi Scale ◽

Fully Convolutional Networks

Learn to Extract Building Outline from Misaligned Annotation through Nearest Feature Selector

Remote Sensing ◽

10.3390/rs12172722 ◽

2020 ◽

Vol 12 (17) ◽

pp. 2722

Author(s):

Yuxuan Wang ◽

Guangming Wu ◽

Yimin Guo ◽

Yifei Huang ◽

Ryosuke Shibasaki

Keyword(s):

Large Scale ◽

Rapid Development ◽

Semantic Segmentation ◽

Loss Functions ◽

Aerial Image ◽

Building Extraction ◽

Convolutional Networks ◽

Fully Convolutional Networks ◽

Feature Selector ◽

Segmentation Task

For efficient building outline extraction, many algorithms, both unsupervised and supervised, have been proposed over the past decades. In recent years, due to the rapid development of convolutional neural networks, especially fully convolutional networks, building extraction has been treated as a semantic segmentation task that deals with extremely imbalanced positive pixels. State-of-the-art methods, whether through direct or indirect approaches, mainly focus on better network design. The shifts and rotations that are coarsely present in manually created annotations have long been ignored. Because the number of positive samples is limited, misalignment significantly reduces the correctness of the pixel-to-pixel loss, which might lead to a gradient explosion. To overcome this, we propose a nearest feature selector (NFS) that dynamically re-aligns the prediction and slightly misaligned annotations. The NFS can be seamlessly appended to existing loss functions and prevents the model from being misled by errors or misalignment in the annotations. Experiments on a large-scale aerial image dataset with centered buildings and corresponding building outlines indicate that adding NFS brings higher performance than existing naive loss functions. With the classic L1 loss, adding NFS yields gains of 8.8% in F1-score, 8.9% in kappa coefficient, and 9.8% in Jaccard index.
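The abstract does not describe how the NFS performs re-alignment. One simple way to picture the idea, offered purely as an assumption, is to try small translations of the annotation and score the prediction against the best-matching one. The sketch below does this for an L1 loss on small 2D grids; it handles shifts only (not rotations), and the names `shift`, `l1_loss`, and `realigned_l1_loss` are hypothetical.

```python
def shift(grid, dy, dx, fill=0.0):
    """Translate a 2D grid by (dy, dx), padding uncovered cells with `fill`."""
    height, width = len(grid), len(grid[0])
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            src_y, src_x = y - dy, x - dx
            if 0 <= src_y < height and 0 <= src_x < width:
                out[y][x] = grid[src_y][src_x]
    return out


def l1_loss(pred, target):
    """Mean absolute difference between two equally sized 2D grids."""
    height, width = len(pred), len(pred[0])
    total = sum(
        abs(pred[y][x] - target[y][x])
        for y in range(height)
        for x in range(width)
    )
    return total / (height * width)


def realigned_l1_loss(pred, target, max_shift=1):
    """Lowest L1 loss over small annotation shifts: (loss, dy, dx)."""
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            loss = l1_loss(pred, shift(target, dy, dx))
            if best is None or loss < best[0]:
                best = (loss, dy, dx)
    return best


# Hypothetical case: the annotation is offset one pixel to the right.
prediction = [[0.0] * 4 for _ in range(4)]
prediction[1][1] = 1.0
annotation = [[0.0] * 4 for _ in range(4)]
annotation[1][2] = 1.0
plain = l1_loss(prediction, annotation)
best_loss, best_dy, best_dx = realigned_l1_loss(prediction, annotation)
```

The plain loss penalizes a prediction that is correct apart from the annotation offset, whereas the re-aligned loss finds the one-pixel shift that removes the penalty. This is the kind of effect the abstract attributes to appending NFS to an existing loss.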
