C3Net: Cross-Modal Feature Recalibrated, Cross-Scale Semantic Aggregated and Compact Network for Semantic Segmentation of Multi-Modal High-Resolution Aerial Images

Zhiying Cao; Wenhui Diao; Xian Sun; Xiaode Lyu; Menglong Yan; Kun Fu

doi:10.3390/rs13030528

C3Net: Cross-Modal Feature Recalibrated, Cross-Scale Semantic Aggregated and Compact Network for Semantic Segmentation of Multi-Modal High-Resolution Aerial Images

Remote Sensing ◽

10.3390/rs13030528 ◽

2021 ◽

Vol 13 (3) ◽

pp. 528

Author(s):

Zhiying Cao ◽

Wenhui Diao ◽

Xian Sun ◽

Xiaode Lyu ◽

Menglong Yan ◽

...

Keyword(s):

Remote Sensing ◽

Image Interpretation ◽

Model Performance ◽

Semantic Segmentation ◽

Aerial Images ◽

Superior Performance ◽

Model Parameters ◽

Complementary Information ◽

Modal Data ◽

The One

Semantic segmentation of multi-modal remote sensing images is an important branch of remote sensing image interpretation. Multi-modal data has been proven to provide rich complementary information to deal with complex scenes. In recent years, semantic segmentation based on deep learning methods has made remarkable achievements. It is common to simply concatenate multi-modal data or use parallel branches to extract multi-modal features separately. However, most existing works ignore the effects of noise and redundant features from different modalities, which may not lead to satisfactory results. On the one hand, existing networks neither learn the complementary information of different modalities nor suppress the mutual interference between them, which may lead to a decrease in segmentation accuracy. On the other hand, the introduction of multi-modal data greatly increases the running time of the pixel-level dense prediction. In this work, we propose an efficient C3Net that strikes a balance between speed and accuracy. More specifically, C3Net contains several backbones for extracting features of different modalities. Then, a plug-and-play module is designed to effectively recalibrate and aggregate multi-modal features. In order to reduce the number of model parameters while maintaining model performance, we redesign the semantic contextual extraction module based on lightweight convolutional groups. Besides, a multi-level knowledge distillation strategy is proposed to improve the performance of the compact model. Experiments on the ISPRS Vaihingen dataset demonstrate the superior performance of C3Net, with 15× fewer FLOPs than the state-of-the-art baseline network while providing comparable overall accuracy.
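The abstract mentions knowledge distillation for the compact model but does not spell out the multi-level strategy. As a rough illustration only, the sketch below shows the classic single-level soft-target loss (temperature-scaled KL divergence between teacher and student outputs). The function names and the temperature value are assumptions for illustration, not C3Net's actual formulation.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Generic soft-target loss; the temperature is a hypothetical choice.
    """
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's soft predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2
```

The loss is zero when the student reproduces the teacher's logits and grows as the two softened distributions diverge. A multi-level variant would presumably apply comparable terms at several network depths, but that detail is not given here.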

Mapping the Unseen: Exploiting Super-Resolution for Semantic Segmentation in Low-Resolution Images

10.5753/sibgrapi.est.2020.12987 ◽

2020 ◽

Author(s):

Matheus B. Pereira ◽

Jefersson Alex Dos Santos

Keyword(s):

Remote Sensing ◽

Pattern Recognition ◽

Super Resolution ◽

Remote Sensing Data ◽

Semantic Segmentation ◽

The Other ◽

Aerial Imagery ◽

Aerial Images ◽

Remote Sensing Images ◽

Low Resolution

High-resolution aerial images are usually not accessible or affordable. On the other hand, low-resolution remote sensing data is easily found in public open repositories. The problem is that the low-resolution representation can compromise pattern recognition algorithms, especially semantic segmentation. In this M.Sc. dissertation, we design two frameworks in order to evaluate the effectiveness of super-resolution in the semantic segmentation of low-resolution remote sensing images. We carried out an extensive set of experiments on different remote sensing datasets. The results show that super-resolution is effective at improving semantic segmentation performance on low-resolution aerial imagery, outperforming unsupervised interpolation and achieving semantic segmentation results comparable to high-resolution data.

USING SEMANTICALLY PAIRED IMAGES TO IMPROVE DOMAIN ADAPTATION FOR THE SEMANTIC SEGMENTATION OF AERIAL IMAGES

ISPRS Annals of Photogrammetry Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-annals-v-2-2020-483-2020 ◽

2020 ◽

Vol V-2-2020 ◽

pp. 483-492

Author(s):

D. Gritzner ◽

J. Ostermann

Keyword(s):

Time Window ◽

Domain Adaptation ◽

Geographical Area ◽

Model Performance ◽

Ground Truth ◽

Semantic Segmentation ◽

Training Data ◽

Aerial Images ◽

Target Domain ◽

Training Examples

Abstract. Modern machine learning, especially deep learning, which is used in a variety of applications, requires a lot of labelled data for model training. Having an insufficient amount of training examples leads to models which do not generalize well to new input instances. This is a particularly significant problem for tasks involving aerial images: often training data is only available for a limited geographical area and a narrow time window, thus leading to models which perform poorly in different regions, at different times of day, or during different seasons. Domain adaptation can mitigate this issue by using labelled source domain training examples and unlabelled target domain images to train a model which performs well on both domains. Modern adversarial domain adaptation approaches use unpaired data. We propose using pairs of semantically similar images, i.e., whose segmentations are accurate predictions of each other, for improved model performance. In this paper we show that, as an upper limit based on ground truth, using semantically paired aerial images during training almost always increases model performance, with an average improvement of 4.2% accuracy and 0.036 mean intersection-over-union (mIoU). Using a practical estimate of semantic similarity, we still achieve improvements in more than half of all cases, with average improvements of 2.5% accuracy and 0.017 mIoU in those cases.
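The two metrics reported above, pixel accuracy and mIoU, can be computed as in the following minimal sketch. The function name and the flat-list input format are assumptions for illustration; real evaluation code typically works on label arrays and accumulates a confusion matrix.

```python
def accuracy_and_miou(pred, truth, num_classes):
    """Return (pixel accuracy, mean IoU) for flat lists of integer class labels."""
    correct = sum(1 for p, t in zip(pred, truth) if p == t)
    ious = []
    for c in range(num_classes):
        # Intersection: pixels labelled c in both; union: labelled c in either.
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union > 0:  # skip classes absent from both prediction and truth
            ious.append(inter / union)
    return correct / len(truth), sum(ious) / len(ious)

# Four pixels, two classes: one pixel of class 0 is mislabelled as class 1.
acc, miou = accuracy_and_miou([0, 1, 1, 1], [0, 0, 1, 1], num_classes=2)
```

In this toy case the accuracy is 3/4, while the per-class IoUs are 1/2 and 2/3, giving an mIoU of 7/12. This shows why the two metrics can move by different amounts for the same change in predictions.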

Water Areas Segmentation from Remote Sensing Images Using a Separable Residual SegNet Network

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi9040256 ◽

2020 ◽

Vol 9 (4) ◽

pp. 256 ◽

Cited By ~ 1

Author(s):

Liguo Weng ◽

Yiming Xu ◽

Min Xia ◽

Yonghong Zhang ◽

Jia Liu ◽

...

Keyword(s):

Remote Sensing ◽

Feature Extraction ◽

Spatial Information ◽

Global Climate ◽

Semantic Segmentation ◽

Water Area ◽

Remote Sensing Images ◽

The One ◽

High Level ◽

Almost All

Changes in lakes and rivers are of great significance for the study of global climate change. Accurate segmentation of lakes and rivers is critical to the study of their changes. However, traditional water area segmentation methods almost all share the following deficiencies: high computational requirements, poor generalization performance, and low extraction accuracy. In recent years, semantic segmentation algorithms based on deep learning have been emerging. To address the problems of a very large number of parameters, low accuracy, and network degradation during the training process, this paper proposes a separable residual SegNet (SR-SegNet) to perform water area segmentation using remote sensing images. On the one hand, without compromising the ability of feature extraction, the problem of network degradation is alleviated by adding modified residual blocks into the encoder, the number of parameters is limited by introducing depthwise separable convolutions, and the ability of feature extraction is improved by using dilated convolutions to expand the receptive field. On the other hand, SR-SegNet removes the convolution layers with relatively more convolution kernels in the encoding stage, and uses the cascading method to fuse the low-level and high-level features of the image. As a result, the whole network can obtain more spatial information. Experimental results show that the proposed method exhibits significant improvements over several traditional methods, including FCN, DeconvNet, and SegNet.
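To make the parameter saving from depthwise separable convolutions concrete, the following sketch compares weight counts for a single layer. The helper names and the 64-to-128-channel example are hypothetical and are not taken from SR-SegNet; biases are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def separable_conv_params(c_in, c_out, k):
    """One depthwise k x k filter per input channel, then a 1 x 1 pointwise mix."""
    return k * k * c_in + c_in * c_out

# Hypothetical layer: 3 x 3 kernel, 64 input channels, 128 output channels.
standard = standard_conv_params(64, 128, 3)    # 73728 weights
separable = separable_conv_params(64, 128, 3)  # 8768 weights
```

For this layer the separable form needs roughly 8.4 times fewer weights, and the ratio approaches k × k as the number of output channels grows.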

Multiscale Road Extraction in Remote Sensing Images

Computational Intelligence and Neuroscience ◽

10.1155/2019/2373798 ◽

2019 ◽

Vol 2019 ◽

pp. 1-9 ◽

Cited By ~ 4

Author(s):

Aziguli Wulamu ◽

Zuxian Shi ◽

Dezheng Zhang ◽

Zheyu He

Keyword(s):

Remote Sensing ◽

Network Architecture ◽

Semantic Segmentation ◽

Road Extraction ◽

Remote Sensing Images ◽

The Road ◽

Proposed Model ◽

Different Types ◽

Spatial Pyramid Pooling ◽

The One

Recent advances in convolutional neural networks (CNNs) have shown impressive results in semantic segmentation. Among the successful CNN-based methods, U-Net has achieved exciting performance. In this paper, we propose a novel network architecture based on U-Net and atrous spatial pyramid pooling (ASPP) to deal with the road extraction task in the remote sensing field. On the one hand, the U-Net structure can effectively extract valuable features; on the other hand, ASPP is able to utilize multiscale context information in remote sensing images. Compared to the baseline, the proposed model improves the pixelwise mean Intersection over Union (mIoU) by 3 points. Experimental results show that the proposed network architecture can deal with different types of road surface extraction tasks under various terrains in Yinchuan city, solve the road connectivity problem to some extent, and has a certain tolerance to shadows and occlusion.
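The idea behind ASPP, sampling the same input at several dilation rates and combining the results, can be illustrated in one dimension. This is a toy sketch under simplifying assumptions: the function names are hypothetical, a single shared kernel is used for all rates, and the input is a plain list. In an actual ASPP module each branch is a 2-D convolution with its own learned weights.

```python
def atrous_conv1d(signal, kernel, rate):
    """1-D dilated cross-correlation with zero padding; output length equals input length."""
    center = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - center) * rate  # taps are spaced `rate` samples apart
            if 0 <= idx < len(signal):     # positions outside the signal count as zero
                acc += w * signal[idx]
        out.append(acc)
    return out

def aspp_1d(signal, kernel, rates=(1, 2, 4)):
    """Run the kernel at several dilation rates and stack the branch outputs per position."""
    branches = [atrous_conv1d(signal, kernel, r) for r in rates]
    return [list(values) for values in zip(*branches)]

features = aspp_1d([1, 2, 3, 4, 5], [1, 1, 1], rates=(1, 2))
```

Each output position then carries one value per rate, so later layers see both a narrow and a wide context around the same sample.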

Class-Wise Fully Convolutional Network for Semantic Segmentation of Remote Sensing Images

Remote Sensing ◽

10.3390/rs13163211 ◽

2021 ◽

Vol 13 (16) ◽

pp. 3211

Author(s):

Tian Tian ◽

Zhengquan Chu ◽

Qian Hu ◽

Li Ma

Keyword(s):

Remote Sensing ◽

Image Interpretation ◽

Semantic Segmentation ◽

Remote Sensing Images ◽

Feature Maps ◽

Convolutional Network ◽

Fully Convolutional Network ◽

Semantic Labeling ◽

Benchmark Datasets ◽

Semantic Label

Semantic segmentation is a fundamental task in remote sensing image interpretation, which aims to assign a semantic label to every pixel in the given image. Accurate semantic segmentation is still challenging due to the complex distributions of various ground objects. With the development of deep learning, a series of segmentation networks represented by the fully convolutional network (FCN) has made remarkable progress on this problem, but the segmentation accuracy is still far from expectations. This paper focuses on the importance of class-specific features of different land cover objects, and presents a novel end-to-end class-wise processing framework for segmentation. The proposed class-wise FCN (C-FCN) is shaped in the form of an encoder-decoder structure with skip-connections, in which the encoder is shared to produce general features for all categories and the decoder is class-wise to process class-specific features. Specifically, class-wise transition (CT), class-wise up-sampling (CU), class-wise supervision (CS), and class-wise classification (CC) modules are designed to achieve the class-wise transfer, recover the resolution of class-wise feature maps, bridge the encoder and modified decoder, and implement class-wise classifications, respectively. Class-wise and group convolutions are adopted in the architecture to control the number of parameters. The method is tested on the public ISPRS 2D semantic labeling benchmark datasets. Experimental results show that the proposed C-FCN significantly improves segmentation performance compared with many state-of-the-art FCN-based networks, revealing its potential for accurate segmentation of complex remote sensing images.

SDFCNv2: An Improved FCN Framework for Remote Sensing Images Semantic Segmentation

Remote Sensing ◽

10.3390/rs13234902 ◽

2021 ◽

Vol 13 (23) ◽

pp. 4902

Author(s):

Guanzhou Chen ◽

Xiaoliang Tan ◽

Beibei Guo ◽

Kun Zhu ◽

Puyun Liao ◽

...

Keyword(s):

Remote Sensing ◽

Receptive Field ◽

Data Augmentation ◽

Semantic Segmentation ◽

Decision Fusion ◽

Majority Voting ◽

Model Parameters ◽

Natural Scene ◽

Segmentation Framework ◽

Natural Scene Images

Semantic segmentation is a fundamental task in remote sensing image analysis (RSIA). Fully convolutional networks (FCNs) have achieved state-of-the-art performance in the task of semantic segmentation of natural scene images. However, due to distinctive differences between natural scene images and remotely-sensed (RS) images, FCN-based semantic segmentation methods from the field of computer vision cannot achieve promising performance on RS images without modifications. In previous work, we proposed an RS image semantic segmentation framework, SDFCNv1, combined with a majority voting postprocessing method. Nevertheless, it still has some drawbacks, such as a small receptive field and a large number of parameters. In this paper, we propose an improved semantic segmentation framework, SDFCNv2, based on SDFCNv1, to conduct optimal semantic segmentation on RS images. We first construct a novel FCN model with hybrid basic convolutional (HBC) blocks and spatial-channel-fusion squeeze-and-excitation (SCFSE) modules, which has a larger receptive field and fewer network model parameters. We also put forward a data augmentation method based on the spectral-specific stochastic gamma transform (SSSGT) during the model training process to improve the generalizability of our model. Besides, we design a mask-weighted voting decision fusion postprocessing algorithm for image segmentation on overlarge RS images. We conducted several comparative experiments on two public datasets and a real surveying and mapping dataset. Extensive experimental results demonstrate that, compared with the SDFCNv1 framework, our SDFCNv2 framework can increase the mIoU metric by up to 5.22% while using only about half of the parameters.
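The abstract does not describe how the stochastic gamma transform augmentation is parameterized. As a loose illustration of the general idea only, the sketch below draws an independent random gamma for each spectral band and applies it to pixel values scaled to [0, 1]. The function name, the gamma range, and the input layout are all assumptions, not the SDFCNv2 procedure.

```python
import random

def stochastic_gamma(image, gamma_range=(0.5, 2.0), rng=None):
    """Apply an independent random gamma curve to each band.

    image: list of bands, each a list of pixel values scaled to [0, 1].
    gamma_range: hypothetical sampling interval for the exponent.
    """
    rng = rng or random.Random()
    augmented = []
    for band in image:
        gamma = rng.uniform(*gamma_range)  # one exponent per spectral band
        augmented.append([v ** gamma for v in band])
    return augmented

# Two bands of three pixels each; a seeded generator makes the run repeatable.
aug = stochastic_gamma([[0.0, 0.25, 1.0], [0.5, 0.5, 0.5]], rng=random.Random(0))
```

Because 0 and 1 are fixed points of the gamma curve, the transform changes mid-tone brightness per band while keeping values inside the original range.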

SEMANTIC SEGMENTATION OF AERIAL IMAGES WITH AN ENSEMBLE OF CNNS

ISPRS Annals of Photogrammetry Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-annals-iii-3-473-2016 ◽

2016 ◽

Vol III-3 ◽

pp. 473-480 ◽

Cited By ~ 2

Author(s):

D. Marmanis ◽

J. D. Wegner ◽

S. Galliani ◽

K. Schindler ◽

M. Datcu ◽

...

Keyword(s):

Visual Recognition ◽

Image Interpretation ◽

Semantic Segmentation ◽

Aerial Images ◽

Range Data ◽

Deep Convolutional Neural Networks ◽

Network Layers ◽

Full Resolution ◽

Aggregate Information ◽

Semantic Labeling

This paper describes a deep learning approach to semantic segmentation of very high resolution (aerial) images. Deep neural architectures hold the promise of end-to-end learning from raw images, making heuristic feature design obsolete. Over the last decade this idea has seen a revival, and in recent years deep convolutional neural networks (CNNs) have emerged as the method of choice for a range of image interpretation tasks like visual recognition and object detection. Still, standard CNNs do not lend themselves to per-pixel semantic segmentation, mainly because one of their fundamental principles is to gradually aggregate information over larger and larger image regions, making it hard to disentangle contributions from different pixels. Very recently two extensions of the CNN framework have made it possible to trace the semantic information back to a precise pixel position: deconvolutional network layers undo the spatial downsampling, and Fully Convolutional Networks (FCNs) modify the fully connected classification layers of the network in such a way that the location of individual activations remains explicit. We design an FCN which takes as input intensity and range data and, with the help of aggressive deconvolution and recycling of early network layers, converts them into a pixelwise classification at full resolution. We discuss design choices and intricacies of such a network, and demonstrate that an ensemble of several networks achieves excellent results on challenging data such as the ISPRS semantic labeling benchmark, using only the raw data as input.

Multi-Object Segmentation in Complex Urban Scenes from High-Resolution Remote Sensing Data

Remote Sensing ◽

10.3390/rs13183710 ◽

2021 ◽

Vol 13 (18) ◽

pp. 3710

Author(s):

Abolfazl Abdollahi ◽

Biswajeet Pradhan ◽

Nagesh Shukla ◽

Subrata Chakraborty ◽

Abdullah Alamri

Keyword(s):

Remote Sensing ◽

High Resolution ◽

Urban Areas ◽

Object Segmentation ◽

Remote Sensing Data ◽

Semantic Segmentation ◽

Aerial Images ◽

Urban Scenes ◽

Sensing Data ◽

Boundary Information

Terrestrial feature extraction, such as roads and buildings from aerial images using an automatic system, has many uses in an extensive range of fields, including disaster management, change detection, land cover assessment, and urban planning. This task is commonly tough because of complex scenes, such as urban scenes, where buildings and road objects are surrounded by shadows, vehicles, trees, etc., which appear in heterogeneous forms with lower inter-class and higher intra-class contrasts. Moreover, such extraction is time-consuming and expensive for human specialists to perform manually. Deep convolutional models have displayed considerable performance for feature segmentation from remote sensing data in recent years. However, for large and continuous areas of obstruction, most of these techniques still cannot detect roads and buildings well. Hence, this work's principal goal is to introduce two novel deep convolutional models based on the UNet family for multi-object segmentation, such as roads and buildings from aerial imagery. We focused on buildings and road networks because these objects constitute a huge part of urban areas. The presented models are called multi-level context gating UNet (MCG-UNet) and bi-directional ConvLSTM UNet model (BCL-UNet). The proposed methods have the same advantages as the UNet model, the mechanism of densely connected convolutions, bi-directional ConvLSTM, and squeeze and excitation module to produce the segmentation maps with a high resolution and maintain the boundary information even under complicated backgrounds. Additionally, we implemented a basic efficient loss function called boundary-aware loss (BAL) that allowed a network to concentrate on hard semantic segmentation regions, such as overlapping areas, small objects, sophisticated objects, and boundaries of objects, and produce high-quality segmentation maps. The presented networks were tested on the Massachusetts building and road datasets. The MCG-UNet improved the average F1 accuracy by 1.85%, and 1.19% and 6.67% and 5.11% compared with UNet and BCL-UNet for road and building extraction, respectively. Additionally, the presented MCG-UNet and BCL-UNet networks were compared with other state-of-the-art deep learning-based networks, and the results proved the superiority of the networks in multi-object segmentation tasks.

Multi-Stage Fusion and Multi-Source Attention Network for Multi-Modal Remote Sensing Image Segmentation

ACM Transactions on Intelligent Systems and Technology ◽

10.1145/3484440 ◽

2021 ◽

Vol 12 (6) ◽

pp. 1-20

Author(s):

Jiaqi Zhao ◽

Yong Zhou ◽

Boyu Shi ◽

Jingsong Yang ◽

Di Zhang ◽

...

Keyword(s):

Remote Sensing ◽

Rapid Development ◽

Remote Sensing Data ◽

Semantic Segmentation ◽

Sensor Technology ◽

Feature Maps ◽

Attention Network ◽

Modal Data ◽

Sensing Data ◽

Multi Stage

With the rapid development of sensor technology, lots of remote sensing data have been collected. Extracting feature maps from multi-modal remote sensing images can yield good semantic segmentation performance, since extra modal data provides more information. How to make full use of multi-modal remote sensing data for semantic segmentation is challenging. Toward this end, we propose a new network called Multi-Stage Fusion and Multi-Source Attention Network ((MS)²-Net) for multi-modal remote sensing data segmentation. The multi-stage fusion module fuses complementary information after calibrating the deviation information by filtering the noise from the multi-modal data. Besides, similar feature points are aggregated by the proposed multi-source attention for enhancing the discriminability of features with different modalities. The proposed model is evaluated on publicly available multi-modal remote sensing data sets, and results demonstrate the effectiveness of the proposed method.

TOWARDS OPEN-SET SEMANTIC SEGMENTATION OF AERIAL IMAGES

ISPRS Annals of Photogrammetry Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-annals-iv-3-w2-2020-19-2020 ◽

2020 ◽

Vol IV-3/W2-2020 ◽

pp. 19-24

Author(s):

C. C. V. da Silva ◽

K. Nogueira ◽

H. N. Oliveira ◽

J. A. dos Santos

Keyword(s):

Remote Sensing ◽

Computer Vision ◽

Low Cost ◽

Visible Spectrum ◽

Semantic Segmentation ◽

Aerial Images ◽

Closed Set ◽

Sensing Applications ◽

Open Set ◽

Public Datasets

Abstract. Classical and, more recently, deep computer vision methods are optimized for visible spectrum images, commonly encoded in grayscale or RGB colorspaces acquired from smartphones or cameras. A more uncommon source of images exploited in the remote sensing field are satellite and aerial images. However, the development of pattern recognition approaches for these data is relatively recent, mainly due to the limited availability of this type of image, as until recently they were used exclusively for military purposes. Access to aerial imagery, including spectral information, has been increasing mainly due to the low cost of drones, cheapening of imaging satellite launch costs, and novel public datasets. Usually remote sensing applications employ computer vision techniques strictly modeled for classification tasks in closed set scenarios. However, real-world tasks rarely fit into closed set contexts, frequently presenting previously unknown classes, characterizing them as open set scenarios. Focusing on this problem, this is the first paper to study and develop semantic segmentation techniques for open set scenarios applied to remote sensing images. The main contributions of this paper are: 1) a discussion of related works in open set semantic segmentation, showing evidence that these techniques can be adapted for open set remote sensing tasks; 2) the development and evaluation of a novel approach for open set semantic segmentation. Our method yielded competitive results when compared to closed set methods for the same dataset.
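The abstract does not detail the proposed open set approach. To illustrate what an open set decision looks like at the pixel level, the sketch below uses a generic confidence-thresholding baseline: a pixel keeps its most probable known class only if that probability clears a threshold, and is otherwise marked unknown. The function name, the threshold value, and the `-1` unknown label are assumptions, and this is not the method of the paper.

```python
def open_set_labels(prob_maps, threshold=0.7, unknown=-1):
    """Assign each pixel its top class, or `unknown` when confidence is low.

    prob_maps: one list of class probabilities per pixel.
    threshold: hypothetical minimum confidence for accepting a known class.
    """
    labels = []
    for probs in prob_maps:
        best = max(range(len(probs)), key=probs.__getitem__)  # index of top class
        labels.append(best if probs[best] >= threshold else unknown)
    return labels

# Three pixels, two known classes; the middle pixel is too uncertain to label.
labels = open_set_labels([[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]])
```

A closed set classifier would force the middle pixel into class 0, whereas the thresholded version flags it as potentially belonging to a class not seen during training.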
