EANet: Edge-Aware Network for the Extraction of Buildings from Aerial Images

2020 ◽  
Vol 12 (13) ◽  
pp. 2161 ◽  
Author(s):  
Guang Yang ◽  
Qian Zhang ◽  
Guixu Zhang

Deep learning methods have been used to extract buildings from remote sensing images and have achieved state-of-the-art performance. Most previous work has emphasized multi-scale feature fusion or enlarged receptive fields to capture global features, rather than focusing on low-level details such as edges. In this work, we propose a novel end-to-end edge-aware network, EANet, together with an edge-aware loss, for accurately extracting buildings from aerial images. Specifically, the architecture is composed of an image segmentation network and an edge perception network that, respectively, take charge of building prediction and edge detection. The International Society for Photogrammetry and Remote Sensing (ISPRS) Potsdam segmentation benchmark and the Wuhan University (WHU) building benchmark were used to evaluate our approach, which achieved intersection-over-union scores of 90.19% and 93.33%, respectively, reaching top performance without additional datasets, data augmentation, or post-processing. EANet is effective in extracting buildings from aerial images, showing that segmentation quality can be improved by focusing on edge details.
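The abstract does not give EANet's exact loss formulation, but the idea of coupling segmentation with edge supervision can be sketched as a per-pixel cross-entropy that up-weights boundary pixels. The `edge_weight` value and the weighting scheme below are illustrative assumptions, not the paper's definition.

```python
import numpy as np

def edge_aware_loss(pred, target, edge_mask, edge_weight=2.0, eps=1e-7):
    """Sketch of an edge-aware segmentation loss: standard per-pixel binary
    cross-entropy, up-weighted on pixels the edge mask marks as boundary.
    edge_weight is an illustrative hyperparameter, not the paper's value."""
    pred = np.clip(pred, eps, 1.0 - eps)
    bce = -(target * np.log(pred) + (1 - target) * np.log(1 - pred))
    weights = 1.0 + (edge_weight - 1.0) * edge_mask  # boost boundary pixels
    return float((weights * bce).mean())

# Toy 2x2 example: the top-left pixel lies on a building edge.
pred = np.array([[0.9, 0.2], [0.8, 0.1]])
target = np.array([[1.0, 0.0], [1.0, 0.0]])
edges = np.array([[1.0, 0.0], [0.0, 0.0]])
loss = edge_aware_loss(pred, target, edges)
```

With a non-empty edge mask the loss exceeds plain BCE, so gradients near boundaries are amplified, which is the intuition the abstract describes.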

2021 ◽  
Vol 13 (23) ◽  
pp. 4743
Author(s):  
Wei Yuan ◽  
Wenbo Xu

The segmentation of remote sensing images by deep learning is the main method of remote sensing image interpretation. However, segmentation models based on convolutional neural networks cannot capture global features well. A transformer, whose self-attention mechanism supplies each pixel with global context, makes up for this deficiency of convolutional neural networks. Therefore, a multi-scale adaptive segmentation network model (MSST-Net) based on a Swin Transformer is proposed in this paper. First, a Swin Transformer is used as the backbone to encode the input image. Second, the feature maps of the different levels are decoded separately. Third, a convolution is used for fusion, so that the network can automatically learn the weight of each level's decoding result. Finally, a convolution with a 1 × 1 kernel adjusts the channels to obtain the final prediction map. Compared with other segmentation network models on the WHU building dataset, the proposed model improves all evaluation metrics: mIoU, F1-score, and accuracy. The proposed network is a multi-scale adaptive model that pays more attention to global features for remote sensing segmentation.
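The fusion step described above (a 1 × 1 convolution over the concatenated per-level decoding results) reduces, per pixel, to a learned weighted sum across levels. A minimal sketch, with illustrative weight values standing in for what the real network would learn:

```python
import numpy as np

def fuse_level_predictions(level_maps, weights, bias=0.0):
    """Sketch of 1x1-convolution fusion across decoding levels: at every
    pixel, the L per-level maps are combined by a learned weight vector.
    level_maps: (L, H, W); weights: (L,). Values here are illustrative."""
    level_maps = np.asarray(level_maps)
    return np.tensordot(weights, level_maps, axes=1) + bias  # (H, W)

# Three 4x4 decoded maps fused with assumed weights.
maps = np.stack([np.full((4, 4), v) for v in (0.2, 0.5, 0.8)])
fused = fuse_level_predictions(maps, np.array([0.5, 0.3, 0.2]))
```

Because the weights are learned, the network can emphasize whichever level's decoding is most reliable for the dataset at hand.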


2020 ◽  
Vol 12 (5) ◽  
pp. 872 ◽  
Author(s):  
Ronghua Shang ◽  
Jiyu Zhang ◽  
Licheng Jiao ◽  
Yangyang Li ◽  
Naresh Marturi ◽  
...  

Semantic segmentation of high-resolution remote sensing images is highly challenging due to the presence of a complicated background, irregular target shapes, and similarities in the appearance of multiple target categories. Most existing segmentation methods that rely only on simple fusion of the extracted multi-scale features often fail to provide satisfactory results when there is a large difference in target sizes. To handle this problem through multi-scale context extraction and efficient fusion of multi-scale features, in this paper we present an end-to-end multi-scale adaptive feature fusion network (MANet) for semantic segmentation of remote sensing images. It is an encoder-decoder structure that includes a multi-scale context extraction module (MCM) and an adaptive fusion module (AFM). The MCM employs two layers of atrous convolutions with different dilation rates and global average pooling to extract context information at multiple scales in parallel. MANet embeds a channel attention mechanism to fuse semantic features. The high- and low-level semantic information are concatenated to generate global features via global average pooling. These global features are passed through a fully connected layer to acquire adaptive weight information for each channel. To accomplish an efficient fusion, these tuned weights are applied to the fused features. The performance of the proposed method has been evaluated by comparing it with six other state-of-the-art networks: fully convolutional networks (FCN), U-Net, UZ1, Light-weight RefineNet, DeepLabv3+, and APPD. Experiments performed on the publicly available Potsdam and Vaihingen datasets show that the proposed MANet significantly outperforms the other networks, with overall accuracy reaching 89.4% and 88.2%, and average F1 reaching 90.4% and 86.7%, respectively.
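The adaptive fusion step described above (global average pooling, a fully connected layer, and per-channel reweighting) follows the familiar squeeze-and-excitation pattern. A minimal sketch, assuming a single fully connected layer with a sigmoid gate; the real AFM's layer sizes and activations are not specified in the abstract:

```python
import numpy as np

def adaptive_fusion(features, fc_weight, fc_bias):
    """Sketch of channel-attention fusion: global average pooling squeezes
    the concatenated features to a channel descriptor, a fully connected
    layer plus sigmoid turns it into per-channel weights, and the weights
    rescale the features. features: (C, H, W); fc_weight: (C, C)."""
    gap = features.mean(axis=(1, 2))            # (C,) global descriptor
    logits = fc_weight @ gap + fc_bias          # fully connected layer
    channel_w = 1.0 / (1.0 + np.exp(-logits))   # sigmoid gate in (0, 1)
    return features * channel_w[:, None, None]  # channel-wise reweighting

rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 8, 8))
out = adaptive_fusion(feats, np.eye(4), np.zeros(4))
```

Since the gate values lie in (0, 1), each channel is attenuated in proportion to how informative the global descriptor deems it.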


Author(s):  
J. Han ◽  
S. L. Zhang ◽  
Z. Ye

Abstract. Deblurring is a vital image pre-processing procedure for improving image quality, and it is a classical ill-posed problem. A new blind deblurring method based on an image sparsity prior is proposed here. The proposed prior combines the patch-wise minimal and maximal pixels of the latent image, and gradually strengthens the image sparsity during deblurring. An algorithm different from the half-quadratic splitting algorithm is applied under the maximum a posteriori (MAP) framework. Experimental results demonstrate that the proposed method preserves more subtle texture and sharper edges and reduces visual artefacts, and its quantitative evaluation indexes compare favourably against those of state-of-the-art methods on synthesized, natural, and remote sensing images (RSI).
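The patch-wise minimal and maximal pixel maps that the prior is built on can be sketched as sliding-window extremes (akin to dark/bright channel maps). The abstract does not give the exact prior, only that it combines these two maps; the patch size below is an assumption.

```python
import numpy as np

def patchwise_extremes(img, patch=3):
    """Sketch of the patch-wise minimal/maximal pixel maps: for every
    pixel, take the min and max over a patch x patch neighbourhood
    (edge-replicated padding). patch=3 is an illustrative choice."""
    pad = patch // 2
    padded = np.pad(img, pad, mode="edge")
    h, w = img.shape
    mins = np.empty_like(img)
    maxs = np.empty_like(img)
    for i in range(h):
        for j in range(w):
            window = padded[i:i + patch, j:j + patch]
            mins[i, j] = window.min()
            maxs[i, j] = window.max()
    return mins, maxs

img = np.array([[0.1, 0.9, 0.2],
                [0.8, 0.5, 0.7],
                [0.3, 0.6, 0.4]])
mins, maxs = patchwise_extremes(img)
```

Blur averages neighbouring pixels, so it raises patch-wise minima and lowers maxima; penalising that effect is the intuition behind using these maps as a sparsity prior.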


2020 ◽  
Vol 12 (19) ◽  
pp. 3118
Author(s):  
Danqing Xu ◽  
Yiquan Wu

High-altitude remote sensing target detection suffers from low precision and a low detection rate. In order to enhance detection performance for remote sensing targets, a new YOLO (You Only Look Once)-V3-based algorithm is proposed. In our improved YOLO-V3, we introduced the concept of multi-receptive fields to enhance feature extraction; the proposed model is therefore termed Multi-Receptive Fields Fusion YOLO (MRFF-YOLO). In addition, to address the flaws of YOLO-V3 in detecting small targets, we increased the number of detection layers from three to four. Moreover, to avoid gradient fading, an improved DenseNet structure was chosen for the detection layers. We compared our approach (MRFF-YOLO) with YOLO-V3 and other state-of-the-art target detection algorithms on the Remote Sensing Object Detection (RSOD) dataset and a dataset for object detection in aerial images (UCS-AOD). With this series of improvements, the mAP (mean average precision) of MRFF-YOLO increased from 77.10% to 88.33% on the RSOD dataset and from 75.67% to 90.76% on the UCS-AOD dataset. The missed-detection rates were also greatly reduced, especially for small targets. The experimental results showed that our approach performs better than traditional YOLO-V3 and other state-of-the-art models for remote sensing target detection.
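One common way to realize "multi-receptive fields" is to run parallel 3 × 3 convolutions with different dilation rates and fuse their outputs. The dilation rates and the sum-fusion below are assumptions for illustration; the paper's block may differ.

```python
import numpy as np

def dilated_conv3x3(x, kernel, dilation):
    """Single-channel 'same' 3x3 convolution with a dilation rate:
    a dilation of d spaces the kernel taps d pixels apart, enlarging
    the receptive field without adding parameters."""
    pad = dilation
    xp = np.pad(x, pad)
    h, w = x.shape
    out = np.zeros_like(x)
    for ki in range(3):
        for kj in range(3):
            oi, oj = ki * dilation, kj * dilation
            out += kernel[ki, kj] * xp[oi:oi + h, oj:oj + w]
    return out

def multi_receptive_fusion(x, kernels, dilations=(1, 2, 4)):
    """Sketch of a multi-receptive-field block: parallel dilated 3x3
    convolutions see different context sizes; their outputs are summed
    here for simplicity (illustrative fusion and dilation rates)."""
    return sum(dilated_conv3x3(x, k, d) for k, d in zip(kernels, dilations))

x = np.random.default_rng(1).standard_normal((8, 8))
kernels = [np.full((3, 3), 1 / 9)] * 3
y = multi_receptive_fusion(x, kernels)
```

The branch with dilation 4 covers a 9 × 9 context with the same nine weights as the dilation-1 branch, which is what lets the block capture both small and large targets cheaply.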


2019 ◽  
Vol 11 (16) ◽  
pp. 1903 ◽  
Author(s):  
Zheng ◽  
Cao ◽  
Lv ◽  
Benediktsson

In this article, a novel approach for land cover change detection (LCCD) in very high resolution (VHR) remote sensing images, based on spatial–spectral feature fusion and a multi-scale segmentation voting decision, is proposed. Unlike traditional methods that use a single feature without post-processing on the raw detection map, the proposed approach uses spatial–spectral features and post-processing strategies to improve detection accuracy and performance. The approach involves two stages. First, we explored the spatial features of the VHR remote sensing image to complement the insufficiency of the spectral feature, and fused the spatial–spectral features with different strategies. The Manhattan distance between the corresponding spatial–spectral feature vectors of the bi-temporal images was then employed to measure the change magnitude and generate a change magnitude image (CMI). Second, the Otsu binary threshold algorithm was used to divide the CMI into a binary change detection map (BCDM), and a multi-scale segmentation voting decision algorithm was proposed to fuse the initial BCDMs into the final change detection map. Experiments were carried out on three pairs of bi-temporal VHR remote sensing images. The results were compared with those of state-of-the-art methods, including four popular contextual-based LCCD methods and three post-processing LCCD methods. The experimental comparisons demonstrated that the proposed approach has an advantage over other state-of-the-art techniques in terms of detection accuracy and performance.
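The CMI computation and Otsu thresholding described above are simple enough to sketch directly; the feature extraction and the multi-scale voting fusion are stubbed out here, with plain per-pixel vectors standing in for the fused spatial–spectral features.

```python
import numpy as np

def change_magnitude_image(feat_t1, feat_t2):
    """Manhattan (L1) distance between corresponding spatial-spectral
    feature vectors of the bi-temporal images. feat shape: (C, H, W)."""
    return np.abs(feat_t1 - feat_t2).sum(axis=0)

def otsu_threshold(values, bins=256):
    """Plain Otsu: pick the threshold maximising between-class variance."""
    hist, edges = np.histogram(values, bins=bins)
    p = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(p)                 # class-0 probability per split
    mu = np.cumsum(p * centers)       # class-0 cumulative mean mass
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu_total * w0 - mu) ** 2 / (w0 * (1 - w0))
    between = np.nan_to_num(between)
    return centers[int(np.argmax(between))]

# Toy bi-temporal features: identical except for an injected changed block.
rng = np.random.default_rng(2)
t1 = rng.normal(0, 0.1, (3, 16, 16))
t2 = t1.copy()
t2[:, 4:8, 4:8] += 3.0                # changed region
cmi = change_magnitude_image(t1, t2)
bcdm = cmi > otsu_threshold(cmi)      # binary change detection map
```

On this two-class toy CMI, Otsu lands between the unchanged (0) and changed (9) magnitudes, so the BCDM recovers the injected region exactly.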


Forests ◽  
2021 ◽  
Vol 12 (7) ◽  
pp. 937
Author(s):  
Boyang Zhang ◽  
Hongbo Mu ◽  
Mingyu Gao ◽  
Haiming Ni ◽  
Jianfeng Chen ◽  
...  

The precise segmentation of forest areas is essential for monitoring tasks related to forest exploration, extraction, and statistics. However, effective and accurate segmentation of forest images is hampered by factors such as the blurring and discontinuity of forest boundaries. Therefore, a Pyramid Feature Extraction UNet (PFE-UNet), based on the traditional UNet, is proposed for end-to-end forest image segmentation. Specifically, a Pyramid Feature Extraction (PFE) module is introduced in the network transition layer, which obtains multi-scale forest image information through different receptive fields. A spatial attention module (SA) and a channel-wise attention module (CA) are applied to the low-level feature maps and PFE feature maps, respectively, to highlight task-specific segmentation features while fusing context information and suppressing irrelevant regions. The standard convolution block is replaced with a novel depthwise separable convolution unit (DSC Unit), which not only reduces the computational cost but also prevents overfitting. This paper presents an extensive evaluation on the DeepGlobe dataset and a comparative analysis with several state-of-the-art networks. The experimental results show that the PFE-UNet network obtains an accuracy of 94.23% in real-time forest image segmentation, significantly higher than other advanced networks. The proposed PFE-UNet thus also provides a valuable reference for the precise segmentation of forest images.
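The cost saving from the depthwise separable unit can be shown with a parameter count: a depthwise k × k convolution per input channel followed by a 1 × 1 pointwise convolution, versus a standard k × k convolution. The paper's exact DSC Unit may add normalisation or activation layers; biases are ignored here.

```python
def conv_params(c_in, c_out, k=3):
    """Parameters of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def dsc_params(c_in, c_out, k=3):
    """Parameters of a depthwise separable unit: one k x k depthwise
    filter per input channel, then a 1x1 pointwise convolution mixing
    channels. Sketch of why the DSC Unit cuts computational cost."""
    return c_in * k * k + c_in * c_out

standard = conv_params(64, 128)   # 64 * 128 * 9 = 73728
separable = dsc_params(64, 128)   # 64 * 9 + 64 * 128 = 8768
```

For this illustrative 64-to-128-channel layer, the separable unit uses roughly 8.4 times fewer parameters, which also reduces the risk of overfitting that the abstract mentions.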


2021 ◽  
Vol 13 (10) ◽  
pp. 1894
Author(s):  
Chen Chen ◽  
Hongxiang Ma ◽  
Guorun Yao ◽  
Ning Lv ◽  
Hua Yang ◽  
...  

Since remote sensing images in China are difficult to obtain and their use requires a complicated administrative procedure, the available data cannot meet the requirement for huge numbers of training samples in deep-learning-based waterside change detection. Recently, data augmentation has become an effective method to address the absence of training samples. Therefore, an improved Generative Adversarial Network (GAN), BTD-sGAN (Text-based Deeply-supervised GAN), is proposed to generate training samples for remote sensing images of Anhui Province, China. The principal structure of our model is based on the Deeply-supervised GAN (D-sGAN), which we improve from the viewpoint of the diversity of the generated samples. First, the network takes Perlin noise, an image segmentation graph, and an encoded text vector as input, in which the image segmentation graph is resized to 128 × 128 to facilitate fusion with the text vector. Then, to improve the diversity of the generated images, the text vector is used to modify the semantic loss of the downsampled text. Finally, to balance the time and quality of image generation, only a two-layer Unet++ structure is used to generate the image. Herein, "Inception Score", "Human Rank", and "Inference Time" are used to evaluate the performance of BTD-sGAN, StackGAN++, and GAN-INT-CLS. In addition, to verify the diversity of the remote sensing images generated by BTD-sGAN, this paper compares interpretation results with and without the generated images added to the training data of a remote sensing interpretation network; the results show that the generated images can improve the precision of soil-moving detection by 5%, which proves the effectiveness of the proposed model.
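The input assembly described above (segmentation graph resized to 128 × 128, fused with a spatially broadcast text vector and a noise channel) can be sketched as follows. Plain random noise stands in for Perlin noise, and the nearest-neighbour resize and channel layout are assumptions; the paper's fusion details are not given in the abstract.

```python
import numpy as np

def assemble_generator_input(seg_graph, text_vec, noise, size=128):
    """Sketch of generator input assembly: resize the segmentation graph
    to size x size (nearest-neighbour), broadcast the encoded text vector
    over the spatial grid, and stack a noise channel alongside.
    Returns a (1 + len(text_vec) + 1, size, size) tensor."""
    h, w = seg_graph.shape
    rows = np.arange(size) * h // size          # nearest-neighbour indices
    cols = np.arange(size) * w // size
    seg = seg_graph[np.ix_(rows, cols)]
    text = np.broadcast_to(text_vec[:, None, None],
                           (text_vec.size, size, size))
    return np.concatenate([seg[None], text, noise[None]], axis=0)

seg_graph = np.zeros((256, 256))
seg_graph[64:192, 64:192] = 1.0                 # toy segmentation graph
text_vec = np.array([0.3, -0.7, 1.2])           # toy encoded text vector
noise = np.random.default_rng(3).standard_normal((128, 128))
gen_in = assemble_generator_input(seg_graph, text_vec, noise)
```

Broadcasting the text vector over the grid is one simple way to make a 1-D embedding fusible with 2-D feature maps; the resizing to 128 × 128 matches the size the abstract states.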


Sensors ◽  
2021 ◽  
Vol 21 (6) ◽  
pp. 2153
Author(s):  
Yuewu Hou ◽  
Zhaoying Liu ◽  
Ting Zhang ◽  
Yujian Li

Roads are an important mode of transportation and are very convenient for people's daily work and life. However, it is challenging to accurately extract road information from a high-resolution remote sensing image. This paper presents a road extraction method for remote sensing images based on a complement UNet (C-UNet). C-UNet contains four modules. First, the standard UNet is used to roughly extract road information from remote sensing images, giving the first segmentation result; second, a fixed threshold is utilized to erase part of the extracted information; third, a multi-scale dense dilated convolution UNet (MD-UNet) is introduced to discover the complementary road areas in the erased masks, giving the second segmentation result; and, finally, we fuse the extraction results of the first and the third modules to obtain the final segmentation result. Experimental results on the Massachusetts Road dataset indicate that our C-UNet achieves higher results than state-of-the-art methods, demonstrating its effectiveness.
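The four-module flow can be sketched with probability maps standing in for the two networks; the erase threshold and the union fusion are illustrative assumptions, since the abstract does not specify either.

```python
import numpy as np

def c_unet_pipeline(coarse_prob, complement_prob, erase_thresh=0.5):
    """Sketch of the C-UNet flow: (1) a coarse road map from the first
    UNet, (2) a fixed threshold keeps only confident road pixels,
    (3) the complement network (MD-UNet in the paper) recovers roads in
    the erased region, (4) both results are fused by pixel-wise union.
    The threshold value is illustrative."""
    first = coarse_prob >= erase_thresh                   # modules 1 + 2
    erased = ~first                                       # region handed to MD-UNet
    second = (complement_prob >= erase_thresh) & erased   # module 3
    return first | second                                 # module 4: fusion

coarse = np.array([[0.9, 0.2], [0.4, 0.1]])
complement = np.array([[0.1, 0.8], [0.3, 0.2]])
roads = c_unet_pipeline(coarse, complement)
```

The top-right pixel is missed by the coarse pass but recovered by the complement pass, which is the failure mode the second network exists to fix.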


2019 ◽  
Vol 11 (18) ◽  
pp. 2095 ◽  
Author(s):  
Kun Fu ◽  
Zhuo Chen ◽  
Yue Zhang ◽  
Xian Sun

In recent years, deep learning has led to a remarkable breakthrough in object detection in remote sensing images. In practice, two-stage detectors perform well in terms of detection accuracy but are slow. One-stage detectors, on the other hand, simplify the detection pipeline of two-stage detectors and are faster, but have lower detection accuracy. Enhancing the capability of feature representation may be a way to improve the detection accuracy of one-stage detectors. To this end, this paper proposes a novel one-stage detector with an enhanced capability of feature representation, which benefits from two proposed structures: a dual top-down module and a dense-connected inception module. The former efficiently utilizes multi-scale features from multiple layers of the backbone network. The latter both widens and deepens the network to enhance the ability of feature representation at limited extra computational cost. To evaluate the effectiveness of the proposed structures, we conducted experiments on horizontal bounding box detection tasks on the challenging DOTA dataset and obtained 73.49% mean Average Precision (mAP), achieving state-of-the-art performance. Furthermore, our method runs significantly faster than the best public two-stage detector on the DOTA dataset.
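A top-down module typically propagates coarse semantics into high-resolution features by repeated upsample-and-add, and the "dual" variant presumably stacks two such pathways. A minimal sketch of one pathway, omitting the 1 × 1 lateral convolutions a real module would apply:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x spatial upsampling of a (C, H, W) map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def top_down_fuse(features):
    """Sketch of one top-down pathway: starting from the deepest feature
    map, repeatedly upsample and add to the next shallower map so coarse
    semantics flow into high-resolution features. features: deepest-first
    list of (C, H, W) maps whose spatial size doubles at each step.
    Lateral 1x1 convolutions are omitted for brevity."""
    fused = [features[0]]
    for shallow in features[1:]:
        fused.append(shallow + upsample2x(fused[-1]))
    return fused

rng = np.random.default_rng(4)
pyramid = [rng.standard_normal((8, 4 * 2**i, 4 * 2**i)) for i in range(3)]
outs = top_down_fuse(pyramid)
```

Each output level thus carries both its own fine detail and the accumulated semantics of every deeper level, which is what makes multi-scale features from multiple backbone layers usable by a one-stage head.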


Author(s):  
Megha Chhabra ◽  
Manoj Kumar Shukla ◽  
Kiran Kumar Ravulakollu

Latent fingerprints are unintentional finger-skin impressions left as ridge patterns at crime scenes. A major challenge in latent fingerprint forensics is the poor quality of the image lifted from the crime scene. Forensic investigators are in constant search of novel, effective technologies to capture and process low-quality images. The accuracy of the results depends upon the quality of the image captured in the beginning, the metrics used to assess that quality, and the level of enhancement subsequently required. The low quality of images collected by low-quality scanners, unstructured background noise, poor ridge quality, and overlapping structured noise result in the detection of false minutiae and hence reduce the recognition rate. Traditionally, image segmentation and enhancement are partially done manually with the help of highly skilled experts. Using automated systems for this work, images of varying quality can be investigated faster. This survey presents a comparative study of the various segmentation techniques available for latent fingerprint forensics.

