Detecting High-Rise Buildings from Sentinel-2 Data Based on Deep Learning Method

2021 ◽  
Vol 13 (20) ◽  
pp. 4073
Author(s):  
Liwei Li ◽  
Jinming Zhu ◽  
Gang Cheng ◽  
Bing Zhang

High-rise buildings (HRBs) are a modern and visually distinctive land-use type that plays an important role in urbanization. Large-scale monitoring of HRBs is valuable for urban planning, environmental protection, and related applications. Because of the complex 3D structure and seasonally dynamic image features of HRBs, monitoring them routinely at large scale remains challenging. This paper extends our previous work on using a Fully Convolutional Network (FCN) model to extract HRBs from Sentinel-2 data by studying how seasonal and spatial factors influence the model's performance. Sixteen Sentinel-2 subset images covering four diverse regions across four seasons were selected for training and validation. Our results indicate that the performance of the FCN-based method at extracting HRBs from Sentinel-2 data fluctuates across seasons and regions, and that the seasonal variation in accuracy is larger than the regional variation. If an optimal season is chosen to obtain the best yearly result, the F1 score of detected HRBs can exceed 0.75 for all regions, with most errors located on HRB boundaries. The FCN model can also be trained on seasonally and regionally combined samples to achieve similar or even better overall accuracy than a model trained on an optimal season-region combination. Uncertainties remain on the boundaries of detected results and may be reduced by defining HRBs more rigorously. On the whole, the FCN-based method is largely effective at extracting HRBs from Sentinel-2 data in regions with large diversity in culture, latitude, and landscape. Our results support building a more powerful FCN model on a larger set of training samples for operational HRB monitoring at the regional or even country scale.
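The pixel-wise F1 score reported above can be computed from binary building masks. A minimal numpy sketch of the calculation follows; the function name and the toy 3x3 masks are illustrative, not taken from the paper:

```python
import numpy as np

def f1_score_binary(pred, truth):
    """Pixel-wise F1 for binary masks (1 = high-rise building pixel)."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    tp = np.sum(pred & truth)       # correctly detected building pixels
    fp = np.sum(pred & ~truth)      # false alarms
    fn = np.sum(~pred & truth)      # missed building pixels
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

truth = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 0]])
pred  = np.array([[1, 0, 0], [0, 1, 1], [0, 0, 0]])
score = f1_score_binary(pred, truth)   # tp=2, fp=1, fn=1
```

Since most errors in the paper sit on HRB boundaries, masks that agree in building interiors but disagree on edges would lower this score only moderately.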

Author(s):  
Y. Chen ◽  
W. Gao ◽  
E. Widyaningrum ◽  
M. Zheng ◽  
K. Zhou

Abstract. Semantic segmentation, especially of buildings, from very high resolution (VHR) airborne images is an important task in urban mapping applications. Deep learning has advanced significantly and is now widely applied in computer vision. Fully Convolutional Networks (FCN) are among the most widely adopted methods owing to their good performance and high computational efficiency. However, the state-of-the-art results of deep nets depend on training on large-scale benchmark datasets. Unfortunately, VHR image benchmarks are limited and generalize poorly to other areas of interest. Since existing high-precision base maps are readily available and objects in an urban area do not change dramatically, map information can be used to label images as training samples. Apart from object changes between maps and images due to time differences, maps often cannot be matched perfectly with images. In this study, the main sources of mislabeling (relief displacement, differing representations between the base map and the image, and occluded areas in the image) are identified and addressed by utilizing stereo images. These free training samples are then fed to a pre-trained FCN. To improve the result, we applied fine-tuning with different learning rates and froze different layers, and we further improved the results by introducing atrous convolution. Using free training samples, we achieve a promising building classification with 85.6% overall accuracy and 83.77% F1 score, while the ISPRS benchmark result using manual labels reaches 92.02% overall accuracy and 84.06% F1 score; the gap is due to the building complexity in our study area.
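Atrous (dilated) convolution, introduced above to improve the results, enlarges a filter's receptive field by spacing its taps `rate` samples apart without adding parameters. A minimal 1-D numpy sketch of the idea (the paper applies the 2-D analogue inside an FCN; the function and inputs here are illustrative):

```python
import numpy as np

def atrous_conv1d(x, kernel, rate):
    """1-D atrous (dilated) convolution with 'valid' padding.
    The kernel taps are spaced `rate` apart, so a 3-tap kernel at rate=2
    covers a 5-sample window while still having only 3 weights."""
    k = len(kernel)
    span = (k - 1) * rate + 1               # effective receptive field
    out = np.empty(len(x) - span + 1)
    for i in range(len(out)):
        out[i] = sum(kernel[j] * x[i + j * rate] for j in range(k))
    return out

x = np.arange(8, dtype=float)               # [0, 1, ..., 7]
dense  = atrous_conv1d(x, [1.0, 1.0, 1.0], rate=1)   # ordinary convolution
dilated = atrous_conv1d(x, [1.0, 1.0, 1.0], rate=2)  # same weights, wider view
```

At rate=1 this reduces to an ordinary valid convolution; at rate=2 each output sums samples two steps apart, which is why dilation captures wider context at no extra parameter cost.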


Author(s):  
Y. Hamrouni ◽  
É. Paillassa ◽  
V. Chéret ◽  
C. Monteil ◽  
D. Sheeren

Abstract. The current availability of Earth Observation satellite data at high spatial and temporal resolutions makes it possible to map large areas. Although supervised classification is the most widely adopted approach, its performance depends strongly on the availability and quality of training data. However, gathering samples through field surveys or photo interpretation is often expensive and time-consuming, especially when the area to be classified is large. In this paper we propose an active learning-based technique to address this issue by reducing the labelling effort required for supervised classification while increasing the generalisation capability of the classifier across space. Experiments were conducted to identify poplar plantations at three different sites in France using Sentinel-2 time series. To characterise the age of the identified poplar stands, temporal means of Sentinel-1 backscatter coefficients were computed. The results are promising and show the capacity of the active learning-based approach to achieve performance similar to traditional passive learning (i.e., random selection of samples; poplar F-score ≥ 90%) with up to 50% fewer training samples. Sentinel-1 annual means demonstrated their potential to differentiate two stand ages with an overall accuracy of 83%, regardless of the cultivar considered.
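The core active-learning loop alternates between fitting a classifier on the labelled pool and querying the most uncertain unlabelled sample for annotation. The sketch below uses uncertainty sampling with a toy soft nearest-centroid classifier on 1-D features; the classifier, data, and query budget are stand-ins for illustration, not the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy pool: two 1-D Gaussian classes, a stand-in for Sentinel-2 features.
pool_x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(2, 1, 200)])
pool_y = np.array([0] * 200 + [1] * 200)

def predict_proba(x, centroids):
    """Soft nearest-centroid classifier: softmax over negative distances."""
    d = -np.abs(x[:, None] - centroids[None, :])
    e = np.exp(d - d.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Seed with one extreme sample per class, then query uncertain points.
labelled = [int(np.argmin(pool_x)), int(np.argmax(pool_x))]
for _ in range(20):
    centroids = np.array([pool_x[[i for i in labelled if pool_y[i] == c]].mean()
                          for c in (0, 1)])
    proba = predict_proba(pool_x, centroids)
    margin = np.abs(proba[:, 0] - proba[:, 1])   # small margin = uncertain
    margin[labelled] = np.inf                    # never re-query a labelled point
    labelled.append(int(np.argmin(margin)))      # simulate asking an oracle

acc = (predict_proba(pool_x, centroids).argmax(axis=1) == pool_y).mean()
```

The point of the strategy is that 22 actively chosen labels concentrate near the decision boundary, which is where they help the classifier most; random selection would spend many labels on easy, redundant samples.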


Author(s):  
Teng Jiang ◽  
Liang Gong ◽  
Yupu Yang

The attention-based encoder–decoder framework has greatly improved image caption generation. The attention mechanism plays a transitional role by transforming static image features into sequential captions, and detecting the spatial characteristics of images is of great significance for generating reasonable captions. In this paper, we propose a spatial relational attention approach that considers spatial positions and attributes. Image features are first weighted by the attention mechanism and then concatenated with contextual features to form a spatial-visual tensor. Features are extracted from this tensor by a fully convolutional network to produce visual concepts for the decoder network; the fully convolutional layers maintain the spatial topology of images. Experiments conducted on three benchmark datasets, namely Flickr8k, Flickr30k, and MSCOCO, demonstrate the effectiveness of our proposed approach. Captions generated by the spatial relational attention method precisely capture the spatial relations of objects.
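The first step described above, weighting image features by an attention mechanism, can be sketched with generic dot-product spatial attention: each region's feature is scored against the decoder state and the scores are softmax-normalized into weights. This is a minimal illustration, not the paper's exact formulation; all names and values are assumptions:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def spatial_attention(features, hidden):
    """Weight a grid of region features by relevance to the decoder state.
    features: (N, D) region vectors; hidden: (D,) decoder hidden state.
    Returns attention weights (N,) and the attended context vector (D,)."""
    scores = features @ hidden        # one relevance score per region
    weights = softmax(scores)         # normalize to a distribution
    context = weights @ features      # weighted sum over regions
    return weights, context

feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # 3 toy regions
w, ctx = spatial_attention(feats, np.array([2.0, 0.0]))
```

Regions whose features align with the current hidden state receive larger weights, so the context vector emphasizes the image parts relevant to the next caption word.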


2019 ◽  
Vol 11 (4) ◽  
pp. 415 ◽  
Author(s):  
Yanqiao Chen ◽  
Yangyang Li ◽  
Licheng Jiao ◽  
Cheng Peng ◽  
Xiangrong Zhang ◽  
...  

Polarimetric synthetic aperture radar (PolSAR) image classification has become more widely used in recent years. PolSAR image classification is well known to be a dense prediction problem, and the recently proposed fully convolutional networks (FCN) model, which excels at dense prediction, has great potential for this task. Nevertheless, FCN has several problems to solve in PolSAR image classification. Li et al. proposed the sliding window fully convolutional networks (SFCN) model to tackle these problems; however, SFCN achieves good classification results only when sufficient labeled training samples are available. To address this problem, we propose adversarial reconstruction-classification networks (ARCN), which build on SFCN and introduce reconstruction-classification networks (RCN) and adversarial training. The merit of our method is twofold: (i) a single composite representation can be constructed that encodes information for both supervised image classification and unsupervised image reconstruction; (ii) by introducing adversarial training, higher-order inconsistencies between the true image and the reconstructed image can be detected and corrected. Our method achieves impressive performance in PolSAR image classification with fewer labeled training samples. We validated its performance by comparing it against several state-of-the-art methods; experimental results obtained by classifying three PolSAR images demonstrate the efficiency of the proposed method.
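A composite representation of the kind described in point (i) is typically trained with a joint objective: cross-entropy on the labeled classification branch plus a reconstruction term on all samples. The numpy sketch below shows such a combined loss; the adversarial term of point (ii) is omitted, and the weighting `alpha` and all inputs are hypothetical:

```python
import numpy as np

def composite_loss(class_logits, label, recon, image, alpha=0.5):
    """Joint objective for a reconstruction-classification network:
    supervised cross-entropy + alpha * unsupervised reconstruction MSE."""
    # numerically stable log-softmax for the classification branch
    z = class_logits - class_logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    ce = -log_probs[label]
    # mean-squared error for the reconstruction branch
    mse = np.mean((recon - image) ** 2)
    return ce + alpha * mse

image = np.ones(4) * 0.1
loss = composite_loss(np.array([2.0, 0.5, 0.1]), 0,
                      recon=np.zeros(4), image=image)
```

Because both terms are minimized through one shared encoder, unlabeled pixels still shape the representation via the reconstruction term, which is what lets the method work with fewer labeled training samples.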


2020 ◽  
Vol 9 (1) ◽  
pp. 18 ◽  
Author(s):  
Siyang Chen ◽  
Yunsheng Zhang ◽  
Ke Nie ◽  
Xiaoming Li ◽  
Weixi Wang

This paper presents an automatic building extraction method that utilizes a photogrammetric digital surface model (DSM) and digital orthophoto map (DOM) with the help of historical digital line graphic (DLG) data. To reduce the need for manual labeling, initial labels were obtained automatically from historical DLGs. Nonetheless, a proportion of these labels are incorrect due to changes (e.g., new constructions, demolished buildings). To select clean samples, an iterative method using a random forest (RF) classifier was proposed to remove likely incorrect labels. To obtain effective features, deep features extracted from the normalized DSM (nDSM) and DOM using pre-trained fully convolutional networks (FCN) were combined. To control the computational cost and reduce redundancy, principal component analysis (PCA) was applied to reduce the feature dimensions. Three data sets in two areas were employed, with evaluation in two respects; the three DLGs contained 15%, 65%, and 25% label noise, respectively. The results demonstrate that the proposed method can effectively select clean samples and maintain acceptable quality of the extracted results in both pixel-based and object-based evaluations.
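The PCA step above projects the concatenated deep features onto their top principal components before classification. A minimal SVD-based sketch in numpy (the feature dimensions and data here are synthetic stand-ins for the paper's FCN features):

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project samples onto the top principal components via SVD,
    reducing feature dimensionality while keeping most of the variance."""
    Xc = X - X.mean(axis=0)                      # center the features
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # scores in component space

rng = np.random.default_rng(1)
# 100 samples of 64-D "deep features" whose variance lies in 3 directions.
X = (rng.normal(size=(100, 3)) @ rng.normal(size=(3, 64))
     + 0.01 * rng.normal(size=(100, 64)))
Z = pca_reduce(X, 3)
retained = Z.var(axis=0).sum() / (X - X.mean(axis=0)).var(axis=0).sum()
```

Dropping 64 dimensions to 3 here loses almost no variance because the signal is intrinsically low-rank; in practice the number of components is chosen from the explained-variance curve.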


2021 ◽  
Vol 13 (24) ◽  
pp. 5084
Author(s):  
Daliana Lobo Torres ◽  
Javier Noa Turnes ◽  
Pedro Juan Soto Vega ◽  
Raul Queiroz Feitosa ◽  
Daniel E. Silva ◽  
...  

The availability of multisource remote-sensing data from optical satellite sensors has created new opportunities and challenges for forest monitoring in the Amazon Biome. In particular, change-detection analysis has emerged in recent decades to monitor forest-change dynamics, supporting Brazilian governmental initiatives such as the PRODES and DETER projects for biodiversity preservation in threatened areas. In recent years, numerous fully convolutional network architectures have been proposed and adapted for the change-detection task. This paper comprehensively explores state-of-the-art fully convolutional networks such as U-Net, ResU-Net, SegNet, FC-DenseNet, and two DeepLabv3+ variants for monitoring deforestation in the Brazilian Amazon. The networks' performance is evaluated experimentally in terms of Precision, Recall, F1-score, and computational load using satellite images with different spatial and spectral resolutions: Landsat-8 and Sentinel-2. We also include the results of an unprecedented auditing process, performed by senior specialists, to visually evaluate each deforestation polygon derived from the network with the highest accuracy for each satellite. This assessment allowed us to estimate the accuracy of these networks while simulating a process "in nature" and faithful to the PRODES methodology. We conclude that the higher resolution of Sentinel-2 images improves the segmentation of deforestation polygons both quantitatively (in terms of F1-score) and qualitatively. Moreover, the study points to the potential of operational use of Deep Learning (DL) maps as products to be consumed in PRODES.


Sensors ◽  
2018 ◽  
Vol 18 (9) ◽  
pp. 2915 ◽  
Author(s):  
Wenchao Kang ◽  
Yuming Xiang ◽  
Feng Wang ◽  
Ling Wan ◽  
Hongjian You

Emergency flood monitoring and rescue require first detecting flooded areas. This paper provides a fast and novel flood detection method and applies it to Gaofen-3 SAR images. A fully convolutional network (FCN) built on VGG16 is utilized for flood mapping. Considering the requirements of flood detection, we fine-tune the model to obtain higher-accuracy results with shorter training time and fewer training samples. Compared with state-of-the-art methods, our proposed algorithm not only gives robust and accurate detection results but also significantly reduces the detection time.


2019 ◽  
Vol 12 (1) ◽  
pp. 59 ◽  
Author(s):  
Khairiya Mudrik Masoud ◽  
Claudio Persello ◽  
Valentyn A. Tolpekin

Boundaries of agricultural fields are important features necessary for defining the location, shape, and spatial extent of agricultural units. They are commonly used to summarize production statistics at the field level. In this study, we investigate the delineation of agricultural field boundaries (AFB) from Sentinel-2 satellite images acquired over the Flevoland province, the Netherlands, using a deep learning technique based on fully convolutional networks (FCNs). We designed a multiple dilation fully convolutional network (MD-FCN) for AFB detection from Sentinel-2 images at 10 m resolution. Furthermore, we developed a novel super-resolution semantic contour detection network (named SRC-Net) using a transposed convolutional layer in the FCN architecture to enhance the spatial resolution of the AFB output from 10 m to 5 m resolution. The SRC-Net also improves the AFB maps at 5 m resolution by exploiting the spatial-contextual information in the label space. The results of the proposed SRC-Net outperform alternative upsampling techniques and are only slightly inferior to the results of the MD-FCN for AFB detection from RapidEye images acquired at 5 m resolution.
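The transposed convolutional layer used in SRC-Net to go from 10 m to 5 m output can be illustrated in one dimension: each input value stamps a scaled copy of the kernel into a stride-spaced output grid, upsampling the signal. This is a generic sketch of the operator, not the network's learned layer; the kernel values are illustrative:

```python
import numpy as np

def transposed_conv1d(x, kernel, stride=2):
    """1-D transposed convolution: each input value adds a scaled copy of
    the kernel into the output at stride-spaced positions, so the output
    is roughly `stride` times longer than the input."""
    k = len(kernel)
    out = np.zeros((len(x) - 1) * stride + k)
    for i, v in enumerate(x):
        out[i * stride: i * stride + k] += v * np.asarray(kernel)
    return out

coarse = np.array([1.0, 2.0, 3.0])                  # coarse-resolution row
fine = transposed_conv1d(coarse, [0.5, 1.0, 0.5], stride=2)
```

With the triangular kernel `[0.5, 1.0, 0.5]` and stride 2, the overlapping stamps reproduce linear interpolation between the input samples; a learned kernel can instead adapt the upsampling to the label space, which is what SRC-Net exploits.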


2021 ◽  
Vol 13 (7) ◽  
pp. 1404
Author(s):  
Hongying Liu ◽  
Derong Xu ◽  
Tianwen Zhu ◽  
Fanhua Shang ◽  
Yuanyuan Liu ◽  
...  

Classification of polarimetric synthetic aperture radar (PolSAR) images has achieved good results thanks to the excellent fitting ability of neural networks given a large number of training samples. However, the performance of most convolutional neural networks (CNNs) degrades dramatically when only a few labeled training samples are available. As a well-known class of semi-supervised learning methods, graph convolutional networks (GCNs) have recently gained much attention for classification with only a few labeled samples. As the number of layers grows, the parameters increase dramatically, and it is challenging to determine an optimal architecture manually. In this paper, we propose a neural architecture search-based GCN (ASGCN) for the classification of PolSAR images. We construct a novel graph whose nodes combine both the physical features and the spatial relations between pixels or samples to represent the image. We then build a new search space, whose components are empirically selected from graph neural networks, and develop a differentiable architecture search method to construct our ASGCN. Moreover, to address training on large-scale images, we present a new weighted mini-batch algorithm that reduces memory consumption and ensures a balanced sample distribution, and we analyze and compare it with similar training strategies. Experiments on several real-world PolSAR datasets show that our method improves overall accuracy by as much as 3.76% over state-of-the-art methods.
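The building block that architecture search composes here is the standard graph-convolution layer, which mixes each node's features with its neighbours' through a normalized adjacency matrix. A minimal numpy sketch of one such layer on a three-node toy graph (the graph, features, and identity weight matrix are illustrative, not the paper's construction):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution layer: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W),
    i.e. symmetric-normalized neighbourhood averaging followed by a
    linear transform and ReLU."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d = A_hat.sum(axis=1)                     # node degrees incl. self-loop
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.maximum(0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

# Tiny graph: nodes 0 and 1 adjacent (e.g. neighbouring pixels), node 2 isolated.
A = np.array([[0., 1., 0.], [1., 0., 0.], [0., 0., 0.]])
H = np.array([[1., 0.], [0., 1.], [1., 1.]])  # hypothetical node features
W = np.eye(2)                                  # identity weights for clarity
H1 = gcn_layer(A, H, W)
```

Connected nodes end up with averaged features while the isolated node keeps its own, which is the smoothing behaviour that lets label information from the few labeled samples propagate through the graph.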

