An Efficient Building Extraction Method from High Spatial Resolution Remote Sensing Images Based on Improved Mask R-CNN

Sensors ◽  
2020 ◽  
Vol 20 (5) ◽  
pp. 1465 ◽  
Author(s):  
Lili Zhang ◽  
Jisen Wu ◽  
Yu Fan ◽  
Hongmin Gao ◽  
Yehong Shao

This paper addresses building extraction from high spatial resolution remote sensing images. Most existing building extraction methods rely on hand-crafted features, but the diversity and complexity of buildings still pose great challenges for them, so methods based on deep learning have recently been proposed. We propose a building extraction framework, called Mask R-CNN Fusion Sobel, that combines a convolutional neural network with an edge detection algorithm. Because Mask R-CNN has performed outstandingly in image segmentation, we improve it and apply it to building extraction from remote sensing images. The method consists of three parts. First, the convolutional neural network performs rough localization and pixel-level classification, and addresses the problem of false and missed extraction by automatically discovering semantic features. Second, the Sobel edge detection algorithm segments building edges accurately, addressing the problems of edge extraction and object integrity that deep convolutional neural networks have in semantic segmentation. Third, buildings are extracted by the fusion algorithm. We apply the proposed framework to high-resolution remote sensing images from the Chinese satellite GF-2; in the experiments, the average IoU (intersection over union) of the proposed method was 88.7% and the average Kappa was 87.8%. The method can therefore be applied to the recognition and segmentation of complex buildings and is more accurate than the classical method.
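The two scores quoted in this abstract, IoU and Kappa, can both be derived from the confusion counts of a predicted building mask against a reference mask. The following is a minimal sketch using the standard definitions of intersection over union and Cohen's kappa, not the authors' evaluation code; the function names and the toy masks are illustrative assumptions.

```python
import numpy as np


def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))    # building predicted, building in reference
    fp = int(np.sum(pred & ~truth))   # building predicted, background in reference
    fn = int(np.sum(~pred & truth))   # background predicted, building in reference
    tn = int(np.sum(~pred & ~truth))  # background in both
    return tp, fp, fn, tn


def iou(pred, truth):
    """Intersection over union of the building class."""
    tp, fp, fn, _ = confusion_counts(pred, truth)
    union = tp + fp + fn
    return tp / union if union else 1.0


def cohen_kappa(pred, truth):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return (observed - expected) / (1 - expected) if expected != 1 else 1.0


# Hypothetical 2x4 masks, used only to exercise the functions.
pred_mask = [[1, 1, 0, 0], [1, 0, 0, 0]]
true_mask = [[1, 1, 1, 0], [0, 0, 0, 0]]
print(iou(pred_mask, true_mask), cohen_kappa(pred_mask, true_mask))
```

Averaging these per-image values over a test set gives figures comparable in kind to the averages reported above, assuming the paper uses the same definitions.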

Sensors ◽  
2020 ◽  
Vol 20 (24) ◽  
pp. 7241
Author(s):  
Dengji Zhou ◽  
Guizhou Wang ◽  
Guojin He ◽  
Tengfei Long ◽  
Ranyu Yin ◽  
...  

Building extraction from high spatial resolution remote sensing images is an active research topic in remote sensing applications and computer vision. This paper presents a supervised semantic segmentation model named Pyramid Self-Attention Network (PISANet). Its structure is simple, containing only two parts: the backbone of the network, which learns the local features of buildings (short-distance context information around each pixel) from the image; and the pyramid self-attention module, which obtains the global features (long-distance context information relating each pixel to other pixels in the image) and the comprehensive features (including color, texture, geometric, and high-level semantic features) of the building. The network is an end-to-end approach. In the training stage, the input is the remote sensing image and its corresponding label, and the output is a probability map (the probability that each pixel is or is not a building). In the prediction stage, the input is the remote sensing image, and the output is the building extraction result. The complexity of the network structure was reduced so that it is easy to implement. The proposed PISANet was tested on two datasets. The results show that the overall accuracy reached 94.50% and 96.15%, the intersection-over-union reached 77.45% and 87.97%, and the F1 index reached 87.27% and 93.55%, respectively. In experiments on different datasets, PISANet obtained high overall accuracy, a low error rate, and improved integrity of individual buildings.
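To illustrate how a self-attention step lets every pixel draw on long-distance context, here is a minimal NumPy sketch of generic scaled dot-product self-attention over a flattened feature map. It is not PISANet's pyramid self-attention module: the projection matrices `w_q`, `w_k`, `w_v`, the feature-map size, and the random inputs are all assumptions made for illustration.

```python
import numpy as np


def self_attention(features, w_q, w_k, w_v):
    """Scaled dot-product self-attention.

    features: (N, C) array with one row per pixel.
    w_q, w_k, w_v: (C, D) projection matrices (hypothetical, random here).
    Returns the attended features (N, D) and the attention weights (N, N).
    """
    q = features @ w_q
    k = features @ w_k
    v = features @ w_v
    # Similarity of every pixel to every other pixel, scaled by sqrt(D).
    scores = q @ k.T / np.sqrt(k.shape[1])
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    # Each output row is a weighted mix of the values of all pixels.
    return weights @ v, weights


rng = np.random.default_rng(0)
height, width, channels, dim = 4, 4, 8, 4
feature_map = rng.normal(size=(height, width, channels))  # stand-in for backbone output
pixels = feature_map.reshape(height * width, channels)
w_q, w_k, w_v = (rng.normal(size=(channels, dim)) for _ in range(3))

attended, attention = self_attention(pixels, w_q, w_k, w_v)
print(attended.shape, attention.shape)
```

Because the attention matrix has one row and one column per pixel, its cost grows quadratically with image size, which is one common motivation for applying attention on pooled or multi-scale feature maps.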


2020 ◽  
Vol 12 (23) ◽  
pp. 3983
Author(s):  
Qiqi Zhu ◽  
Zhen Li ◽  
Yanan Zhang ◽  
Qingfeng Guan

Building extraction is a binary classification task that separates the building area from the background in remote sensing images. The conditional random field (CRF) directly models the maximum posterior probability and can make full use of the spatial neighbourhood information of both the labelled and the observed images, so it is widely used in building footprint extraction. However, edge oversmoothing still occurs when CRF is applied directly to extract buildings from high spatial resolution (HSR) remote sensing images. Based on a multi-scale semantic segmentation network from computer vision (D-LinkNet), a novel building extraction framework is proposed, named multiscale-aware and segmentation-prior conditional random fields (MSCRF). To avoid losing building details in the downsampling process, D-LinkNet, which connects the encoder and decoder, is used to generate the unary potential. By integrating multi-scale building features in its central module, D-LinkNet can integrate multiscale contextual information without loss of resolution. For the pairwise potential, the segmentation prior is fused to alleviate the influence of spectral diversity between the building and the background area. Moreover, a local class label cost term is introduced, and clear building boundaries are obtained by using the larger-scale context information. The experimental results demonstrate that the proposed MSCRF framework is superior to the state-of-the-art methods and performs well for building extraction in complex scenes.
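For orientation, the relationship between the unary potential, the pairwise potential, and the label cost term can be sketched with the generic energy form of a CRF. The notation below is an assumption for illustration and is not taken from the MSCRF paper: $\mathbf{x}$ is the label map, $\mathbf{y}$ the observed image, $\mathcal{N}_i$ the neighbourhood of pixel $i$, and $\psi_u$, $\psi_p$, $\psi_c$ stand for the unary, pairwise, and label cost terms, whose exact definitions in MSCRF may differ.

```latex
P(\mathbf{x} \mid \mathbf{y}) = \frac{1}{Z(\mathbf{y})} \exp\bigl(-E(\mathbf{x} \mid \mathbf{y})\bigr),
\qquad
E(\mathbf{x} \mid \mathbf{y}) = \sum_{i} \psi_u(x_i, \mathbf{y})
  + \sum_{i} \sum_{j \in \mathcal{N}_i} \psi_p(x_i, x_j, \mathbf{y})
  + \sum_{i} \psi_c(x_i, \mathbf{y})
```

Under this form, maximizing the posterior probability is equivalent to finding the label map with minimum energy, since the partition function $Z(\mathbf{y})$ does not depend on $\mathbf{x}$.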


2021 ◽  
Vol 13 (13) ◽  
pp. 2524
Author(s):  
Ziyi Chen ◽  
Dilong Li ◽  
Wentao Fan ◽  
Haiyan Guan ◽  
Cheng Wang ◽  
...  

Deep learning models have brought great breakthroughs in building extraction from high-resolution optical remote-sensing images. In recent research, the self-attention module has attracted wide attention in many fields, including building extraction. However, most current deep learning models equipped with a self-attention module still overlook the effectiveness of reconstruction bias. Tipping the balance between the encoding and decoding abilities, i.e., making the decoding network much more complex than the encoding network, reinforces the semantic segmentation ability. To address the lack of research on combining self-attention and reconstruction-bias modules for building extraction, this paper presents a U-Net architecture that combines the two. In the encoding part, a self-attention module is added to learn the attention weights of the inputs. Through the self-attention module, the network pays more attention to positions where there may be salient regions. In the decoding part, multiple large convolutional up-sampling operations are used to increase the reconstruction ability. We test our model on two openly available datasets: the WHU and Massachusetts Building datasets. We achieve IoU scores of 89.39% and 73.49% for the WHU and Massachusetts Building datasets, respectively. Compared with several recent well-known semantic segmentation methods and representative building extraction methods, our method’s results are satisfactory.


2015 ◽  
Vol 109 ◽  
pp. 108-125 ◽  
Author(s):  
Xinghua Li ◽  
Nian Hui ◽  
Huanfeng Shen ◽  
Yunjie Fu ◽  
Liangpei Zhang

2018 ◽  
Vol 10 (11) ◽  
pp. 1737 ◽  
Author(s):  
Jinchao Song ◽  
Tao Lin ◽  
Xinhu Li ◽  
Alexander V. Prishchepov

Fine-scale, accurate intra-urban functional zones (urban land use) are important for applications that rely on exploring urban dynamics and complexity. However, current methods of mapping functional zones in built-up areas with high spatial resolution remote sensing images are incomplete due to a lack of social attributes. To address this issue, this paper explores a novel approach to mapping urban functional zones by integrating points of interest (POIs), which carry social properties, with very high spatial resolution remote sensing imagery, which carries natural attributes, and classifying urban function as residence zones, transportation zones, convenience shops, shopping centers, factory zones, companies, and public service zones. First, non-built and built-up areas were classified using high spatial resolution remote sensing images. Second, the built-up areas were segmented using an object-based approach that utilizes building rooftop characteristics (reflectance and shapes). At the same time, the functional POIs of the segments were identified to determine the functional attributes of each segmented polygon. Third, the functional values (the mean priority of the functions in a road-based parcel) were calculated from the functional segments and segmental weight coefficients. This method was demonstrated on Xiamen Island, China, with an overall accuracy of 78.47% and a kappa coefficient of 74.52%. The proposed approach could be easily applied in other parts of the world where social data and high spatial resolution imagery are available, and could improve accuracy when automatically mapping urban functional zones using remote sensing imagery. It could also provide large-scale land-use information.

