scholarly journals Building Extraction from Very High Resolution Aerial Imagery Using Joint Attention Deep Neural Network

2019 ◽  
Vol 11 (24) ◽  
pp. 2970 ◽  
Author(s):  
Ziran Ye ◽  
Yongyong Fu ◽  
Muye Gan ◽  
Jinsong Deng ◽  
Alexis Comber ◽  
...  

Automated methods to extract buildings from very high resolution (VHR) remote sensing data have many applications in a wide range of fields. Many convolutional neural network (CNN) based methods have been proposed and have achieved significant advances in the building extraction task. In order to refine predictions, a lot of recent approaches fuse features from earlier layers of CNNs to introduce abundant spatial information, which is known as skip connection. However, this strategy of reusing earlier features directly without processing could reduce the performance of the network. To address this problem, we propose a novel fully convolutional network (FCN) that adopts attention based re-weighting to extract buildings from aerial imagery. Specifically, we consider the semantic gap between features from different stages and leverage the attention mechanism to bridge the gap prior to the fusion of features. The inferred attention weights along spatial and channel-wise dimensions make the low level feature maps adaptive to high level feature maps in a target-oriented manner. Experimental results on three publicly available aerial imagery datasets show that the proposed model (RFA-UNet) achieves comparable and improved performance compared to other state-of-the-art models for building extraction.

2018 ◽  
Vol 10 (11) ◽  
pp. 1768 ◽  
Author(s):  
Hui Yang ◽  
Penghai Wu ◽  
Xuedong Yao ◽  
Yanlan Wu ◽  
Biao Wang ◽  
...  

Building extraction from very high resolution (VHR) imagery plays an important role in urban planning, disaster management, navigation, updating geographic databases, and several other geospatial applications. Compared with the traditional building extraction approaches, deep learning networks have recently shown outstanding performance in this task by using both high-level and low-level feature maps. However, it is difficult to utilize different level features rationally with the present deep learning networks. To tackle this problem, a novel network based on DenseNets and the attention mechanism was proposed, called the dense-attention network (DAN). The DAN contains an encoder part and a decoder part which are separately composed of lightweight DenseNets and a spatial attention fusion module. The proposed encoder–decoder architecture can strengthen feature propagation and effectively bring higher-level feature information to suppress the low-level feature and noises. Experimental results based on public international society for photogrammetry and remote sensing (ISPRS) datasets with only red–green–blue (RGB) images demonstrated that the proposed DAN achieved a higher score (96.16% overall accuracy (OA), 92.56% F1 score, 90.56% mean intersection over union (MIOU), less training and response time and higher-quality value) when compared with other deep learning methods.


Sensors ◽  
2018 ◽  
Vol 18 (10) ◽  
pp. 3341 ◽  
Author(s):  
Hilal Tayara ◽  
Kil Chong

Object detection in very high-resolution (VHR) aerial images is an essential step for a wide range of applications such as military applications, urban planning, and environmental management. Still, it is a challenging task due to the different scales and appearances of the objects. On the other hand, object detection task in VHR aerial images has improved remarkably in recent years due to the achieved advances in convolution neural networks (CNN). Most of the proposed methods depend on a two-stage approach, namely: a region proposal stage and a classification stage such as Faster R-CNN. Even though two-stage approaches outperform the traditional methods, their optimization is not easy and they are not suitable for real-time applications. In this paper, a uniform one-stage model for object detection in VHR aerial images has been proposed. In order to tackle the challenge of different scales, a densely connected feature pyramid network has been proposed by which high-level multi-scale semantic feature maps with high-quality information are prepared for object detection. This work has been evaluated on two publicly available datasets and outperformed the current state-of-the-art results on both in terms of mean average precision (mAP) and computation time.


Author(s):  
H. R. Hosseinpoor ◽  
F. Samadzadegan

Abstract. Buildings are a major element in the formation of cities and are essential for urban mapping. The precise extraction of buildings from remote sensing data has become a significant topic and has received much attention in recent years. The recently developed convolutional neural networks have shown effective and superior performance to perform well on learning high-level and discriminative features in extracting buildings because of the outstanding feature learning and end-to-end pixel labelling abilities. However, it is difficult to use the features of different levels with a certain degree of importance that is appropriate to deep learning networks. To tackle this problem, a network based on U-Nets and the attention mechanism block was proposed. The network contains an encoder part and a decoder part and a spatial attention module. The special architecture presented in this article enhances the propagation of features and effectively utilizes the features at various levels to reduce errors. The other remarkable thing is that attention module blocks only lead to a minimal increase in model complexity. We effectively demonstrate an improvement of building extraction accuracy on challenging Potsdam and Vaihingen benchmark datasets. The results of this paper show that the proposed architecture improves building extraction in very high resolution remote sensing images compared to previous models.


2019 ◽  
Vol 11 (20) ◽  
pp. 2380 ◽  
Author(s):  
Liu ◽  
Luo ◽  
Huang ◽  
Hu ◽  
Sun ◽  
...  

Deep convolutional neural networks have promoted significant progress in building extraction from high-resolution remote sensing imagery. Although most of such work focuses on modifying existing image segmentation networks in computer vision, we propose a new network in this paper, Deep Encoding Network (DE-Net), that is designed for the very problem based on many lately introduced techniques in image segmentation. Four modules are used to construct DE-Net: the inceptionstyle downsampling modules combining a striding convolution layer and a max-pooling layer, the encoding modules comprising six linear residual blocks with a scaled exponential linear unit (SELU) activation function, the compressing modules reducing the feature channels, and a densely upsampling module that enables the network to encode spatial information inside feature maps. Thus, DE-Net achieves stateoftheart performance on the WHU Building Dataset in recall, F1-Score, and intersection over union (IoU) metrics without pretraining. It also outperformed several segmentation networks in our self-built Suzhou Satellite Building Dataset. The experimental results validate the effectiveness of DE-Net on building extraction from aerial imagery and satellite imagery. It also suggests that given enough training data, designing and training a network from scratch may excel fine-tuning models pre-trained on datasets unrelated to building extraction.


2021 ◽  
Vol 13 (4) ◽  
pp. 611
Author(s):  
Damian Wierzbicki ◽  
Olga Matuk ◽  
Elzbieta Bielecka

Automatic building extraction from remote sensing data is a hot but challenging research topic for cadastre verification, modernization and updating. Deep learning algorithms are perceived as more promising in overcoming the difficulties of extracting semantic features from complex scenes and large differences in buildings’ appearance. This paper explores the modified fully convolutional network U-Shape Network (U-Net) for high resolution aerial orthoimagery segmentation and dense LiDAR data to extract building outlines automatically. The three-step end-to-end computational procedure allows for automated building extraction with an 89.5% overall accuracy and an 80.7% completeness, which made it very promising for cadastre modernization in Poland. The applied algorithms work well both in densely and poorly built-up areas, typical for peripheral areas of cities, where uncontrolled development had recently been observed. Discussing the possibilities and limitations, the authors also provide some important information that could help local authorities decide on the use of remote sensing data in land administration.


2021 ◽  
Vol 10 (3) ◽  
pp. 147
Author(s):  
Tamara Alshaikhli ◽  
Wen Liu ◽  
Yoshihisa Maruyama

The extraction of roads and centerlines from aerial imagery is considered an important topic because it contributes to different fields, such as urban planning, transportation engineering, and disaster mitigation. Many researchers have studied this topic as a two-separated task that affects the quality of extracted roads and centerlines because of the correlation between these two tasks. Accurate road extraction enhances accurate centerline extraction if these two tasks are processed simultaneously. This study proposes a multitask learning scheme using a gated deep convolutional neural network (DCNN) to extract roads and centerlines simultaneously. The DCNN is composed of one encoder and two decoders implemented on the U-Net backbone. The decoders are assigned to extract roads and centerlines from low-resolution feature maps. Before extraction, the images are processed within an encoder to extract the spatial information from a complex, high-resolution image. The encoder consists of the residual blocks (Res-Block) connected to a bridge represented by a Res-Block, and the bridge connects the two identical decoders, which consists of stacking convolutional layers (Conv.layer). Attention gates (AGs) are added to our model to enhance the selection process for the true pixels that represent road or centerline classes. Our model is trained on a dataset of high-resolution aerial images, which is open to the public. The model succeeds in efficiently extracting roads and centerlines compared with other multitask learning models.


2021 ◽  
Vol 13 (13) ◽  
pp. 2473
Author(s):  
Qinglie Yuan ◽  
Helmi Zulhaidi Mohd Shafri ◽  
Aidi Hizami Alias ◽  
Shaiful Jahari Hashim

Automatic building extraction has been applied in many domains. It is also a challenging problem because of the complex scenes and multiscale. Deep learning algorithms, especially fully convolutional neural networks (FCNs), have shown robust feature extraction ability than traditional remote sensing data processing methods. However, hierarchical features from encoders with a fixed receptive field perform weak ability to obtain global semantic information. Local features in multiscale subregions cannot construct contextual interdependence and correlation, especially for large-scale building areas, which probably causes fragmentary extraction results due to intra-class feature variability. In addition, low-level features have accurate and fine-grained spatial information for tiny building structures but lack refinement and selection, and the semantic gap of across-level features is not conducive to feature fusion. To address the above problems, this paper proposes an FCN framework based on the residual network and provides the training pattern for multi-modal data combining the advantage of high-resolution aerial images and LiDAR data for building extraction. Two novel modules have been proposed for the optimization and integration of multiscale and across-level features. In particular, a multiscale context optimization module is designed to adaptively generate the feature representations for different subregions and effectively aggregate global context. A semantic guided spatial attention mechanism is introduced to refine shallow features and alleviate the semantic gap. Finally, hierarchical features are fused via the feature pyramid network. Compared with other state-of-the-art methods, experimental results demonstrate superior performance with 93.19 IoU, 97.56 OA on WHU datasets and 94.72 IoU, 97.84 OA on the Boston dataset, which shows that the proposed network can improve accuracy and achieve better performance for building extraction.


Sign in / Sign up

Export Citation Format

Share Document