A Hybrid Attention-Aware Fusion Network (HAFNet) for Building Extraction from High-Resolution Imagery and LiDAR Data

2020, Vol. 12 (22), pp. 3764
Author(s): Peng Zhang, Peijun Du, Cong Lin, Xin Wang, Erzhu Li, et al.

Automated extraction of buildings from earth observation (EO) data has long been a fundamental but challenging research topic. Combining data from different modalities (e.g., high-resolution imagery (HRI) and light detection and ranging (LiDAR) data) has shown great potential in building extraction. Recent studies have examined the role that deep learning (DL) could play in both multimodal data fusion and urban object extraction. However, DL-based multimodal fusion networks may encounter the following limitations: (1) the individual modal and cross-modal features, which we consider both useful and important for the final prediction, cannot be sufficiently learned and utilized; and (2) the multimodal features are fused by a simple summation or concatenation, which is ambiguous about how cross-modal complementary information is selected. In this paper, we address these two limitations by proposing a hybrid attention-aware fusion network (HAFNet) for building extraction. It consists of RGB-specific, digital surface model (DSM)-specific, and cross-modal streams to sufficiently learn and utilize both individual modal and cross-modal features. Furthermore, an attention-aware multimodal fusion block (Att-MFBlock) was introduced to overcome the fusion problem by adaptively selecting and combining complementary features from each modality. Extensive experiments conducted on two publicly available datasets demonstrated the effectiveness of the proposed HAFNet for building extraction.
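The abstract contrasts attention-aware fusion with plain summation or concatenation. A minimal sketch of the idea, not the paper's actual Att-MFBlock: for each channel, a softmax over per-modality gating scores weights the RGB and DSM contributions adaptively. The gating scores (in the network they would be learned) are supplied by hand here for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_fuse(rgb_feat, dsm_feat, gate_rgb, gate_dsm):
    """Fuse two modality feature vectors channel by channel.

    For each channel, a softmax over per-modality gating scores
    decides how much each modality contributes, instead of a
    plain summation or concatenation that treats both equally.
    """
    fused = []
    for r, d, gr, gd in zip(rgb_feat, dsm_feat, gate_rgb, gate_dsm):
        w_r, w_d = softmax([gr, gd])
        fused.append(w_r * r + w_d * d)
    return fused

# Example: channel 0 trusts RGB, channel 1 trusts the DSM,
# so each fused channel tracks the favoured modality.
rgb = [0.2, 0.9]
dsm = [0.8, 0.1]
fused = attention_fuse(rgb, dsm, gate_rgb=[2.0, -2.0], gate_dsm=[-2.0, 2.0])
```

With strongly separated gates the fused values stay close to the favoured modality's features, which is the "adaptive selection" a fixed summation cannot express.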

2021, Vol. 13 (13), pp. 2473
Author(s): Qinglie Yuan, Helmi Zulhaidi Mohd Shafri, Aidi Hizami Alias, Shaiful Jahari Hashim

Automatic building extraction has been applied in many domains. It remains a challenging problem because of complex scenes and multiscale objects. Deep learning algorithms, especially fully convolutional neural networks (FCNs), have shown more robust feature extraction ability than traditional remote sensing data processing methods. However, hierarchical features from encoders with a fixed receptive field are weak at capturing global semantic information. Local features in multiscale subregions cannot construct contextual interdependence and correlation, especially for large-scale building areas, which can cause fragmentary extraction results due to intra-class feature variability. In addition, low-level features carry accurate, fine-grained spatial information for tiny building structures but lack refinement and selection, and the semantic gap between features across levels is not conducive to feature fusion. To address these problems, this paper proposes an FCN framework based on the residual network and provides a training pattern for multi-modal data that combines the advantages of high-resolution aerial images and LiDAR data for building extraction. Two novel modules are proposed for the optimization and integration of multiscale and across-level features. In particular, a multiscale context optimization module is designed to adaptively generate feature representations for different subregions and effectively aggregate global context. A semantic-guided spatial attention mechanism is introduced to refine shallow features and alleviate the semantic gap. Finally, hierarchical features are fused via a feature pyramid network. Compared with other state-of-the-art methods, experimental results demonstrate superior performance, with 93.19% IoU and 97.56% OA on the WHU dataset and 94.72% IoU and 97.84% OA on the Boston dataset, showing that the proposed network improves accuracy and achieves better performance for building extraction.
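The semantic-guided spatial attention described above can be sketched in its simplest possible form, assuming (the paper does not specify this) that deeper semantic features produce per-position logits which, after a sigmoid, mask the shallow features before cross-level fusion:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def semantic_guided_attention(shallow, deep_logits):
    """Refine fine-grained shallow features with a spatial mask
    derived from deeper semantic features: positions the deep
    stream scores highly keep their shallow detail, while other
    positions are suppressed, narrowing the semantic gap before
    the feature pyramid fuses the levels."""
    return [s * sigmoid(g) for s, g in zip(shallow, deep_logits)]

# A strongly negative semantic logit suppresses shallow noise;
# a strongly positive one passes the fine detail through.
refined = semantic_guided_attention([1.0, 1.0], [-4.0, 4.0])
```

The design point is that the mask is computed from the *semantic* stream but applied to the *spatial* stream, so shallow detail survives only where the deep features agree it is building-like.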


2018, Vol. 10 (11), pp. 1768
Author(s): Hui Yang, Penghai Wu, Xuedong Yao, Yanlan Wu, Biao Wang, et al.

Building extraction from very high resolution (VHR) imagery plays an important role in urban planning, disaster management, navigation, updating geographic databases, and several other geospatial applications. Compared with traditional building extraction approaches, deep learning networks have recently shown outstanding performance in this task by using both high-level and low-level feature maps. However, it is difficult for present deep learning networks to utilize features from different levels rationally. To tackle this problem, a novel network based on DenseNets and the attention mechanism, called the dense-attention network (DAN), was proposed. The DAN contains an encoder part and a decoder part, composed respectively of lightweight DenseNets and a spatial attention fusion module. The proposed encoder–decoder architecture can strengthen feature propagation and effectively use higher-level feature information to suppress low-level features and noise. Experimental results on the public International Society for Photogrammetry and Remote Sensing (ISPRS) datasets, using only red–green–blue (RGB) images, demonstrated that the proposed DAN achieved higher scores (96.16% overall accuracy (OA), 92.56% F1 score, and 90.56% mean intersection over union (MIoU)), less training and response time, and a higher quality value when compared with other deep learning methods.
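The decoder-side mechanism described here, higher-level features suppressing low-level features and noise, can be illustrated with a toy 1-D sketch (the actual module operates on 2-D feature maps and learns its attention): a coarse attention map from a deeper decoder stage is upsampled to the fine resolution and gates the low-level features elementwise.

```python
import math

def upsample_nearest(mask, factor):
    """Nearest-neighbour upsample of a 1-D attention map."""
    return [v for v in mask for _ in range(factor)]

def spatial_attention_fusion(low, high_attn_logits, factor):
    """Gate a low-level feature map with an attention map computed
    from a coarser, higher-level decoder stage (upsampled to match
    resolution), so semantically weak responses and noise in the
    low-level features are suppressed before fusion."""
    gate = [1.0 / (1.0 + math.exp(-g))
            for g in upsample_nearest(high_attn_logits, factor)]
    return [l * g for l, g in zip(low, gate)]

low = [0.5, 0.7, 0.6, 0.8]   # fine resolution: 4 positions
attn = [5.0, -5.0]           # coarse attention: 2 positions
fused = spatial_attention_fusion(low, attn, factor=2)
```

The second half of the low-level map is nearly zeroed because the higher-level stage judged that region irrelevant, which is exactly the "suppress low-level noise" behaviour the abstract attributes to the fusion module.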


2021, Vol. 13 (17), pp. 3414
Author(s): Heng Luo, Biao He, Renzhong Guo, Weixi Wang, Xi Kuai, et al.

Urban modeling and visualization are highly useful in the development of smart cities. Buildings are the most prominent features in the urban environment and are necessary for urban decision support; thus, buildings should be modeled effectively and efficiently in three dimensions (3D). In this study, with the help of Gaofen-7 (GF-7) high-resolution stereo mapping satellite double-line camera (DLC) images and multispectral (MUX) images, building boundaries are segmented via a multilevel features fusion network (MFFN). A digital surface model (DSM) is generated to obtain the elevation of buildings. The building vectors with height information are processed using a 3D modeling tool to create white building models. The building models, DSM, and multispectral fused image are then imported into Unreal Engine 4 (UE4) to complete the urban scene level, vividly rendered with environmental effects for urban visualization. The results of this study show that a high accuracy of 95.29% is achieved in building extraction using the proposed method. Based on the extracted building vectors and elevation information from the DSM, 3D building models can be efficiently created at Level of Detail 1 (LOD1). Finally, the urban scene is produced for realistic 3D visualization. This study shows that high-resolution stereo mapping satellite images are useful for 3D modeling of urban buildings and can support the generation and visualization of urban scenes over a large area for different applications.
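The "footprint vector plus DSM elevation, extruded to LOD1" step can be sketched as follows. The height estimator (median of DSM samples inside the footprint minus ground elevation) is an assumption for illustration; the paper does not state how it derives building height from the DSM.

```python
def extrude_lod1(footprint, ground_z, dsm_heights):
    """Build an LOD1 block model: the 2-D footprint polygon is
    extruded to a flat roof at the building height estimated from
    the DSM. Here the height is the median of DSM samples falling
    inside the footprint, minus the ground elevation."""
    zs = sorted(dsm_heights)
    n = len(zs)
    median = zs[n // 2] if n % 2 else 0.5 * (zs[n // 2 - 1] + zs[n // 2])
    height = median - ground_z
    base = [(x, y, ground_z) for x, y in footprint]
    roof = [(x, y, ground_z + height) for x, y in footprint]
    return base, roof, height

# A 10 m x 6 m footprint; the 40.0 outlier (e.g. a rooftop
# antenna return) barely moves the median-based height.
footprint = [(0, 0), (10, 0), (10, 6), (0, 6)]
base, roof, h = extrude_lod1(footprint, ground_z=2.0,
                             dsm_heights=[14.8, 15.0, 15.2, 40.0])
```

Using the median rather than the maximum keeps the LOD1 prism robust to DSM spikes, which matters when the DSM comes from stereo matching.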


2018, Vol. 10 (9), pp. 1349
Author(s): Hui Luo, Le Wang, Chen Wu, Lei Zhang

Impervious surface mapping incorporating high-resolution remote sensing imagery has continued to attract increasing interest, as it can provide detailed information about urban structure and distribution. Previous studies have suggested that combining LiDAR data with high-resolution imagery for impervious surface mapping yields better performance than using high-resolution imagery alone. However, owing to the high cost of LiDAR acquisition, it is difficult to obtain LiDAR data acquired at the same time as the high-resolution imagery for multi-sensor impervious surface mapping. Consequently, real landscape changes occurring between multi-sensor data sets with different acquisition times cause misclassification errors in impervious surface mapping, an issue that has generally been neglected in previous work. Furthermore, observation differences between the multi-sensor data, including misregistration, missing data in the LiDAR coverage, and shadow in the high-resolution images, also present obstacles to fusing LiDAR data and high-resolution images into a final map. To resolve these issues, we propose an improved impervious surface-mapping method that incorporates LiDAR data and high-resolution imagery with different acquisition times and accounts for both real landscape changes and observation differences. In the proposed method, multi-sensor change detection by supervised multivariate alteration detection (MAD) is employed to identify the changed and mis-registered areas. The no-data areas in the LiDAR data and the shadow areas in the high-resolution image are extracted via independent classification based on the corresponding single-sensor data. Finally, an object-based post-classification fusion is proposed that takes advantage of both the independent classification results using single-sensor data and the joint classification result using stacked multi-sensor data. The impervious surface map is subsequently obtained by combining the landscape classes in the resulting classification map. Experiments covering the study site in Buffalo, NY, USA demonstrate that our method can accurately detect landscape changes and clearly improve the performance of impervious surface mapping.
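The object-based post-classification fusion can be illustrated with a hypothetical per-object decision rule; the abstract only says the method combines the two independent single-sensor results with the stacked joint result, so the exact rules below (fall-backs for changed, shadow, and no-data objects, then a majority vote) are an assumed sketch, not the paper's procedure.

```python
from collections import Counter

def fuse_object_label(joint, lidar_only, image_only,
                      changed, shadow, lidar_gap):
    """Per-object post-classification fusion (hypothetical rules):
    - changed/mis-registered objects: trust the image-only result,
      since stale LiDAR evidence no longer describes the scene;
    - LiDAR no-data objects: use the image-only result;
    - image shadow objects: use the LiDAR-only result;
    - otherwise: majority vote over the three classifiers,
      breaking ties in favour of the joint (stacked) result."""
    if changed or lidar_gap:
        return image_only
    if shadow:
        return lidar_only
    votes = Counter([joint, lidar_only, image_only])
    label, count = votes.most_common(1)[0]
    return label if count > 1 else joint

# Normal object: two of three classifiers agree.
a = fuse_object_label("impervious", "impervious", "pervious",
                      changed=False, shadow=False, lidar_gap=False)
# Shadow object: fall back to the LiDAR-only classification.
b = fuse_object_label("pervious", "impervious", "pervious",
                      changed=False, shadow=True, lidar_gap=False)
```

Routing problem objects to whichever single-sensor classifier is still reliable is what lets the fusion tolerate the acquisition-time gap the abstract highlights.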

