A Hybrid Attention-Aware Fusion Network (HAFNet) for Building Extraction from High-Resolution Imagery and LiDAR Data

2020, Vol. 12 (22), pp. 3764
Author(s): Peng Zhang, Peijun Du, Cong Lin, Xin Wang, Erzhu Li, et al.

Automated extraction of buildings from earth observation (EO) data has long been a fundamental but challenging research topic. Combining data from different modalities (e.g., high-resolution imagery (HRI) and light detection and ranging (LiDAR) data) has shown great potential in building extraction. Recent studies have examined the role that deep learning (DL) could play in both multimodal data fusion and urban object extraction. However, DL-based multimodal fusion networks may encounter the following limitations: (1) the individual modal and cross-modal features, which we consider both useful and important for the final prediction, cannot be sufficiently learned and utilized; and (2) the multimodal features are fused by a simple summation or concatenation, which is ambiguous about how cross-modal complementary information is selected. In this paper, we address these two limitations by proposing a hybrid attention-aware fusion network (HAFNet) for building extraction. It consists of RGB-specific, digital surface model (DSM)-specific, and cross-modal streams to sufficiently learn and utilize both individual modal and cross-modal features. Furthermore, an attention-aware multimodal fusion block (Att-MFBlock) was introduced to overcome the fusion problem by adaptively selecting and combining complementary features from each modality. Extensive experiments conducted on two publicly available datasets demonstrated the effectiveness of the proposed HAFNet for building extraction.
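The abstract contrasts attention-aware fusion with plain summation or concatenation. A minimal sketch of the idea, not the paper's actual Att-MFBlock: for each channel, a softmax over per-modality gating scores weights the RGB and DSM contributions adaptively. The gating scores (in the network they would be learned) are supplied by hand here for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_fuse(rgb_feat, dsm_feat, gate_rgb, gate_dsm):
    """Fuse two modality feature vectors channel by channel.

    For each channel, a softmax over per-modality gating scores
    decides how much each modality contributes, instead of a
    plain summation or concatenation that treats both equally.
    """
    fused = []
    for r, d, gr, gd in zip(rgb_feat, dsm_feat, gate_rgb, gate_dsm):
        w_r, w_d = softmax([gr, gd])
        fused.append(w_r * r + w_d * d)
    return fused

# Example: channel 0 trusts RGB, channel 1 trusts the DSM,
# so each fused channel tracks the favoured modality.
rgb = [0.2, 0.9]
dsm = [0.8, 0.1]
fused = attention_fuse(rgb, dsm, gate_rgb=[2.0, -2.0], gate_dsm=[-2.0, 2.0])
```

With strongly separated gates the fused values stay close to the favoured modality's features, which is the "adaptive selection" a fixed summation cannot express.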

2021, Vol. 13 (13), pp. 2473
Author(s): Qinglie Yuan, Helmi Zulhaidi Mohd Shafri, Aidi Hizami Alias, Shaiful Jahari Hashim

Automatic building extraction has been applied in many domains. It remains a challenging problem because of complex scenes and multiscale objects. Deep learning algorithms, especially fully convolutional neural networks (FCNs), have shown more robust feature extraction ability than traditional remote sensing data processing methods. However, hierarchical features from encoders with a fixed receptive field are weak at capturing global semantic information. Local features in multiscale subregions cannot construct contextual interdependence and correlation, especially for large-scale building areas, which can cause fragmentary extraction results due to intra-class feature variability. In addition, low-level features carry accurate, fine-grained spatial information for tiny building structures but lack refinement and selection, and the semantic gap between features across levels is not conducive to feature fusion. To address these problems, this paper proposes an FCN framework based on the residual network and provides a training pattern for multi-modal data that combines the advantages of high-resolution aerial images and LiDAR data for building extraction. Two novel modules are proposed for the optimization and integration of multiscale and across-level features. In particular, a multiscale context optimization module is designed to adaptively generate feature representations for different subregions and effectively aggregate global context. A semantic-guided spatial attention mechanism is introduced to refine shallow features and alleviate the semantic gap. Finally, hierarchical features are fused via a feature pyramid network. Compared with other state-of-the-art methods, experimental results demonstrate superior performance, with 93.19% IoU and 97.56% OA on the WHU dataset and 94.72% IoU and 97.84% OA on the Boston dataset, showing that the proposed network improves accuracy and achieves better performance for building extraction.
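The semantic-guided spatial attention described above can be sketched in its simplest possible form, assuming (the paper does not specify this) that deeper semantic features produce per-position logits which, after a sigmoid, mask the shallow features before cross-level fusion:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def semantic_guided_attention(shallow, deep_logits):
    """Refine fine-grained shallow features with a spatial mask
    derived from deeper semantic features: positions the deep
    stream scores highly keep their shallow detail, while other
    positions are suppressed, narrowing the semantic gap before
    the feature pyramid fuses the levels."""
    return [s * sigmoid(g) for s, g in zip(shallow, deep_logits)]

# A strongly negative semantic logit suppresses shallow noise;
# a strongly positive one passes the fine detail through.
refined = semantic_guided_attention([1.0, 1.0], [-4.0, 4.0])
```

The design point is that the mask is computed from the *semantic* stream but applied to the *spatial* stream, so shallow detail survives only where the deep features agree it is building-like.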


2018, Vol. 10 (11), pp. 1768
Author(s): Hui Yang, Penghai Wu, Xuedong Yao, Yanlan Wu, Biao Wang, et al.

Building extraction from very high resolution (VHR) imagery plays an important role in urban planning, disaster management, navigation, updating geographic databases, and several other geospatial applications. Compared with traditional building extraction approaches, deep learning networks have recently shown outstanding performance in this task by using both high-level and low-level feature maps. However, it is difficult for present deep learning networks to utilize features from different levels rationally. To tackle this problem, a novel network based on DenseNets and the attention mechanism, called the dense-attention network (DAN), was proposed. The DAN contains an encoder part and a decoder part, composed respectively of lightweight DenseNets and a spatial attention fusion module. The proposed encoder–decoder architecture can strengthen feature propagation and effectively use higher-level feature information to suppress low-level features and noise. Experimental results on the public International Society for Photogrammetry and Remote Sensing (ISPRS) datasets, using only red–green–blue (RGB) images, demonstrated that the proposed DAN achieved higher scores (96.16% overall accuracy (OA), 92.56% F1 score, and 90.56% mean intersection over union (MIoU)), less training and response time, and a higher quality value when compared with other deep learning methods.
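The decoder-side mechanism described here, higher-level features suppressing low-level features and noise, can be illustrated with a toy 1-D sketch (the actual module operates on 2-D feature maps and learns its attention): a coarse attention map from a deeper decoder stage is upsampled to the fine resolution and gates the low-level features elementwise.

```python
import math

def upsample_nearest(mask, factor):
    """Nearest-neighbour upsample of a 1-D attention map."""
    return [v for v in mask for _ in range(factor)]

def spatial_attention_fusion(low, high_attn_logits, factor):
    """Gate a low-level feature map with an attention map computed
    from a coarser, higher-level decoder stage (upsampled to match
    resolution), so semantically weak responses and noise in the
    low-level features are suppressed before fusion."""
    gate = [1.0 / (1.0 + math.exp(-g))
            for g in upsample_nearest(high_attn_logits, factor)]
    return [l * g for l, g in zip(low, gate)]

low = [0.5, 0.7, 0.6, 0.8]   # fine resolution: 4 positions
attn = [5.0, -5.0]           # coarse attention: 2 positions
fused = spatial_attention_fusion(low, attn, factor=2)
```

The second half of the low-level map is nearly zeroed because the higher-level stage judged that region irrelevant, which is exactly the "suppress low-level noise" behaviour the abstract attributes to the fusion module.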


2021, Vol. 13 (17), pp. 3414
Author(s): Heng Luo, Biao He, Renzhong Guo, Weixi Wang, Xi Kuai, et al.

Urban modeling and visualization are highly useful in the development of smart cities. Buildings are the most prominent features in the urban environment and are necessary for urban decision support; thus, buildings should be modeled effectively and efficiently in three dimensions (3D). In this study, with the help of Gaofen-7 (GF-7) high-resolution stereo mapping satellite double-line camera (DLC) images and multispectral (MUX) images, building boundaries are segmented via a multilevel features fusion network (MFFN). A digital surface model (DSM) is generated to obtain the elevation of buildings. The building vectors with height information are processed using a 3D modeling tool to create white building models. The building models, DSM, and multispectral fused image are then imported into Unreal Engine 4 (UE4) to complete the urban scene level, vividly rendered with environmental effects for urban visualization. The results of this study show that a high accuracy of 95.29% is achieved in building extraction using the proposed method. Based on the extracted building vectors and elevation information from the DSM, 3D building models can be efficiently created at Level of Detail 1 (LOD1). Finally, the urban scene is produced for realistic 3D visualization. This study shows that high-resolution stereo mapping satellite images are useful for 3D modeling of urban buildings and can support the generation and visualization of urban scenes over a large area for different applications.
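The "footprint vector plus DSM elevation, extruded to LOD1" step can be sketched as follows. The height estimator (median of DSM samples inside the footprint minus ground elevation) is an assumption for illustration; the paper does not state how it derives building height from the DSM.

```python
def extrude_lod1(footprint, ground_z, dsm_heights):
    """Build an LOD1 block model: the 2-D footprint polygon is
    extruded to a flat roof at the building height estimated from
    the DSM. Here the height is the median of DSM samples falling
    inside the footprint, minus the ground elevation."""
    zs = sorted(dsm_heights)
    n = len(zs)
    median = zs[n // 2] if n % 2 else 0.5 * (zs[n // 2 - 1] + zs[n // 2])
    height = median - ground_z
    base = [(x, y, ground_z) for x, y in footprint]
    roof = [(x, y, ground_z + height) for x, y in footprint]
    return base, roof, height

# A 10 m x 6 m footprint; the 40.0 outlier (e.g. a rooftop
# antenna return) barely moves the median-based height.
footprint = [(0, 0), (10, 0), (10, 6), (0, 6)]
base, roof, h = extrude_lod1(footprint, ground_z=2.0,
                             dsm_heights=[14.8, 15.0, 15.2, 40.0])
```

Using the median rather than the maximum keeps the LOD1 prism robust to DSM spikes, which matters when the DSM comes from stereo matching.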


2018, Vol. 10 (9), pp. 1349
Author(s): Hui Luo, Le Wang, Chen Wu, Lei Zhang

Impervious surface mapping incorporating high-resolution remote sensing imagery has continued to attract increasing interest, as it can provide detailed information about urban structure and distribution. Previous studies have suggested that combining LiDAR data with high-resolution imagery for impervious surface mapping yields better performance than using high-resolution imagery alone. However, owing to the high cost of LiDAR acquisition, it is difficult to obtain LiDAR data acquired at the same time as the high-resolution imagery for multi-sensor impervious surface mapping. Consequently, real landscape changes occurring between multi-sensor data sets with different acquisition times cause misclassification errors in impervious surface mapping, an issue that has generally been neglected in previous work. Furthermore, observation differences between the multi-sensor data, including misregistration, missing data in the LiDAR coverage, and shadow in the high-resolution images, also present obstacles to fusing LiDAR data and high-resolution images into a final map. To resolve these issues, we propose an improved impervious surface-mapping method that incorporates LiDAR data and high-resolution imagery with different acquisition times and accounts for both real landscape changes and observation differences. In the proposed method, multi-sensor change detection by supervised multivariate alteration detection (MAD) is employed to identify the changed and mis-registered areas. The no-data areas in the LiDAR data and the shadow areas in the high-resolution image are extracted via independent classification based on the corresponding single-sensor data. Finally, an object-based post-classification fusion is proposed that takes advantage of both the independent classification results using single-sensor data and the joint classification result using stacked multi-sensor data. The impervious surface map is subsequently obtained by combining the landscape classes in the resulting classification map. Experiments covering the study site in Buffalo, NY, USA demonstrate that our method can accurately detect landscape changes and clearly improve the performance of impervious surface mapping.
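The object-based post-classification fusion can be illustrated with a hypothetical per-object decision rule; the abstract only says the method combines the two independent single-sensor results with the stacked joint result, so the exact rules below (fall-backs for changed, shadow, and no-data objects, then a majority vote) are an assumed sketch, not the paper's procedure.

```python
from collections import Counter

def fuse_object_label(joint, lidar_only, image_only,
                      changed, shadow, lidar_gap):
    """Per-object post-classification fusion (hypothetical rules):
    - changed/mis-registered objects: trust the image-only result,
      since stale LiDAR evidence no longer describes the scene;
    - LiDAR no-data objects: use the image-only result;
    - image shadow objects: use the LiDAR-only result;
    - otherwise: majority vote over the three classifiers,
      breaking ties in favour of the joint (stacked) result."""
    if changed or lidar_gap:
        return image_only
    if shadow:
        return lidar_only
    votes = Counter([joint, lidar_only, image_only])
    label, count = votes.most_common(1)[0]
    return label if count > 1 else joint

# Normal object: two of three classifiers agree.
a = fuse_object_label("impervious", "impervious", "pervious",
                      changed=False, shadow=False, lidar_gap=False)
# Shadow object: fall back to the LiDAR-only classification.
b = fuse_object_label("pervious", "impervious", "pervious",
                      changed=False, shadow=True, lidar_gap=False)
```

Routing problem objects to whichever single-sensor classifier is still reliable is what lets the fusion tolerate the acquisition-time gap the abstract highlights.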

