HsgNet: A Road Extraction Network Based on Global Perception of High-Order Spatial Information

Yan Xie; Fang Miao; Kai Zhou; Jing Peng

doi:10.3390/ijgi8120571

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi8120571 ◽

2019 ◽

Vol 8 (12) ◽

pp. 571 ◽

Cited By ~ 2

Author(s):

Yan Xie ◽

Fang Miao ◽

Kai Zhou ◽

Jing Peng

Keyword(s):

Semantic Information ◽

Spatial Information ◽

Feature Fusion ◽

High Order ◽

Local Context ◽

Model Parameters ◽

Road Extraction ◽

Long Distance ◽

Global Perception ◽

Intermediate Output

Road extraction is a distinctive and difficult problem in semantic segmentation because roads are slender, span long distances, have complex appearance, and are topologically connected. We therefore propose HsgNet, a road extraction network based on global perception of high-order spatial information through bilinear pooling. HsgNet takes the efficient LinkNet as its basic architecture and embeds a Middle Block between the encoder and the decoder. The Middle Block learns to preserve global-context semantic information, long-distance spatial information and relationships, and the information and dependencies of different feature channels. This sets it apart from road segmentation methods that lose spatial information, such as those that use dilated convolution and multiscale feature fusion to record local-context semantic information. The Middle Block consists of three steps: (1) forming a feature resource pool that gathers high-order global spatial information; (2) selecting a feature weight distribution, so that each pixel position obtains the complementary features it needs; and (3) inversely mapping the intermediate output feature encoding to the size of the input image by expanding the number of channels of the intermediate output feature. We compared multiple road extraction methods on two open datasets, SpaceNet and DeepGlobe. Compared with the efficient road extraction model D-LinkNet, our model has fewer parameters and better performance: it achieves a higher mean intersection over union (71.1%) with about one quarter fewer model parameters.
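The three Middle Block steps read like a gather-and-distribute attention pattern. The Python sketch below illustrates that reading under stated assumptions and is not HsgNet's implementation: the function names, the number K of global descriptors, the use of softmax in both the gather and distribute steps, and the plain matrix `expand` standing in for the channel-expanding convolution are all hypothetical. Pixels are flattened into N rows of C features.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Plain-Python product of an (n x k) and a (k x m) list-of-lists matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def gather_distribute(features, gather_logits, distribute_logits, expand):
    """Hypothetical gather-distribute block (a sketch, not the paper's layer).

    features:          N x C feature vectors, one row per pixel position
    gather_logits:     N x K scores; softmax over the N positions gives K global maps
    distribute_logits: N x K scores; softmax over K gives each pixel's weights
    expand:            C x C_out matrix standing in for a 1x1 channel-expanding conv
    """
    # Step 1 (gather): normalise each of the K maps over all positions, then pool
    # the features with them (bilinear product A^T X) into K global descriptors.
    maps = [softmax(col) for col in transpose(gather_logits)]   # K x N
    pool = matmul(maps, features)                               # K x C
    # Step 2 (distribute): each pixel takes its own weighted mix of the descriptors.
    weights = [softmax(row) for row in distribute_logits]       # N x K
    mixed = matmul(weights, pool)                               # N x C
    # Step 3 (expand): widen the channels with the 1x1-convolution stand-in.
    return matmul(mixed, expand)                                # N x C_out
```

With all-zero logits, every pixel receives the mean feature vector of the whole map, which makes a quick sanity check of the shapes and of the global (rather than local) reach of each output.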

Cascaded Residual Attention Enhanced Road Extraction from Remote Sensing Images

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi11010009 ◽

2021 ◽

Vol 11 (1) ◽

pp. 9

Author(s):

Shengfu Li ◽

Cheng Liao ◽

Yulin Ding ◽

Han Hu ◽

Yang Jia ◽

...

Keyword(s):

Remote Sensing ◽

Spatial Information ◽

Semantic Segmentation ◽

Road Extraction ◽

Remote Sensing Images ◽

Long Distance ◽

Features Fusion ◽

Multi Scale ◽

Boundary Recognition ◽

Benchmark Datasets

Efficient and accurate road extraction from remote sensing imagery is important for applications such as navigation and Geographic Information System (GIS) updating. Existing data-driven methods based on semantic segmentation recognize roads pixel by pixel and generally use only local spatial information, which leads to discontinuous extraction and jagged road boundaries. To address these problems, we propose a cascaded attention-enhanced architecture that extracts boundary-refined roads from remote sensing images. The architecture applies spatial attention residual blocks to multi-scale features to capture long-distance relations and introduces channel attention layers to optimize the fusion of multi-scale features. A lightweight encoder-decoder network is then connected in cascade to adaptively refine the boundaries of the extracted roads. In our experiments, the proposed method outperformed existing methods and achieved state-of-the-art results on the Massachusetts dataset. It also achieved competitive results on more recent benchmark datasets, such as DeepGlobe and the Huawei Cloud road extraction challenge.
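The abstract names channel attention layers but does not specify them. A common form is squeeze-and-excitation style gating: pool each channel to one scalar, pass the descriptors through a small bottleneck, and rescale each channel by a sigmoid gate. The sketch below assumes that form; the function name and the weights are hypothetical illustrative values, not learned parameters from the paper.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feature_maps, w_reduce, w_expand):
    """Squeeze-and-excitation style channel gating (assumed form, illustrative weights).

    feature_maps: C channels, each an H x W list of lists
    w_reduce:     C_r x C weights of the bottleneck layer
    w_expand:     C x C_r weights restoring one gate per channel
    Returns the rescaled maps and the per-channel gates.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_maps]
    # Excitation: ReLU bottleneck, then a sigmoid gate in (0, 1) for each channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w_reduce]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_expand]
    # Reweight: scale every value in channel c by its gate.
    rescaled = [[[g * v for v in row] for row in ch] for g, ch in zip(gates, feature_maps)]
    return rescaled, gates
```

In a fusion setting, the channels would come from concatenated multi-scale features, so the gates decide how much each scale's channels contribute; that use is an interpretation of the abstract, not a detail it confirms.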

BSIRNet: A Road Extraction Network with Bidirectional Spatial Information Reasoning

Journal of Sensors ◽

10.1155/2022/6391238 ◽

2022 ◽

Vol 2022 ◽

pp. 1-11

Author(s):

Hai Tan ◽

Hao Xu ◽

Jiguang Dai

Keyword(s):

Spatial Information ◽

Feature Fusion ◽

Extraction Methods ◽

Automatic Extraction ◽

Processing Unit ◽

Road Extraction ◽

Remote Sensing Images ◽

The Public ◽

Automatic Navigation ◽

Neural Network Structure

Automatic extraction of road information from remote sensing images is widely used in fields such as urban planning and automatic navigation. However, because of noise and occlusion, existing road extraction methods often produce discontinuous roads. To solve this problem, a road extraction network with bidirectional spatial information reasoning (BSIRNet) is proposed, in which neighbourhood feature fusion captures spatial context dependencies and expands the receptive field, and an information processing unit with a recurrent neural network structure captures channel dependencies. BSIRNet improves the connectivity of extracted roads through spatial information reasoning. Comparisons with other models on the public Massachusetts road dataset and the Wuhan University road dataset verify the superiority of the proposed method.
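The abstract does not describe how the recurrent unit is arranged. To illustrate why a bidirectional recurrence can help connectivity, the sketch below scans a 1-D sequence in both directions with a scalar tanh RNN cell. The sequence could be pixels along a row or, as the abstract suggests for the processing unit, a sequence of channels. A weak response at a gap still receives context from both sides. The function name, the scalar weights, and the summing of the two directions are hypothetical choices.

```python
import math


def bidirectional_scan(sequence, w_in, w_rec):
    """Run a scalar tanh RNN forward and backward and sum the two state sequences.

    Hypothetical illustration: each output mixes context from the left (forward
    pass) and from the right (backward pass) of its position.
    """
    def run(seq):
        h, states = 0.0, []
        for x in seq:
            h = math.tanh(w_in * x + w_rec * h)  # new state from input and previous state
            states.append(h)
        return states

    forward = run(sequence)
    backward = run(sequence[::-1])[::-1]  # re-align backward states with positions
    return [f + b for f, b in zip(forward, backward)]
```

For example, `bidirectional_scan([1.0, 1.0, 0.0, 1.0, 1.0], 1.0, 0.8)` gives a positive value at the zero in the middle, because the road evidence on both sides propagates into the gap.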

Multilevel feature fusion dilated convolutional network for semantic segmentation

International Journal of Advanced Robotic Systems ◽

10.1177/17298814211007665 ◽

2021 ◽

Vol 18 (2) ◽

pp. 172988142110076

Author(s):

Tao Ku ◽

Qirui Yang ◽

Hao Zhang

Keyword(s):

Large Scale ◽

Semantic Information ◽

Spatial Information ◽

Feature Fusion ◽

Scene Perception ◽

Semantic Segmentation ◽

Field Size ◽

Data Set ◽

Dilated Convolution ◽

High Level

Recently, convolutional neural networks (CNNs) have brought significant improvements to computer vision, particularly in the accuracy and speed of semantic segmentation, which has greatly improved robot scene perception. In this article, we propose a multilevel feature fusion dilated convolution network (Refine-DeepLab). By improving the spatial pyramid pooling structure, we propose a multiscale hybrid dilated convolution module that captures rich context information and effectively alleviates the conflict between receptive field size and the dilated convolution operation. At the same time, the high-level and low-level semantic information obtained through multi-level, multi-scale feature extraction improves the capture of global information and the segmentation of large-scale targets. The encoder–decoder gradually recovers spatial information while capturing high-level semantic information, resulting in sharper object boundaries. We evaluate our approach thoroughly on the PASCAL VOC 2012 dataset without MS COCO pretraining; extensive experiments verify the effectiveness of the proposed Refine-DeepLab model, which achieves a state-of-the-art 81.73% mean intersection-over-union on the validation set.
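The tension between receptive field size and dilated convolution can be checked with a little arithmetic. For stacked convolutions with kernel size k and dilation rates d_i, the receptive field along one axis is 1 + Σ (k − 1)·d_i, but identical rates leave holes ("gridding"). The sketch below enumerates which 1-D input offsets actually reach one output unit. The rates (1, 2, 5) and (2, 2, 2) are illustrative choices, not the rates used in Refine-DeepLab.

```python
from itertools import product


def covered_offsets(dilations, kernel=3):
    """1-D input offsets that reach one output through stacked dilated convolutions."""
    half = kernel // 2
    taps = range(-half, half + 1)  # tap positions of one (odd-sized) kernel
    offsets = set()
    for choice in product(taps, repeat=len(dilations)):
        # One path through the stack: each layer shifts by tap * dilation.
        offsets.add(sum(t * d for t, d in zip(choice, dilations)))
    return offsets


def receptive_field(dilations, kernel=3):
    # Each layer with dilation d widens the field by (kernel - 1) * d.
    return 1 + sum((kernel - 1) * d for d in dilations)
```

With rates (1, 2, 5), all 17 positions inside the 17-wide receptive field are covered. With rates (2, 2, 2), the field spans 13 positions but only 7 of them (the even offsets) are ever sampled, which is the gap that hybrid rate choices aim to avoid.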

A Multivariate High-Order Markov Model for the Income Estimation of a Wind Farm

Energies ◽

10.3390/en14020388 ◽

2021 ◽

Vol 14 (2) ◽

pp. 388

Author(s):

Riccardo De Blasis ◽

Giovanni Batista Masala ◽

Filippo Petroni

Keyword(s):

Markov Model ◽

Electricity Market ◽

Wind Farm ◽

Central Italy ◽

High Order ◽

Distribution Model ◽

Spot Price ◽

Model Parameters ◽

Price Series ◽

Cross Correlations

The energy produced by a wind farm at a given location, and its associated income, depend both on the wind characteristics at that location (i.e., speed and direction) and on the dynamics of the electricity spot price. Because there is evidence of cross-correlations among the wind speed, direction, and price series and their lagged series, we aim to assess the income of a hypothetical wind farm located in central Italy when all of these interactions are considered. To model these cross- and auto-correlations efficiently, we apply a high-order multivariate Markov model that includes dependencies on each time series and on a certain number of past values. We also use the Raftery Mixture Transition Distribution (MTD) model to reduce the number of parameters and obtain a more parsimonious model. Using data from the MERRA-2 project and from the Italian electricity market, we estimate the model parameters and validate them through Monte Carlo simulation. The results show that the simulated income faithfully reproduces the empirical income and that the multivariate model also closely reproduces the cross-correlations between the variables. The model can therefore be used to predict the income generated by a wind farm.
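As a rough illustration of why the mixture transition distribution is parsimonious, the sketch below implements a univariate Raftery-style MTD. In this model, the next-state distribution is a weighted mix of rows of one shared transition matrix, one row per lag. The sketch then compares its parameter count with that of a full high-order chain. The paper's multivariate model also mixes across series. That extension, the state coding, and the example numbers are not shown here and are assumptions of the sketch.

```python
def mtd_next_state_probs(history, transition, lag_weights):
    """Univariate MTD: P(X_t = j | past) = sum_g lambda_g * Q[x_{t-g}][j].

    history:     most recent state first, e.g. [x_{t-1}, x_{t-2}, ...]
    transition:  m x m row-stochastic matrix Q shared by all lags
    lag_weights: lambda_1..lambda_l, assumed non-negative and summing to 1
    """
    m = len(transition)
    probs = [0.0] * m
    for lam, state in zip(lag_weights, history):
        for j in range(m):
            probs[j] += lam * transition[state][j]
    return probs


def parameter_counts(m, order):
    """Free parameters of a full order-l chain versus the MTD model, for m states."""
    full = m ** order * (m - 1)      # one distribution per past configuration
    mtd = m * (m - 1) + (order - 1)  # one matrix Q plus the lag weights
    return full, mtd
```

For three states and order three, the full chain needs 54 free parameters while the MTD needs 8. Sampling repeatedly from `mtd_next_state_probs` is also the basic step of a Monte Carlo simulation like the one described above.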

Remote Sensing Road Extraction by Road Segmentation Network

Applied Sciences ◽

10.3390/app11115050 ◽

2021 ◽

Vol 11 (11) ◽

pp. 5050

Author(s):

Jiahai Tan ◽

Ming Gao ◽

Kai Yang ◽

Tao Duan

Keyword(s):

Remote Sensing ◽

Attention Mechanism ◽

Context Information ◽

Road Extraction ◽

Remote Sensing Images ◽

Long Distance ◽

The Road ◽

Road Segmentation ◽

Context Characteristics

Road extraction from remote sensing images has attracted much attention in geospatial applications. However, existing methods do not accurately capture road connectivity, and the identification of road pixels can be disturbed by abundant ground objects such as buildings, trees, and shadows. The objective of this paper is to enhance the context and strip features of roads with a UNet-like architecture. The method first enhances context characteristics in a segmentation step and then preserves strip characteristics in a refinement step. The segmentation step uses an attention mechanism to enhance the context information between adjacent layers. To obtain the strip features of the road, the refinement step introduces strip pooling into a refinement network to restore long-distance dependency information along the road. Extensive comparative experiments demonstrate that the proposed method outperforms other methods, achieving an overall accuracy of 98.25% on the DeepGlobe dataset and 97.68% on the Massachusetts dataset.
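Strip pooling averages features along whole rows and whole columns, so each pixel sees context spanning the full width and height of the map. The sketch below is a simplified single-channel version that adds the broadcast row and column means. It omits the convolutions and gating that a full strip pooling module typically applies, and the function name is hypothetical.

```python
def strip_pool(feature):
    """Simplified strip pooling on one H x W channel (no convolutions or gating).

    Each output value is its row mean plus its column mean, so a pixel receives
    context from the entire row and column it lies on.
    """
    h, w = len(feature), len(feature[0])
    row_means = [sum(row) / w for row in feature]                             # H x 1 strip
    col_means = [sum(feature[i][j] for i in range(h)) / h for j in range(w)]  # 1 x W strip
    # Broadcast both strips back to H x W and fuse them by addition.
    return [[row_means[i] + col_means[j] for j in range(w)] for i in range(h)]
```

On a map whose middle row is a horizontal road of ones, every pixel on that row gets the same boosted value, regardless of how far it is from the others. This long, thin context is what square pooling windows would blur.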

A Fast and Lightweight Method with Feature Fusion and Multi-Context for Face Detection

Future Internet ◽

10.3390/fi10080080 ◽

2018 ◽

Vol 10 (8) ◽

pp. 80

Author(s):

Lei Zhang ◽

Xiaoli Zhi

Keyword(s):

Face Detection ◽

Graphics Processing Units ◽

High Performance ◽

Feature Fusion ◽

Local Context ◽

Data Set ◽

Global Context ◽

Detection Algorithms ◽

Multi Scale ◽

Benchmark Datasets

Convolutional neural networks (CNNs) have made great progress in face detection. Most of them use computation-intensive networks as the backbone to obtain high precision, and they cannot reach a good detection speed without high-performance GPUs (Graphics Processing Units). This limits CNN-based face detection algorithms in real applications, especially speed-sensitive ones. To alleviate this problem, we propose a lightweight face detector that uses a fast residual network as its backbone. Our method runs fast even on inexpensive, ordinary GPUs. To preserve detection precision, multi-scale features and multiple contexts are fully exploited in efficient ways. Specifically, feature fusion is first used to obtain semantically strong multi-scale features. Then multiple contexts, both local and global, are added to these multi-scale features without extra computational burden: the local context through a depthwise separable convolution-based approach, and the global context through simple global average pooling. Experimental results show that our method runs at about 110 fps on VGA (Video Graphics Array)-resolution images while maintaining precision competitive with its state-of-the-art counterparts on the WIDER FACE and FDDB (Face Detection Data Set and Benchmark) datasets.
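Two of the efficiency ingredients above are easy to quantify. The sketch counts weights for a standard convolution and for a depthwise separable one (biases ignored), and it shows global average pooling as a per-channel mean. The 3×3, 64-channel setting used below is an arbitrary example, not the detector's configuration.

```python
def conv_params(kernel, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """One k x k depthwise filter per input channel, then a 1 x 1 pointwise conv."""
    return kernel * kernel * c_in + c_in * c_out


def global_average_pool(feature_maps):
    """One scalar per channel: the mean over all spatial positions."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_maps]
```

For 3×3 kernels with 64 input and 64 output channels, `conv_params` gives 36,864 weights and `depthwise_separable_params` gives 4,672, roughly eight times fewer. Global average pooling adds no weights at all, which is consistent with adding global context cheaply.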

Multiscale Semantic Feature Optimization and Fusion Network for Building Extraction Using High-Resolution Aerial Images and LiDAR Data

Remote Sensing ◽

10.3390/rs13132473 ◽

2021 ◽

Vol 13 (13) ◽

pp. 2473

Author(s):

Qinglie Yuan ◽

Helmi Zulhaidi Mohd Shafri ◽

Aidi Hizami Alias ◽

Shaiful Jahari Hashim

Keyword(s):

High Resolution ◽

Large Scale ◽

Spatial Information ◽

Feature Fusion ◽

Aerial Images ◽

Semantic Gap ◽

Superior Performance ◽

Lidar Data ◽

Building Extraction ◽

Hierarchical Features

Automatic building extraction has been applied in many domains. It is also a challenging problem because of complex scenes and multiple scales. Deep learning algorithms, especially fully convolutional neural networks (FCNs), have shown more robust feature extraction ability than traditional remote sensing data processing methods. However, hierarchical features from encoders with a fixed receptive field have limited ability to capture global semantic information. Local features in multiscale subregions cannot construct contextual interdependence and correlation, especially for large-scale building areas, which can cause fragmentary extraction results due to intra-class feature variability. In addition, low-level features carry accurate, fine-grained spatial information for tiny building structures but lack refinement and selection, and the semantic gap between features of different levels hinders feature fusion. To address these problems, this paper proposes an FCN framework based on the residual network and provides a training pattern for multi-modal data that combines the advantages of high-resolution aerial images and LiDAR data for building extraction. Two novel modules are proposed for the optimization and integration of multiscale and cross-level features. In particular, a multiscale context optimization module adaptively generates feature representations for different subregions and effectively aggregates global context, and a semantic-guided spatial attention mechanism refines shallow features and alleviates the semantic gap. Finally, hierarchical features are fused via a feature pyramid network. Compared with other state-of-the-art methods, the proposed network achieves superior performance, with 93.19 IoU and 97.56 OA on the WHU datasets and 94.72 IoU and 97.84 OA on the Boston dataset, showing that it improves the accuracy of building extraction.
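The IoU and OA figures above follow the usual definitions for a binary building mask. IoU is TP / (TP + FP + FN) for the building class, and OA is (TP + TN) divided by the total number of pixels. Below is a minimal sketch, assuming 0/1 masks given as nested lists. The function name is hypothetical, and returning an IoU of 1.0 when neither mask contains a building is a convention choice.

```python
def binary_iou_and_oa(pred, truth):
    """IoU of the positive (building) class and overall accuracy for 0/1 masks."""
    tp = fp = fn = tn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1      # building predicted and present
            elif p:
                fp += 1      # building predicted but absent
            elif t:
                fn += 1      # building missed
            else:
                tn += 1      # background correctly left empty
    iou = tp / (tp + fp + fn) if tp + fp + fn else 1.0
    oa = (tp + tn) / (tp + fp + fn + tn)
    return iou, oa
```

Because OA also counts the usually abundant background pixels, it tends to be higher than IoU, as in the scores reported above.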

DFFAN: Dual Function Feature Aggregation Network for Semantic Segmentation of Land Cover

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10030125 ◽

2021 ◽

Vol 10 (3) ◽

pp. 125

Author(s):

Junqing Huang ◽

Liguo Weng ◽

Bingyu Chen ◽

Min Xia

Keyword(s):

Remote Sensing ◽

Land Cover ◽

Spatial Information ◽

Feature Fusion ◽

Semantic Segmentation ◽

Dual Function ◽

Context Information ◽

Remote Sensing Images ◽

Feature Aggregation ◽

Image Context

Analyzing land cover using remote sensing images has broad prospects, and precise segmentation of land cover is key to applying this technology. Convolutional Neural Networks (CNNs) are now widely used in image semantic segmentation tasks. However, existing CNN models often show poor generalization ability and low segmentation accuracy on land cover segmentation tasks. To solve this problem, this paper proposes the Dual Function Feature Aggregation Network (DFFAN). The method combines image context information, gathers image spatial information, and extracts and fuses features. DFFAN uses a residual neural network as its backbone to obtain feature information of different dimensions from remote sensing images through multiple downsampling stages. This work designs an Affinity Matrix Module (AMM) to obtain the context of each feature map and proposes a Boundary Feature Fusion Module (BFF) to fuse the context information and spatial information of an image and determine the location distribution of each category in the image. Compared with existing methods, the proposed method significantly improves accuracy, reaching a mean intersection over union (MIoU) of 84.81% on the LandCover dataset.
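The MIoU reported above is typically computed from a confusion matrix by averaging the per-class IoU. The sketch below assumes that rows are ground truth and columns are predictions, and it skips classes that appear in neither. Conventions for absent classes vary between benchmarks, so treat that choice as an assumption. The function name is hypothetical.

```python
def mean_iou(confusion):
    """mIoU from a K x K confusion matrix (rows: ground truth, columns: prediction)."""
    k = len(confusion)
    ious = []
    for c in range(k):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                        # class c pixels predicted as others
        fp = sum(confusion[r][c] for r in range(k)) - tp   # other pixels predicted as c
        denom = tp + fp + fn
        if denom:  # assumption: skip classes absent from both prediction and ground truth
            ious.append(tp / denom)
    return sum(ious) / len(ious)
```

For the two-class matrix `[[3, 1], [0, 4]]`, the per-class IoUs are 3/4 and 4/5, so the mIoU is 0.775.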

MR-InpaintNet: Toward Deep Multi-Resolution Learning for Progressive Image Inpainting

10.36227/techrxiv.16641241 ◽

2021 ◽

Author(s):

Huan Zhang ◽

Zhao Zhang ◽

Haijun Zhang ◽

Yi Yang ◽

Shuicheng Yan ◽

...

Keyword(s):

Deep Learning ◽

High Resolution ◽

Semantic Information ◽

Feature Fusion ◽

Image Inpainting ◽

Feature Learning ◽

Low Resolution ◽

Resolution Image ◽

Texture Information ◽

Multiple Resolutions

Deep learning-based image inpainting methods have greatly improved performance thanks to the powerful representation ability of deep learning. However, current deep inpainting methods still tend to produce unreasonable structures and blurry textures, which shows that image inpainting remains challenging because the task is ill-posed. To address these issues, we propose a novel deep multi-resolution learning-based progressive image inpainting method, termed MR-InpaintNet, which takes damaged images of different resolutions as input and then fuses the multi-resolution features to repair the damaged images. The idea is motivated by the fact that images of different resolutions provide different levels of feature information. Specifically, the low-resolution image provides strong semantic information, and the high-resolution image offers detailed texture information. The middle-resolution image can be used to reduce the gap between the low-resolution and high-resolution images, which further refines the inpainting result. To fuse and improve the multi-resolution features, a novel multi-resolution feature learning (MRFL) process is designed, which consists of a multi-resolution feature fusion (MRFF) module, an adaptive feature enhancement (AFE) module, and a memory enhanced mechanism (MEM) module for information preservation. The refined multi-resolution features then contain both rich semantic information and detailed texture information from multiple resolutions. We further process the refined multi-resolution features with the decoder to obtain the recovered image. Extensive experiments on the Paris Street View, Places2, and CelebA-HQ datasets demonstrate that the proposed MR-InpaintNet effectively recovers textures and structures and performs favorably against state-of-the-art methods.
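The high-, middle-, and low-resolution inputs can be produced by repeatedly downsampling the damaged image. The abstract does not say how MR-InpaintNet resamples, so the 2×2 average pooling, the three-level pyramid, and the function names below are assumptions made for illustration. The sketch handles a single grayscale channel.

```python
def downsample2(image):
    """Halve an H x W grayscale image by averaging non-overlapping 2 x 2 blocks.

    An odd trailing row or column is dropped (assumed resampling choice).
    """
    h, w = len(image) // 2 * 2, len(image[0]) // 2 * 2
    return [[(image[i][j] + image[i][j + 1] + image[i + 1][j] + image[i + 1][j + 1]) / 4
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]


def resolution_pyramid(image, levels=3):
    """Return [high, middle, low] resolution copies of the input (hypothetical helper)."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(downsample2(pyramid[-1]))
    return pyramid
```

An 8×8 input yields 8×8, 4×4, and 2×2 levels. Each coarser level summarizes larger regions, which matches the abstract's point that low-resolution inputs carry more semantic and less texture information.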

Bird Species Identification Using Spectrogram Based on Multi-Channel Fusion of DCNNs

Entropy ◽

10.3390/e23111507 ◽

2021 ◽

Vol 23 (11) ◽

pp. 1507

Author(s):

Feiyu Zhang ◽

Luyang Zhang ◽

Hongxiang Chen ◽

Jiangjian Xie

Keyword(s):

Species Identification ◽

Feature Fusion ◽

Bird Species ◽

The Other ◽

Identification Accuracy ◽

Mean Average Precision ◽

Training Dataset ◽

Model Parameters ◽

Average Precision ◽

Fusion Mode

Deep convolutional neural networks (DCNNs) have achieved breakthrough performance in bird species identification using spectrograms of bird vocalizations. To address the imbalance of the bird vocalization dataset, a single feature identification model (SFIM) with residual blocks and a modified weighted cross-entropy function was proposed. To further improve identification accuracy, two multi-channel fusion methods were built from three SFIMs. One fused the outputs of the feature extraction parts of the three SFIMs (feature fusion mode), and the other fused the outputs of their classifiers (result fusion mode). The SFIMs were trained with three different kinds of spectrograms, computed by the short-time Fourier transform, the mel-frequency cepstrum transform, and the chirplet transform, respectively. To cope with the large number of trainable model parameters, transfer learning was used in the multi-channel models. Using our own vocalization dataset as the sample set, we found that the result fusion mode model outperforms the other proposed models, with the best mean average precision (MAP) reaching 0.914. A comparison of three spectrogram durations (100 ms, 300 ms, and 500 ms) shows that 300 ms works best for our dataset; we suggest choosing the duration based on the duration distribution of bird syllables. On the BirdCLEF2019 training dataset, the highest classification mean average precision (cmAP) reached 0.135, which indicates that the proposed model has some generalization ability.
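The result fusion mode combines the outputs of the three classifiers, but the abstract does not give the combination rule. One simple possibility, assumed here, is to average the softmax probabilities of the models and take the arg-max. The function names and the logits in the example are made up for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one model's class scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def result_fusion(logits_per_model):
    """Assumed fusion rule: average class probabilities across models, then arg-max."""
    probs = [softmax(logits) for logits in logits_per_model]
    n = len(probs)
    fused = [sum(p[c] for p in probs) / n for c in range(len(probs[0]))]
    label = max(range(len(fused)), key=fused.__getitem__)
    return fused, label
```

With logits `[2, 1, 0]`, `[0, 3, 0]`, and `[1, 2, 0]` from three hypothetical SFIMs, the fused prediction is class 1, even though the first model alone would pick class 0. Feature fusion mode would instead merge the feature vectors before a single classifier.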
