Inter-Level Feature Balanced Fusion Network for Street Scene Segmentation

Dongqian Li; Cien Fan; Lian Zou; Qi Zuo; Hao Jiang; Yifeng Liu

doi:10.3390/s21237844

Inter-Level Feature Balanced Fusion Network for Street Scene Segmentation

Sensors ◽

10.3390/s21237844 ◽

2021 ◽

Vol 21 (23) ◽

pp. 7844

Author(s):

Dongqian Li ◽

Cien Fan ◽

Lian Zou ◽

Qi Zuo ◽

Hao Jiang ◽

...

Keyword(s):

Network Architecture ◽

Spatial Information ◽

Feature Fusion ◽

Recognition Task ◽

Semantic Segmentation ◽

Spatial Features ◽

Information Stream ◽

Boundary Information ◽

Street Scene ◽

Deep Convolution Network

Semantic segmentation, as a pixel-level recognition task, has been widely used in a variety of practical scenes. Most existing methods try to improve network performance by fusing the information of high and low layers through simple concatenation or element-wise addition, which leads to unbalanced fusion and low utilization of inter-level features. To solve this problem, we propose the Inter-Level Feature Balanced Fusion Network (IFBFNet) to guide inter-level feature fusion in a more balanced and effective direction. The overall network follows the encoder–decoder architecture. In the encoder, we use a relatively deep convolution network to extract rich semantic information. In the decoder, skip connections fuse low-level spatial features to gradually restore a clearer boundary expression, and we add an inter-level feature balanced fusion module to each skip connection. Additionally, to better capture boundary information, we add a shallower spatial information stream that supplements spatial details. Experiments demonstrate the effectiveness of our module. Trained only on finely annotated data, IFBFNet achieves competitive performance on the Cityscapes dataset and improves substantially on the baseline network.
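As a rough illustration of the two simple fusion schemes this abstract criticizes, the sketch below applies element-wise addition and channel concatenation to toy feature maps. It does not show IFBFNet's balanced fusion module, which the abstract does not specify. The function names `fuse_add` and `fuse_concat`, the nested-list layout, and the values are all assumptions made for this example.

```python
# Feature maps are nested lists shaped [channels][height][width].
# Shapes and values are hypothetical, chosen only for illustration.

def fuse_add(high, low):
    """Element-wise addition: both inputs must have identical shapes."""
    return [
        [[h + l for h, l in zip(h_row, l_row)] for h_row, l_row in zip(h_ch, l_ch)]
        for h_ch, l_ch in zip(high, low)
    ]

def fuse_concat(high, low):
    """Channel concatenation: spatial sizes must match, channel counts add up."""
    return [[row[:] for row in ch] for ch in high + low]

high = [[[1.0, 2.0], [3.0, 4.0]]]   # high-level features: 1 channel, 2x2
low = [[[0.5, 0.5], [0.5, 0.5]]]    # low-level features: 1 channel, 2x2

added = fuse_add(high, low)         # shape stays 1 x 2 x 2
stacked = fuse_concat(high, low)    # shape becomes 2 x 2 x 2

print(added)
print(len(stacked), "channels after concatenation")
```

Both schemes treat the two inputs uniformly at every position, with no learned weighting between levels.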


HRCNet: High-Resolution Context Extraction Network for Semantic Segmentation of Remote Sensing Images

Remote Sensing ◽

10.3390/rs13010071 ◽

2020 ◽

Vol 13 (1) ◽

pp. 71

Author(s):

Zhiyong Xu ◽

Weicun Zhang ◽

Tianxiang Zhang ◽

Jiangyun Li

Keyword(s):

Remote Sensing ◽

Feature Extraction ◽

High Resolution ◽

Spatial Information ◽

Semantic Segmentation ◽

Context Information ◽

Remote Sensing Images ◽

Global Context ◽

Boundary Information ◽

Extraction Stage

Semantic segmentation is a significant method in remote sensing image (RSI) processing and has been widely used in various applications. Conventional convolutional neural network (CNN)-based semantic segmentation methods are likely to lose spatial information in the feature extraction stage and usually pay little attention to global context information. Moreover, imbalanced category scales and uncertain boundary information coexist in RSIs, which makes the semantic segmentation task more challenging. To overcome these problems, a high-resolution context extraction network (HRCNet) based on a high-resolution network (HRNet) is proposed in this paper. In this approach, the HRNet structure is adopted to preserve spatial information. Moreover, a light-weight dual attention (LDA) module is designed to obtain global context information in the feature extraction stage, and a feature enhancement feature pyramid (FEFP) structure is proposed and employed to fuse contextual information at different scales. In addition, to capture boundary information, we design a boundary aware (BA) module combined with a boundary aware loss (BAloss) function. Experimental results on the Potsdam and Vaihingen datasets show that the proposed approach can significantly improve boundary and segmentation performance, reaching overall accuracy scores of up to 92.0% and 92.3%, respectively. As a consequence, it is envisaged that the proposed HRCNet model will be advantageous for remote sensing image segmentation.


DFFAN: Dual Function Feature Aggregation Network for Semantic Segmentation of Land Cover

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10030125 ◽

2021 ◽

Vol 10 (3) ◽

pp. 125

Author(s):

Junqing Huang ◽

Liguo Weng ◽

Bingyu Chen ◽

Min Xia

Keyword(s):

Remote Sensing ◽

Land Cover ◽

Spatial Information ◽

Feature Fusion ◽

Semantic Segmentation ◽

Dual Function ◽

Context Information ◽

Remote Sensing Images ◽

Feature Aggregation ◽

Image Context

Analyzing land cover using remote sensing images has broad prospects, and precise segmentation of land cover is the key to applying this technology. Nowadays, the Convolution Neural Network (CNN) is widely used in many image semantic segmentation tasks. However, existing CNN models often exhibit poor generalization ability and low segmentation accuracy when dealing with land cover segmentation tasks. To solve this problem, this paper proposes the Dual Function Feature Aggregation Network (DFFAN). This method combines image context information, gathers image spatial information, and extracts and fuses features. DFFAN uses residual neural networks as the backbone to obtain feature information of remote sensing images at different dimensions through multiple downsamplings. This work designs an Affinity Matrix Module (AMM) to obtain the context of each feature map and proposes a Boundary Feature Fusion Module (BFF) to fuse the context information and spatial information of an image to determine the location distribution of each category in the image. Compared with existing methods, the proposed method is significantly improved in accuracy. Its mean intersection over union (MIoU) on the LandCover dataset reaches 84.81%.
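For readers unfamiliar with the metric reported above, the following is a minimal sketch of how mean intersection over union is commonly computed from a confusion matrix. The function name `mean_iou` and the toy 2-class matrix are assumptions for illustration; the abstract does not describe the paper's evaluation code.

```python
def mean_iou(confusion):
    """Mean IoU from a square confusion matrix.

    confusion[i][j] is the number of pixels of true class i predicted as class j.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                          # class c pixels that were missed
        fp = sum(confusion[r][c] for r in range(n)) - tp     # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:   # skip classes absent from both ground truth and prediction
            ious.append(tp / denom)
    if not ious:
        raise ValueError("confusion matrix contains no pixels")
    return sum(ious) / len(ious)

# Hypothetical 2-class confusion matrix (rows: ground truth, columns: prediction).
toy_confusion = [
    [8, 2],
    [1, 9],
]
toy_miou = mean_iou(toy_confusion)
print(round(toy_miou, 4))
```

Here class 0 has IoU 8/11 and class 1 has IoU 9/12, and the metric is their unweighted mean.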


Multilevel feature fusion dilated convolutional network for semantic segmentation

International Journal of Advanced Robotic Systems ◽

10.1177/17298814211007665 ◽

2021 ◽

Vol 18 (2) ◽

pp. 172988142110076

Author(s):

Tao Ku ◽

Qirui Yang ◽

Hao Zhang

Keyword(s):

Large Scale ◽

Semantic Information ◽

Spatial Information ◽

Feature Fusion ◽

Scene Perception ◽

Semantic Segmentation ◽

Field Size ◽

Data Set ◽

Dilated Convolution ◽

High Level

Recently, convolutional neural networks (CNNs) have led to significant improvement in the field of computer vision, especially in the accuracy and speed of semantic segmentation tasks, which has greatly improved robot scene perception. In this article, we propose a multilevel feature fusion dilated convolution network (Refine-DeepLab). By improving the spatial pyramid pooling structure, we propose a multiscale hybrid dilated convolution module, which captures rich context information and effectively alleviates the contradiction between the receptive field size and the dilated convolution operation. At the same time, the high-level semantic information and low-level semantic information obtained through multi-level and multi-scale feature extraction can effectively improve the capture of global information and the performance of large-scale target segmentation. The encoder–decoder gradually recovers spatial information while capturing high-level semantic information, resulting in sharper object boundaries. Extensive experiments on the PASCAL VOC 2012 data set, without MS COCO data set pretraining, verify the effectiveness of the proposed Refine-DeepLab model, which achieves a state-of-the-art result of 81.73% mean intersection-over-union on the validation set.


MS-AFF: A Novel Semantic Segmentation Approach for Buried Object Based on Multi-scale Attentional Feature Fusion

10.21203/rs.3.rs-193757/v1 ◽

2021 ◽

Author(s):

Chao Lu ◽

Fansheng Chen ◽

Xiaofeng Su ◽

Dan Zeng

Keyword(s):

Deep Learning ◽

Spatial Information ◽

Feature Fusion ◽

Infrared Image ◽

Semantic Segmentation ◽

Target Object ◽

Infrared Images ◽

Feature Maps ◽

Multi Scale ◽

Visible Images

Infrared technology is widely used in precision guidance and mine detection since it can capture the heat radiated outward from the target object. We use infrared (IR) thermography to obtain infrared images of buried objects. Compared to visible images, infrared images present poor resolution, low contrast, and a fuzzy visual effect, which make it difficult to segment the target object, especially against complex backgrounds. Under these conditions, traditional segmentation methods cannot perform well on infrared images since they are easily disturbed by noise and non-target objects in the images. With the advance of deep convolutional neural networks (CNNs), deep learning-based methods have made significant improvements in the semantic segmentation task. However, few of them address infrared image semantic segmentation, which is a more challenging scenario than visible images. Moreover, the lack of an infrared image dataset is also a problem for current methods based on deep learning. We propose a multi-scale attentional feature fusion (MS-AFF) module for infrared image semantic segmentation to solve this problem. Specifically, we integrate a series of feature maps from different levels with an atrous spatial pyramid structure. In this way, the model can obtain rich representation ability on infrared images. Besides, a global spatial information attention module is employed to let the model focus on the target region and reduce disturbance from the background of infrared images. In addition, we propose an infrared segmentation dataset based on the infrared thermal imaging system. Extensive experiments conducted on the infrared image segmentation dataset show the superiority of our method.


HA-MPPNet: Height Aware-Multi Path Parallel Network for High Spatial Resolution Remote Sensing Image Semantic Segmentation

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10100672 ◽

2021 ◽

Vol 10 (10) ◽

pp. 672

Author(s):

Suting Chen ◽

Chaoqun Wu ◽

Mithun Mukherjee ◽

Yujie Zheng

Keyword(s):

Remote Sensing ◽

Spatial Resolution ◽

Spatial Information ◽

Feature Fusion ◽

Semantic Segmentation ◽

Remote Sensing Image ◽

Surface Model ◽

Semantic Features ◽

Low Level ◽

Parallel Network

Semantic segmentation of remote sensing images (RSI) plays a significant role in urban management and land cover classification. Due to the richer spatial information in RSI, existing convolutional neural network (CNN)-based methods cannot segment images accurately and lose some edge information of objects. In addition, recent studies have shown that combining additional 3D geometric data with 2D appearance helps to distinguish pixel categories. However, most of them require height maps as additional inputs, which severely limits their applications. To alleviate the above issues, we propose a height aware-multi path parallel network (HA-MPPNet). Our proposed MPPNet first obtains multi-level semantic features while maintaining the spatial resolution in each path to preserve detailed image information. Afterward, gated high-low level feature fusion is utilized to compensate for the lack of low-level semantics. Then, we design a height feature decode branch to learn height features under the supervision of digital surface model (DSM) images and use the learned embeddings to improve semantic context through height feature guide propagation. Note that our module is end-to-end and does not need a DSM image as additional input after training. Our method outperforms other state-of-the-art methods for semantic segmentation on publicly available remote sensing image datasets.


SPMF-Net: Weakly Supervised Building Segmentation by Combining Superpixel Pooling and Multi-Scale Feature Fusion

Remote Sensing ◽

10.3390/rs12061049 ◽

2020 ◽

Vol 12 (6) ◽

pp. 1049 ◽

Cited By ~ 2

Author(s):

Jie Chen ◽

Fen He ◽

Yi Zhang ◽

Geng Sun ◽

Min Deng

Keyword(s):

Feature Fusion ◽

Semantic Segmentation ◽

Building Detection ◽

Segmentation Method ◽

Scale Feature ◽

Multi Scale ◽

Semantic Labeling ◽

Supervised Methods ◽

Boundary Information ◽

Weakly Supervised

The lack of pixel-level labeling limits the practicality of deep learning-based building semantic segmentation. Weakly supervised semantic segmentation based on image-level labeling results in incomplete object regions and missing boundary information. This paper proposes a weakly supervised semantic segmentation method for building detection. The proposed method takes the image-level label as supervision information in a classification network that combines superpixel pooling and multi-scale feature fusion structures. The main advantage of the proposed strategy is its ability to improve the intactness and boundary accuracy of a detected building. Our method achieves impressive results on two 2D semantic labeling datasets, which outperform some competing weakly supervised methods and are close to the result of the fully supervised method.


Multi-Feature Manifold Discriminant Analysis for Hyperspectral Image Classification

Remote Sensing ◽

10.3390/rs11060651 ◽

2019 ◽

Vol 11 (6) ◽

pp. 651 ◽

Cited By ~ 10

Author(s):

Hong Huang ◽

Zhengying Li ◽

Yinsong Pan

Keyword(s):

Discriminant Analysis ◽

Spatial Information ◽

Hyperspectral Image ◽

Feature Fusion ◽

Local Binary Patterns ◽

Hyperspectral Data ◽

Spectral Features ◽

Spatial Features ◽

Combined Features ◽

Low Dimensional

Hyperspectral images (HSIs) provide both spatial structure and spectral information for classification, but many traditional methods simply concatenate spatial features and spectral features, which usually leads to the curse of dimensionality and an unbalanced representation of different features. To address this issue, a new dimensionality reduction (DR) method, termed multi-feature manifold discriminant analysis (MFMDA), is proposed in this paper. First, MFMDA employs the local binary patterns (LBP) operator to extract textural features that encode the spatial information in the HSI. Then, under the graph embedding framework, the intrinsic and penalty graphs of LBP and spectral features are constructed to explore the discriminant manifold structure in the spatial and spectral domains, respectively. After that, a new spatial-spectral DR model for multi-feature fusion is built to extract discriminant spatial-spectral combined features; it not only preserves the similarity relationship between spectral features and LBP features but also possesses strong discriminating ability in the low-dimensional embedding space. Experiments on the Indian Pines, Heihe and Pavia University (PaviaU) hyperspectral data sets demonstrate that the proposed MFMDA method performs significantly better than some state-of-the-art methods that use only a single feature or simply stack spectral and spatial features together, and its classification accuracies reach 95.43%, 97.19% and 96.60%, respectively.
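To make the LBP operator mentioned above concrete, here is a minimal sketch of the basic 8-neighbour variant for a single pixel. The abstract does not say which LBP variant, radius, or bit ordering MFMDA uses. The clockwise ordering, the `>=` comparison, the function name `lbp_code`, and the toy patch are assumptions for this example.

```python
def lbp_code(image, r, c):
    """Basic 8-neighbour LBP code of the pixel at row r, column c.

    image is a list of rows; (r, c) must not lie on the image border.
    """
    center = image[r][c]
    # Neighbours visited clockwise, starting at the top-left pixel (assumed order).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        # A neighbour at least as bright as the centre sets its bit.
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code

# Hypothetical 3x3 grey-level patch.
patch = [
    [5, 9, 1],
    [4, 6, 7],
    [2, 3, 8],
]
center_code = lbp_code(patch, 1, 1)
print(center_code)
```

In this patch only the neighbours 9, 7 and 8 are at least as large as the centre value 6, so bits 1, 3 and 4 are set. A histogram of such codes over a window is what typically serves as the texture feature.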


Implementation of a Lightweight Semantic Segmentation Algorithm in Road Obstacle Detection

Sensors ◽

10.3390/s20247089 ◽

2020 ◽

Vol 20 (24) ◽

pp. 7089

Author(s):

Bushi Liu ◽

Yongbo Lv ◽

Yang Gu ◽

Wanjun Lv

Keyword(s):

Real Time ◽

Spatial Information ◽

Feature Fusion ◽

Semantic Segmentation ◽

Spatial Location ◽

Autonomous Driving ◽

Obstacle Detection ◽

Depth Information ◽

Long Time ◽

Deep Learning Network

Owing to deep learning’s accurate cognition of the street environment, convolutional neural networks have developed rapidly in street scene applications. To meet the needs of autonomous driving and assisted driving, computer vision technology is generally used to find obstacles and avoid collisions, which has made semantic segmentation a research priority in recent years. However, semantic segmentation has long faced new challenges. Complex network depth information, large datasets, and real-time requirements are typical problems that urgently need to be solved to realize autonomous driving technology. To address these problems, we propose an improved lightweight real-time semantic segmentation network based on an efficient image cascading network (ICNet) architecture, which uses multi-scale branches and a cascaded feature fusion unit to extract rich multi-level features. In this paper, a spatial information network is designed to transmit more prior knowledge of spatial location and edge information. During training, we also append an external loss function to enhance the learning process of the deep learning network system. This lightweight network can quickly perceive obstacles and detect roads in the drivable area from images, satisfying the requirements of autonomous driving. The proposed model performs well on the Cityscapes dataset. While maintaining real-time performance, several sets of experimental comparisons show that SP-ICNet improves the accuracy of road obstacle detection and provides nearly ideal prediction outputs. Comparisons with currently popular semantic segmentation networks also demonstrate the effectiveness of our lightweight network for road obstacle detection in autonomous driving.


Densely Connected Pyramidal Dilated Convolutional Network for Hyperspectral Image Classification

Remote Sensing ◽

10.3390/rs13173396 ◽

2021 ◽

Vol 13 (17) ◽

pp. 3396

Author(s):

Feng Zhao ◽

Junjie Zhang ◽

Zhe Meng ◽

Hanqiang Liu

Keyword(s):

Receptive Field ◽

Spatial Information ◽

Hyperspectral Image ◽

Feature Fusion ◽

Receptive Fields ◽

Classification Performance ◽

Convolutional Network ◽

Dilated Convolution ◽

Spatial Features ◽

Good Classification Performance

Recently, with the extensive application of deep learning techniques, particularly convolutional neural networks (CNNs), in the hyperspectral image (HSI) field, research on HSI classification has entered a new stage. To overcome the small receptive field of naive convolution, dilated convolution has been introduced into HSI classification. However, dilated convolution usually generates blind spots in the receptive field, so the spatial information obtained is discontinuous. To solve this problem, a densely connected pyramidal dilated convolutional network (PDCNet) is proposed in this paper. Firstly, a pyramidal dilated convolutional (PDC) layer that integrates different numbers of sub-dilated convolutional layers is proposed, where the dilation factor of the sub-dilated convolution increases exponentially, achieving multi-scale receptive fields. Secondly, the number of sub-dilated convolutional layers increases in a pyramidal pattern with the depth of the network, thereby capturing more comprehensive hyperspectral information in the receptive field. Furthermore, a feature fusion mechanism combining pixel-by-pixel addition and channel stacking is adopted to extract more abstract spectral–spatial features. Finally, to reuse the features of the previous layers more effectively, dense connections are applied in densely pyramidal dilated convolutional (DPDC) blocks. Experiments on three well-known HSI datasets indicate that the proposed PDCNet has good classification performance compared with other popular models.
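The blind-spot effect described above can be checked with a small 1-D sketch. It lists which input offsets can influence one output position after stacking 3-tap dilated convolutions. The dilation schedules `[2, 2, 2]` and `[1, 2, 4]` and the helper names are assumptions for illustration. The abstract states only that the dilation factor grows exponentially, not the exact values PDCNet uses.

```python
def covered_offsets(dilations, kernel_size=3):
    """Input offsets reachable from one output position after stacking
    1-D dilated convolutions with the given dilation rates."""
    half = kernel_size // 2
    offsets = {0}
    for d in dilations:
        taps = [k * d for k in range(-half, half + 1)]
        offsets = {o + t for o in offsets for t in taps}
    return sorted(offsets)

def blind_spots(dilations, kernel_size=3):
    """Offsets inside the receptive field span that no kernel tap ever reaches."""
    covered = covered_offsets(dilations, kernel_size)
    reached = set(covered)
    return [o for o in range(covered[0], covered[-1] + 1) if o not in reached]

constant_rate = blind_spots([2, 2, 2])      # same dilation in every layer
exponential_rate = blind_spots([1, 2, 4])   # dilation doubles per layer

print("constant dilation 2:", constant_rate)
print("dilations 1, 2, 4:", exponential_rate)
```

With a constant dilation of 2, every odd offset is skipped. With the doubling schedule, the span from -7 to 7 is covered without gaps in this 1-D setting.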


Lightweight Multilevel Feature Fusion Network for Hyperspectral Image Classification

Remote Sensing ◽

10.3390/rs14010079 ◽

2021 ◽

Vol 14 (1) ◽

pp. 79

Author(s):

Miaomiao Liang ◽

Huai Wang ◽

Xiangchun Yu ◽

Zhe Meng ◽

Jianbing Yi ◽

...

Keyword(s):

Classification Accuracy ◽

Network Architecture ◽

Spatial Information ◽

Hyperspectral Image ◽

Feature Fusion ◽

Spatial Perception ◽

Texture Features ◽

Data Set ◽

Spatial Feature ◽

Central Target

Hyperspectral images (HSIs), acquired as a 3D data set, contain spectral and spatial information that is important for ground–object recognition. A 3D convolutional neural network (3DCNN) could therefore be more suitable than a 2D one for extracting multiscale neighborhood information in the spectral and spatial domains simultaneously, if it is not restrained by mass parameters and computation cost. In this paper, we propose a novel lightweight multilevel feature fusion network (LMFN) that can achieve satisfactory HSI classification with fewer parameters and a lower computational burden. The LMFN decouples spectral–spatial feature extraction into two modules: point-wise 3D convolution to learn correlations between adjacent bands with no spatial perception, and depth-wise convolution to obtain local texture features while the spectral receptive field remains unchanged. Then, a target-guided fusion mechanism (TFM) is introduced to achieve multilevel spectral–spatial feature fusion between the two modules. More specifically, multiscale spectral features are endowed with spatial long-range dependency, which is quantified by central target pixel-guided similarity measurement. Subsequently, the results obtained from shallow to deep layers are added, respectively, to the spatial modules, in an orderly manner. The TFM block can enhance adjacent spectral correction and focus on pixels that actively boost the target classification accuracy, while performing multiscale feature fusion. Experimental results across three benchmark HSI data sets indicate that our proposed LMFN has competitive advantages, in terms of both classification accuracy and lightweight deep network architecture engineering. More importantly, compared to state-of-the-art methods, the LMFN presents better robustness and generalization.
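The parameter savings that motivate decoupling spectral and spatial feature extraction can be estimated with simple arithmetic. The sketch below compares a joint 3×3×3 convolution with a spectral-only convolution followed by a depth-wise spatial convolution. The kernel shapes, the channel width of 16, and the helper `conv3d_params` are assumptions chosen for illustration and are not taken from the LMFN paper.

```python
def conv3d_params(c_in, c_out, k_spectral, k_height, k_width, groups=1):
    """Weight count of a 3-D convolution (biases ignored)."""
    return (c_in // groups) * c_out * k_spectral * k_height * k_width

channels = 16  # hypothetical channel width

# Joint spectral-spatial 3x3x3 convolution.
joint = conv3d_params(channels, channels, 3, 3, 3)

# Decoupled alternative (assumed kernel shapes):
# a spectral-only 3x1x1 convolution with no spatial extent ...
spectral_only = conv3d_params(channels, channels, 3, 1, 1)
# ... plus a depth-wise (groups == channels) spatial 1x3x3 convolution.
spatial_depthwise = conv3d_params(channels, channels, 1, 3, 3, groups=channels)
decoupled = spectral_only + spatial_depthwise

print("joint 3x3x3:", joint)
print("decoupled:", decoupled)
```

Under these assumptions the decoupled pair uses 912 weights against 6912 for the joint kernel, which is the kind of reduction a lightweight design aims for.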
