Semantic Matching Based on Semantic Segmentation and Neighborhood Consensus

2021 ◽  
Vol 11 (10) ◽  
pp. 4648
Author(s):  
Huaiyuan Xu ◽  
Xiaodong Chen ◽  
Huaiyu Cai ◽  
Yi Wang ◽  
Haitao Liang ◽  
...  

Establishing dense correspondences across semantically similar images is a challenging task: the unconstrained setting of the images causes large intra-class variation, which is prone to matching errors. To suppress potential matching ambiguity, NCNet explores the neighborhood consensus pattern in the 4D space of all possible correspondences, based on the assumption that correspondences are spatially continuous. We retain the neighborhood consensus constraint while introducing semantic segmentation information into the features, which makes them more distinguishable and reduces matching ambiguity from the feature perspective. Specifically, we combine a semantic segmentation network, which extracts semantic features, with 4D convolutions that explore context consistency in the 4D space. Experiments demonstrate that our algorithm achieves good semantic matching performance and that semantic segmentation information improves semantic matching accuracy.
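The core objects in this approach are the 4D correlation tensor between two feature maps and a 4D filter over it. As a minimal NumPy sketch (our own simplification: a box filter stands in for NCNet's learned 4D convolution, and the function names are ours):

```python
import numpy as np

def correlation_4d(fa, fb):
    """Build the 4D correlation tensor c[i, j, k, l] = <fa[:, i, j], fb[:, k, l]>
    between two C x H x W feature maps."""
    return np.einsum('cij,ckl->ijkl', fa, fb)

def neighborhood_consensus(corr, radius=1):
    """Box-filter stand-in for a learned 4D convolution: each match score is
    averaged over its 4D neighborhood, so spatially consistent matches
    reinforce each other while isolated (ambiguous) matches are suppressed."""
    h, w, h2, w2 = corr.shape
    out = np.zeros_like(corr)
    for i in range(h):
        for j in range(w):
            for k in range(h2):
                for l in range(w2):
                    sl = corr[max(i - radius, 0):i + radius + 1,
                              max(j - radius, 0):j + radius + 1,
                              max(k - radius, 0):k + radius + 1,
                              max(l - radius, 0):l + radius + 1]
                    out[i, j, k, l] = sl.mean()
    return out
```

In the paper's setting the features fed to `correlation_4d` would come from the semantic segmentation backbone, which is what makes the matches more distinguishable before the consensus filtering is applied.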

2020 ◽  
Vol 34 (07) ◽  
pp. 11402-11409
Author(s):  
Siqi Li ◽  
Changqing Zou ◽  
Yipeng Li ◽  
Xibin Zhao ◽  
Yue Gao

This paper presents an end-to-end 3D convolutional network named the attention-based multi-modal fusion network (AMFNet) for the semantic scene completion (SSC) task of inferring the occupancy and semantic labels of a volumetric 3D scene from single-view RGB-D images. Compared with previous methods, which use only the semantic features extracted from RGB-D images, the proposed AMFNet learns to perform effective 3D scene completion and semantic segmentation simultaneously by leveraging the experience of inferring 2D semantic segmentation from RGB-D images as well as the reliable depth cues in the spatial dimension. This is achieved by employing a multi-modal fusion architecture boosted from 2D semantic segmentation and a 3D semantic completion network empowered by residual attention blocks. We validate our method on both the synthetic SUNCG-RGBD dataset and the real NYUv2 dataset; the results show that our method achieves gains of 2.5% and 2.6% over the state-of-the-art method on SUNCG-RGBD and NYUv2, respectively.
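A residual attention block of the general kind named here reweights trunk features with a sigmoid mask while a skip connection preserves the original signal. A toy NumPy sketch (our reading, not the paper's exact block; `mask_weights` is a hypothetical parameter):

```python
import numpy as np

def residual_attention_block(x, mask_weights):
    """Residual attention in the trunk-and-mask spirit: a sigmoid attention
    mask reweights the features, and the residual form out = x * (1 + mask)
    guarantees the block can never do worse than passing x through."""
    mask = 1.0 / (1.0 + np.exp(-(mask_weights * x)))  # sigmoid attention mask
    return x * (1.0 + mask)
```

The `1 + mask` residual form is the standard trick for keeping gradients healthy when many such blocks are stacked.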


2019 ◽  
Vol 220 (1) ◽  
pp. 323-334
Author(s):  
Jing Zheng ◽  
Shuaishuai Shen ◽  
Tianqi Jiang ◽  
Weiqiang Zhu

SUMMARY It is essential to pick P-wave and S-wave arrival times rapidly and accurately for microseismic monitoring systems, yet it is not easy to identify arrivals at the true phase automatically with traditional picking methods. This is one reason many researchers are introducing deep neural networks to solve these problems. Convolutional neural networks (CNNs) are very attractive for designing automatic phase pickers, especially after the fundamental network structure was introduced from the semantic segmentation field, which can output the probability of every labelled phase at every sample in the recordings. The typical segmentation architecture consists of two main parts: (1) an encoder trained to extract coarse semantic features; and (2) a decoder responsible not only for recovering the input resolution at the output but also for obtaining a sparse representation of the objects. This fundamental segmentation structure performs well; however, the influence of its parameters on the pickers has not been investigated, so structure design has so far depended on experience and trial and error. In this paper, we address two main questions to give some guidance on network design. First, we show what sparse features CNNs learn from three-component microseismic recordings. Second, we analyse the influence of two key network parameters on the pickers, namely the depth of the decoder and the activation function. Increasing the number of levels in the decoder increases the demand for trainable parameters but benefits model accuracy. A reasonable decoder depth can balance prediction accuracy against the demand for labelled data, which matters for microseismic systems because manual labelling degrades real-time performance in monitoring tasks.
The standard rectified linear unit (ReLU) and the leaky rectified linear unit (Leaky ReLU) with different negative slopes are compared in the analysis. Leaky ReLU with a small negative slope improves the performance of a given model over ReLU by retaining some information from the negative parts.
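The two activation functions being compared differ only in how they treat negative inputs, which is easy to see side by side:

```python
import numpy as np

def relu(x):
    """Standard ReLU: negative inputs are zeroed out entirely."""
    return np.maximum(0.0, x)

def leaky_relu(x, negative_slope=0.01):
    """Leaky ReLU: negative inputs are scaled down rather than discarded,
    so some information about the negative parts survives."""
    return np.where(x >= 0, x, negative_slope * x)
```

Because `leaky_relu` has a nonzero gradient (`negative_slope`) on the negative side, units are less likely to stop learning once their pre-activations go negative.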


2020 ◽  
Vol 12 (5) ◽  
pp. 872 ◽  
Author(s):  
Ronghua Shang ◽  
Jiyu Zhang ◽  
Licheng Jiao ◽  
Yangyang Li ◽  
Naresh Marturi ◽  
...  

Semantic segmentation of high-resolution remote sensing images is highly challenging due to the presence of a complicated background, irregular target shapes, and similarities in the appearance of multiple target categories. Most existing segmentation methods, which rely only on simple fusion of the extracted multi-scale features, fail to provide satisfactory results when target sizes differ greatly. To handle this problem through multi-scale context extraction and efficient fusion of multi-scale features, in this paper we present an end-to-end multi-scale adaptive feature fusion network (MANet) for semantic segmentation of remote sensing images. It is an encoder-decoder structure that includes a multi-scale context extraction module (MCM) and an adaptive fusion module (AFM). The MCM employs two layers of atrous convolutions with different dilation rates and global average pooling to extract context information at multiple scales in parallel. MANet embeds the channel attention mechanism to fuse semantic features: the high- and low-level semantic information are concatenated to generate global features via global average pooling, and these global features are passed through a fully connected layer to acquire adaptive weights for each channel. To accomplish an efficient fusion, these tuned weights are applied to the fused features. The performance of the proposed method has been evaluated against six other state-of-the-art networks: fully convolutional networks (FCN), U-net, UZ1, Light-weight RefineNet, DeepLabv3+, and APPD. Experiments on the publicly available Potsdam and Vaihingen datasets show that the proposed MANet significantly outperforms the other networks, with overall accuracies of 89.4% and 88.2% and average F1-scores of 90.4% and 86.7%, respectively.
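The channel-attention fusion described for the AFM can be sketched in a few lines of NumPy (our reading of the abstract; the weight matrix `w_fc` and the exact wiring are assumptions):

```python
import numpy as np

def adaptive_fusion(high, low, w_fc):
    """SE-style channel attention fusion: concatenate high- and low-level
    feature maps (each C x H x W), squeeze them with global average pooling,
    map the pooled vector through a fully connected layer + sigmoid to get
    per-channel weights, and rescale the fused features with those weights."""
    fused = np.concatenate([high, low], axis=0)          # (2C, H, W)
    squeezed = fused.mean(axis=(1, 2))                   # global average pooling -> (2C,)
    weights = 1.0 / (1.0 + np.exp(-(w_fc @ squeezed)))   # FC + sigmoid -> (2C,)
    return fused * weights[:, None, None]                # channel-wise rescaling
```

The point of the design is that the weights are computed from the fused features themselves, so the network adapts how much each channel (and hence each scale) contributes per image.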


2020 ◽  
Vol 34 (07) ◽  
pp. 12951-12958 ◽  
Author(s):  
Lin Zhao ◽  
Wenbing Tao

In this paper, we propose a novel joint instance and semantic segmentation approach, called JSNet, to address instance and semantic segmentation of 3D point clouds simultaneously. Firstly, we build an effective backbone network to extract robust features from the raw point clouds. Secondly, to obtain more discriminative features, a point cloud feature fusion module is proposed to fuse features from different layers of the backbone network. Furthermore, a joint instance semantic segmentation module is developed to transform semantic features into the instance embedding space, where the transformed features are further fused with instance features to facilitate instance segmentation; this module also aggregates instance features into the semantic feature space to promote semantic segmentation. Finally, instance predictions are generated by applying simple mean-shift clustering to the instance embeddings. We evaluate the proposed JSNet on a large-scale 3D indoor point cloud dataset, S3DIS, and on the part dataset ShapeNet, and compare it with existing approaches. Experimental results demonstrate that our approach outperforms the state-of-the-art method in 3D instance segmentation, with a significant improvement in 3D semantic prediction, and that our method also benefits part segmentation. The source code for this work is available at https://github.com/dlinzhao/JSNet.
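The final clustering step is standard mean-shift over the learned embeddings; a minimal flat-kernel version (our sketch, not the repository's implementation) looks like this:

```python
import numpy as np

def mean_shift(points, bandwidth=1.0, iters=20):
    """Minimal flat-kernel mean-shift: every point repeatedly moves to the
    mean of the original points within `bandwidth` of it. Points that belong
    to the same instance converge to the same mode."""
    modes = points.copy()
    for _ in range(iters):
        for i, m in enumerate(modes):
            mask = np.linalg.norm(points - m, axis=1) < bandwidth
            modes[i] = points[mask].mean(axis=0)
    return modes
```

In the JSNet pipeline, `points` would be the per-point instance embeddings, and each converged mode corresponds to one predicted object instance.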


2021 ◽  
Vol 13 (4) ◽  
pp. 633
Author(s):  
Xiuwei Zhang ◽  
Yang Zhou ◽  
Jiaojiao Jin ◽  
Yafei Wang ◽  
Minhao Fan ◽  
...  

Accurate ice segmentation is one of the most crucial techniques for intelligent ice monitoring, as it provides rich information for ice-situation analysis, change-trend prediction, and so on; the study of ice segmentation therefore has important practical significance. In this study, we focused on fine-grained river ice segmentation using unmanned aerial vehicle (UAV) images. This task presents the following difficulties: (1) the scale of river ice varies greatly across images and even within the same image; (2) the same kind of river ice differs greatly in color, shape, texture, size, and so on; and (3) different kinds of river ice sometimes appear similar due to the complex formation and change process. To perform this study, we built the NWPU_YRCC2 dataset, in which all UAV images were collected in the Ningxia–Inner Mongolia reach of the Yellow River. We then propose a novel semantic segmentation method based on a deep convolutional neural network, named ICENETv2. To achieve accurate multiscale prediction, we design a multilevel feature fusion framework in which multi-scale high-level semantic features and lower-level finer features are effectively fused. Additionally, a dual attention module is adopted to highlight distinguishable characteristics, and a learnable up-sampling strategy is used to improve the segmentation accuracy of details. Experiments show that ICENETv2 achieves state-of-the-art performance on the NWPU_YRCC2 dataset. Finally, ICENETv2 is applied to a realistic problem: calculating drift ice cover density, one of the most important factors for predicting the freeze-up date of the river. The results demonstrate that the performance of ICENETv2 meets the actual application demand.
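Once a per-pixel segmentation map is available, drift ice cover density reduces to a pixel-counting ratio. A sketch (the class ids for ice are our assumption; the paper's label scheme may differ):

```python
import numpy as np

def drift_ice_density(label_map, ice_classes=(1, 2)):
    """Drift ice cover density from a per-pixel segmentation map:
    the fraction of pixels labelled as (any kind of) ice."""
    ice = np.isin(label_map, ice_classes)
    return ice.sum() / label_map.size
```

This is why segmentation accuracy translates directly into density accuracy: every misclassified pixel shifts the ratio.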


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Lin Wang ◽  
Xingfu Wang ◽  
Ammar Hawbani ◽  
Yan Xiong ◽  
Xu Zhang

With the development of science and technology, encoder-decoder (codec) neural networks for semantic image segmentation show good development prospects. Their advantage is that they can extract richer semantic features, but this comes at a high computational cost. To address this problem, this article introduces a codec based on a separable convolutional neural network for semantic image segmentation. The proposed method converts the traditional convolutional layers into separable convolutions, which reduces the cost of image segmentation and improves processing efficiency. Moreover, this article builds a separable convolutional codec structure, designs a semantic segmentation pipeline, and conducts semantic image segmentation experiments with the resulting network. The experimental results show that the improved codec raises the average score on the dataset by 0.01, which demonstrates the effectiveness of the improved SegProNet; the smaller the number of training samples, the more obvious the performance improvement.
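The cost saving from separable convolutions is easy to quantify: a depthwise separable convolution replaces one dense k x k convolution with a per-channel k x k convolution plus a 1 x 1 pointwise convolution. A parameter-count comparison:

```python
def conv_params(c_in, c_out, k):
    """Parameter count of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def separable_conv_params(c_in, c_out, k):
    """Depthwise separable convolution: one k x k depthwise filter per input
    channel, followed by a 1 x 1 pointwise convolution to mix channels."""
    return c_in * k * k + c_in * c_out
```

For a 3 x 3 layer with 256 input and output channels, this is 589,824 versus 67,840 parameters, roughly an 8.7x reduction, which is the efficiency gain the abstract refers to.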


2021 ◽  
Vol 10 (10) ◽  
pp. 672
Author(s):  
Suting Chen ◽  
Chaoqun Wu ◽  
Mithun Mukherjee ◽  
Yujie Zheng

Semantic segmentation of remote sensing images (RSI) plays a significant role in urban management and land cover classification. Due to the rich spatial information in RSI, existing convolutional neural network (CNN)-based methods cannot segment the images accurately and lose some edge information of objects. In addition, recent studies have shown that leveraging additional 3D geometric data together with 2D appearance helps distinguish pixel categories. However, most such methods require height maps as additional inputs, which severely limits their applications. To alleviate these issues, we propose a height-aware multi-path parallel network (HA-MPPNet). The network first obtains multi-level semantic features while maintaining the spatial resolution in each path to preserve detailed image information. Afterward, gated high-low-level feature fusion is utilized to complement the lack of low-level semantics. We then design a height-feature decoding branch that learns height features under the supervision of digital surface model (DSM) images and uses the learned embeddings to improve the semantic context through height-feature-guided propagation. Note that our model is end-to-end and does not need a DSM image as additional input after training. Our method outperformed other state-of-the-art semantic segmentation methods on publicly available remote sensing image datasets.
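Gated high-low-level fusion is typically a sigmoid gate, computed from the high-level stream, that decides per position how much low-level detail to admit. A toy NumPy sketch (our interpretation of the abstract, not the paper's exact formulation):

```python
import numpy as np

def gated_fusion(high, low):
    """Gated fusion of feature maps: a sigmoid gate derived from the
    high-level features modulates the low-level contribution,
    out = high + gate * low."""
    gate = 1.0 / (1.0 + np.exp(-high))  # sigmoid gate in (0, 1)
    return high + gate * low
```

The gate lets the network suppress low-level noise in homogeneous regions while admitting fine detail near edges, which is where the low-level features matter.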


2022 ◽  
Author(s):  
Yuehua Zhao ◽  
Ma Jie ◽  
Chong Nannan ◽  
Wen Junjie

Abstract Real-time large-scale point cloud segmentation is an important but challenging task for practical applications like autonomous driving. Existing real-time methods have achieved acceptable performance by aggregating local information. However, most of them exploit local spatial information or local semantic information independently, and few consider the complementarity of the two. In this paper, we propose a model named Spatial-Semantic Incorporation Network (SSI-Net) for real-time large-scale point cloud segmentation. A Spatial-Semantic Cross-correction (SSC) module is introduced in SSI-Net as a basic unit. Through the SSC module, high-quality contextual features can be learned by correcting and updating semantic features using spatial cues, and vice versa. Adopting the plug-and-play SSC module, we design SSI-Net as an encoder-decoder architecture. To ensure efficiency, it also adopts a random-sampling-based hierarchical network structure. Extensive experiments on several prevalent datasets demonstrate that our method achieves state-of-the-art performance.
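The cross-correction idea, each stream refined by a projection of the other, can be sketched abstractly as follows (our illustration only; the projection weights `w_s` and `w_m` and the linear form are assumptions, not the paper's layers):

```python
import numpy as np

def ssc_cross_correction(spatial, semantic, w_s, w_m):
    """Sketch of one spatial-semantic cross-correction step for N points with
    D-dimensional features: spatial cues correct and update the semantic
    features, then the updated semantics correct the spatial features."""
    semantic_new = semantic + spatial @ w_s       # spatial cues refine semantics
    spatial_new = spatial + semantic_new @ w_m    # and vice versa
    return spatial_new, semantic_new
```

The residual form keeps each stream's original information intact while the cross terms carry the complementary cues between them.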


Sensors ◽  
2021 ◽  
Vol 21 (5) ◽  
pp. 1873
Author(s):  
Xiao Xiao ◽  
Fan Yang ◽  
Amir Sadovnik

Blur detection, which aims to separate the blurred and clear regions of an image, is widely used in many important computer vision tasks such as object detection, semantic segmentation, and face recognition, and has attracted increasing attention from researchers and industry in recent years. To improve the quality of the separation, many researchers have spent enormous effort on extracting features from images at various scales. However, how to extract blur features and fuse them synchronously remains a big challenge. In this paper, we regard blur detection as an image segmentation problem. Inspired by the success of the U-net architecture for image segmentation, we propose a multi-scale dilated convolutional neural network called MSDU-net. In this model, we design a group of multi-scale feature extractors with dilated convolutions to extract texture information at different scales at the same time. The U-shaped architecture of MSDU-net fuses the different-scale texture features with the generated semantic features to support the image segmentation task. We conduct extensive experiments on two classic public benchmark datasets and show that MSDU-net outperforms other state-of-the-art blur detection approaches.
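Dilated convolution enlarges the receptive field by spacing kernel taps `dilation` samples apart, at no extra parameter cost; running the same kernel at several dilations gives the multi-scale extraction described above. A 1D NumPy sketch (our illustration, not the paper's code):

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation=1):
    """'Same'-padded 1D dilated convolution: kernel taps are spaced
    `dilation` samples apart, so a k-tap kernel covers a span of
    1 + (k - 1) * dilation input samples."""
    k = len(kernel)
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, pad)
    return np.array([sum(kernel[t] * xp[i + t * dilation] for t in range(k))
                     for i in range(len(x))])
```

Applying the same kernel with dilations 1, 2, and 4, and stacking the outputs, yields features at three scales from one set of weights, which is the multi-scale extractor idea in miniature.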


2020 ◽  
Vol 2020 ◽  
pp. 1-14
Author(s):  
Rongsheng Dong ◽  
Lulu Bai ◽  
Fengying Li

Boundary pixel blur and category imbalance are common problems in semantic segmentation of urban remote sensing images. Inspired by DenseU-Net, this paper proposes a new end-to-end network, SiameseDenseU-Net. First, the network uses both true orthophoto (TOP) images and their corresponding normalized digital surface models (nDSM) as input. Deep image features are extracted in parallel by downsampling blocks, and information such as shallow textures and high-level abstract semantic features is fused throughout the connected channels. The features extracted by the two parallel processing chains are then fused. Finally, a softmax layer performs prediction to generate dense label maps. Experiments on the Vaihingen dataset show that SiameseDenseU-Net improves the F1-score by 8.2% and 7.63% compared with the Hourglass-ShapeNetwork (HSN) model and the U-Net model, respectively. Regarding boundary pixels, when using the same focal loss function based on median-frequency-balance weighting, the F1-score of SiameseDenseU-Net on the small-target "car" category improved by 0.92% over the original DenseU-Net. The overall accuracy and the average F1-score also improved to varying degrees. The proposed SiameseDenseU-Net is better at identifying small-target categories and boundary pixels, and it is numerically and visually superior to the contrast model.
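Median frequency balancing, the weighting scheme mentioned above, assigns each class the weight median(frequencies) / own frequency, so rare classes such as "car" count more in the loss. A compact sketch:

```python
import numpy as np

def median_frequency_weights(label_map):
    """Median frequency balancing: weight each class by
    median(class frequencies) / its own frequency, so under-represented
    classes contribute more to the training loss."""
    classes, counts = np.unique(label_map, return_counts=True)
    freq = counts / counts.sum()
    return dict(zip(classes.tolist(), (np.median(freq) / freq).tolist()))
```

For a map that is 90% background and 10% cars, the car class receives weight 5.0 while background receives about 0.56, which is exactly the rebalancing that helps the small-target categories.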

