Semantic Matching Based on Semantic Segmentation and Neighborhood Consensus

2021 ◽  
Vol 11 (10) ◽  
pp. 4648
Author(s):  
Huaiyuan Xu ◽  
Xiaodong Chen ◽  
Huaiyu Cai ◽  
Yi Wang ◽  
Haitao Liang ◽  
...  

Establishing dense correspondences across semantically similar images is a challenging task: the unconstrained setting of the images causes large intra-class variation, which is prone to matching errors. To suppress potential matching ambiguity, NCNet explores the neighborhood consensus pattern in the 4D space of all possible correspondences, based on the assumption that correspondences are spatially continuous. We retain the neighborhood consensus constraint while introducing semantic segmentation information into the features, which makes them more distinguishable and reduces matching ambiguity from the feature perspective. Specifically, we combine a semantic segmentation network, which extracts semantic features, with 4D convolutions that explore context consistency in the 4D space. Experiments demonstrate that our algorithm achieves good semantic matching performance and that semantic segmentation information improves semantic matching accuracy.
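The core objects in this approach are the 4D correlation tensor between two feature maps and a 4D filter over it. As a minimal NumPy sketch (our own simplification: a box filter stands in for NCNet's learned 4D convolution, and the function names are ours):

```python
import numpy as np

def correlation_4d(fa, fb):
    """Build the 4D correlation tensor c[i, j, k, l] = <fa[:, i, j], fb[:, k, l]>
    between two C x H x W feature maps."""
    return np.einsum('cij,ckl->ijkl', fa, fb)

def neighborhood_consensus(corr, radius=1):
    """Box-filter stand-in for a learned 4D convolution: each match score is
    averaged over its 4D neighborhood, so spatially consistent matches
    reinforce each other while isolated (ambiguous) matches are suppressed."""
    h, w, h2, w2 = corr.shape
    out = np.zeros_like(corr)
    for i in range(h):
        for j in range(w):
            for k in range(h2):
                for l in range(w2):
                    sl = corr[max(i - radius, 0):i + radius + 1,
                              max(j - radius, 0):j + radius + 1,
                              max(k - radius, 0):k + radius + 1,
                              max(l - radius, 0):l + radius + 1]
                    out[i, j, k, l] = sl.mean()
    return out
```

In the paper's setting the features fed to `correlation_4d` would come from the semantic segmentation backbone, which is what makes the matches more distinguishable before the consensus filtering is applied.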

2020 ◽  
Vol 34 (07) ◽  
pp. 11402-11409
Author(s):  
Siqi Li ◽  
Changqing Zou ◽  
Yipeng Li ◽  
Xibin Zhao ◽  
Yue Gao

This paper presents an end-to-end 3D convolutional network named the attention-based multi-modal fusion network (AMFNet) for the semantic scene completion (SSC) task of inferring the occupancy and semantic labels of a volumetric 3D scene from single-view RGB-D images. Compared with previous methods, which use only the semantic features extracted from RGB-D images, the proposed AMFNet learns to perform effective 3D scene completion and semantic segmentation simultaneously by leveraging the experience of inferring 2D semantic segmentation from RGB-D images as well as the reliable depth cues in the spatial dimension. This is achieved by employing a multi-modal fusion architecture boosted from 2D semantic segmentation and a 3D semantic completion network empowered by residual attention blocks. We validate our method on both the synthetic SUNCG-RGBD dataset and the real NYUv2 dataset; the results show that our method achieves gains of 2.5% and 2.6% over the state-of-the-art method on SUNCG-RGBD and NYUv2, respectively.
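A residual attention block of the general kind named here reweights trunk features with a sigmoid mask while a skip connection preserves the original signal. A toy NumPy sketch (our reading, not the paper's exact block; `mask_weights` is a hypothetical parameter):

```python
import numpy as np

def residual_attention_block(x, mask_weights):
    """Residual attention in the trunk-and-mask spirit: a sigmoid attention
    mask reweights the features, and the residual form out = x * (1 + mask)
    guarantees the block can never do worse than passing x through."""
    mask = 1.0 / (1.0 + np.exp(-(mask_weights * x)))  # sigmoid attention mask
    return x * (1.0 + mask)
```

The `1 + mask` residual form is the standard trick for keeping gradients healthy when many such blocks are stacked.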


2019 ◽  
Vol 220 (1) ◽  
pp. 323-334
Author(s):  
Jing Zheng ◽  
Shuaishuai Shen ◽  
Tianqi Jiang ◽  
Weiqiang Zhu

SUMMARY It is essential to pick P-wave and S-wave arrival times rapidly and accurately for microseismic monitoring systems, yet it is not easy to identify arrivals at the true phase automatically with traditional picking methods. This is one reason many researchers are introducing deep neural networks to solve these problems. Convolutional neural networks (CNNs) are very attractive for designing automatic phase pickers, especially after the fundamental network structure was introduced from the semantic segmentation field, which can output the probability of every labelled phase at every sample in the recordings. The typical segmentation architecture consists of two main parts: (1) an encoder trained to extract coarse semantic features; and (2) a decoder responsible not only for recovering the input resolution at the output but also for obtaining a sparse representation of the objects. This fundamental segmentation structure performs well; however, the influence of its parameters on the pickers has not been investigated, so structure design has so far depended on experience and trial and error. In this paper, we address two main questions to give some guidance on network design. First, we show what sparse features CNNs learn from three-component microseismic recordings. Second, we analyse the influence of two key network parameters on the pickers, namely the depth of the decoder and the activation function. Increasing the number of levels in the decoder increases the demand for trainable parameters but benefits model accuracy. A reasonable decoder depth can balance prediction accuracy against the demand for labelled data, which matters for microseismic systems because manual labelling degrades real-time performance in monitoring tasks.
The standard rectified linear unit (ReLU) and the leaky rectified linear unit (Leaky ReLU) with different negative slopes are compared in the analysis. Leaky ReLU with a small negative slope improves the performance of a given model over ReLU by retaining some information from the negative parts.
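The two activation functions being compared differ only in how they treat negative inputs, which is easy to see side by side:

```python
import numpy as np

def relu(x):
    """Standard ReLU: negative inputs are zeroed out entirely."""
    return np.maximum(0.0, x)

def leaky_relu(x, negative_slope=0.01):
    """Leaky ReLU: negative inputs are scaled down rather than discarded,
    so some information about the negative parts survives."""
    return np.where(x >= 0, x, negative_slope * x)
```

Because `leaky_relu` has a nonzero gradient (`negative_slope`) on the negative side, units are less likely to stop learning once their pre-activations go negative.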


2020 ◽  
Vol 12 (5) ◽  
pp. 872 ◽  
Author(s):  
Ronghua Shang ◽  
Jiyu Zhang ◽  
Licheng Jiao ◽  
Yangyang Li ◽  
Naresh Marturi ◽  
...  

Semantic segmentation of high-resolution remote sensing images is highly challenging due to the presence of a complicated background, irregular target shapes, and similarities in the appearance of multiple target categories. Most existing segmentation methods, which rely only on simple fusion of the extracted multi-scale features, fail to provide satisfactory results when target sizes differ greatly. To handle this problem through multi-scale context extraction and efficient fusion of multi-scale features, in this paper we present an end-to-end multi-scale adaptive feature fusion network (MANet) for semantic segmentation of remote sensing images. It is an encoder-decoder structure that includes a multi-scale context extraction module (MCM) and an adaptive fusion module (AFM). The MCM employs two layers of atrous convolutions with different dilation rates and global average pooling to extract context information at multiple scales in parallel. MANet embeds the channel attention mechanism to fuse semantic features: the high- and low-level semantic information are concatenated to generate global features via global average pooling, and these global features are passed through a fully connected layer to acquire adaptive weights for each channel. To accomplish an efficient fusion, these tuned weights are applied to the fused features. The performance of the proposed method has been evaluated against six other state-of-the-art networks: fully convolutional networks (FCN), U-net, UZ1, Light-weight RefineNet, DeepLabv3+, and APPD. Experiments on the publicly available Potsdam and Vaihingen datasets show that the proposed MANet significantly outperforms the other networks, with overall accuracies of 89.4% and 88.2% and average F1-scores of 90.4% and 86.7%, respectively.
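The channel-attention fusion described for the AFM can be sketched in a few lines of NumPy (our reading of the abstract; the weight matrix `w_fc` and the exact wiring are assumptions):

```python
import numpy as np

def adaptive_fusion(high, low, w_fc):
    """SE-style channel attention fusion: concatenate high- and low-level
    feature maps (each C x H x W), squeeze them with global average pooling,
    map the pooled vector through a fully connected layer + sigmoid to get
    per-channel weights, and rescale the fused features with those weights."""
    fused = np.concatenate([high, low], axis=0)          # (2C, H, W)
    squeezed = fused.mean(axis=(1, 2))                   # global average pooling -> (2C,)
    weights = 1.0 / (1.0 + np.exp(-(w_fc @ squeezed)))   # FC + sigmoid -> (2C,)
    return fused * weights[:, None, None]                # channel-wise rescaling
```

The point of the design is that the weights are computed from the fused features themselves, so the network adapts how much each channel (and hence each scale) contributes per image.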


2020 ◽  
Vol 34 (07) ◽  
pp. 12951-12958 ◽  
Author(s):  
Lin Zhao ◽  
Wenbing Tao

In this paper, we propose a novel joint instance and semantic segmentation approach, called JSNet, to address instance and semantic segmentation of 3D point clouds simultaneously. Firstly, we build an effective backbone network to extract robust features from the raw point clouds. Secondly, to obtain more discriminative features, a point cloud feature fusion module is proposed to fuse features from different layers of the backbone network. Furthermore, a joint instance semantic segmentation module is developed to transform semantic features into the instance embedding space, where the transformed features are further fused with instance features to facilitate instance segmentation; this module also aggregates instance features into the semantic feature space to promote semantic segmentation. Finally, instance predictions are generated by applying simple mean-shift clustering to the instance embeddings. We evaluate the proposed JSNet on a large-scale 3D indoor point cloud dataset, S3DIS, and on the part dataset ShapeNet, and compare it with existing approaches. Experimental results demonstrate that our approach outperforms the state-of-the-art method in 3D instance segmentation, with a significant improvement in 3D semantic prediction, and that our method also benefits part segmentation. The source code for this work is available at https://github.com/dlinzhao/JSNet.
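The final clustering step is standard mean-shift over the learned embeddings; a minimal flat-kernel version (our sketch, not the repository's implementation) looks like this:

```python
import numpy as np

def mean_shift(points, bandwidth=1.0, iters=20):
    """Minimal flat-kernel mean-shift: every point repeatedly moves to the
    mean of the original points within `bandwidth` of it. Points that belong
    to the same instance converge to the same mode."""
    modes = points.copy()
    for _ in range(iters):
        for i, m in enumerate(modes):
            mask = np.linalg.norm(points - m, axis=1) < bandwidth
            modes[i] = points[mask].mean(axis=0)
    return modes
```

In the JSNet pipeline, `points` would be the per-point instance embeddings, and each converged mode corresponds to one predicted object instance.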


2021 ◽  
Vol 13 (4) ◽  
pp. 633
Author(s):  
Xiuwei Zhang ◽  
Yang Zhou ◽  
Jiaojiao Jin ◽  
Yafei Wang ◽  
Minhao Fan ◽  
...  

Accurate ice segmentation is one of the most crucial techniques for intelligent ice monitoring, as it provides rich information for ice-situation analysis, change-trend prediction, and so on; the study of ice segmentation therefore has important practical significance. In this study, we focused on fine-grained river ice segmentation using unmanned aerial vehicle (UAV) images. This task presents the following difficulties: (1) the scale of river ice varies greatly across images and even within the same image; (2) the same kind of river ice differs greatly in color, shape, texture, size, and so on; and (3) different kinds of river ice sometimes appear similar due to the complex formation and change process. To perform this study, we built the NWPU_YRCC2 dataset, in which all UAV images were collected in the Ningxia–Inner Mongolia reach of the Yellow River. We then propose a novel semantic segmentation method based on a deep convolutional neural network, named ICENETv2. To achieve accurate multiscale prediction, we design a multilevel feature fusion framework in which multi-scale high-level semantic features and lower-level finer features are effectively fused. Additionally, a dual attention module is adopted to highlight distinguishable characteristics, and a learnable up-sampling strategy is used to improve the segmentation accuracy of details. Experiments show that ICENETv2 achieves state-of-the-art performance on the NWPU_YRCC2 dataset. Finally, ICENETv2 is applied to a realistic problem: calculating drift ice cover density, one of the most important factors for predicting the freeze-up date of the river. The results demonstrate that the performance of ICENETv2 meets the actual application demand.
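Once a per-pixel segmentation map is available, drift ice cover density reduces to a pixel-counting ratio. A sketch (the class ids for ice are our assumption; the paper's label scheme may differ):

```python
import numpy as np

def drift_ice_density(label_map, ice_classes=(1, 2)):
    """Drift ice cover density from a per-pixel segmentation map:
    the fraction of pixels labelled as (any kind of) ice."""
    ice = np.isin(label_map, ice_classes)
    return ice.sum() / label_map.size
```

This is why segmentation accuracy translates directly into density accuracy: every misclassified pixel shifts the ratio.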


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Lin Wang ◽  
Xingfu Wang ◽  
Ammar Hawbani ◽  
Yan Xiong ◽  
Xu Zhang

With the development of science and technology, encoder-decoder (codec) neural networks for semantic image segmentation show good development prospects. Their advantage is that they can extract richer semantic features, but this comes at a high computational cost. To address this problem, this article introduces a codec based on a separable convolutional neural network for semantic image segmentation. The proposed method converts the traditional convolutional layers into separable convolutions, which reduces the cost of image segmentation and improves processing efficiency. Moreover, this article builds a separable convolutional codec structure, designs a semantic segmentation pipeline, and conducts semantic image segmentation experiments with the resulting network. The experimental results show that the improved codec raises the average score on the dataset by 0.01, which demonstrates the effectiveness of the improved SegProNet; the smaller the number of training samples, the more obvious the performance improvement.
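The cost saving from separable convolutions is easy to quantify: a depthwise separable convolution replaces one dense k x k convolution with a per-channel k x k convolution plus a 1 x 1 pointwise convolution. A parameter-count comparison:

```python
def conv_params(c_in, c_out, k):
    """Parameter count of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def separable_conv_params(c_in, c_out, k):
    """Depthwise separable convolution: one k x k depthwise filter per input
    channel, followed by a 1 x 1 pointwise convolution to mix channels."""
    return c_in * k * k + c_in * c_out
```

For a 3 x 3 layer with 256 input and output channels, this is 589,824 versus 67,840 parameters, roughly an 8.7x reduction, which is the efficiency gain the abstract refers to.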


2021 ◽  
Vol 10 (10) ◽  
pp. 672
Author(s):  
Suting Chen ◽  
Chaoqun Wu ◽  
Mithun Mukherjee ◽  
Yujie Zheng

Semantic segmentation of remote sensing images (RSI) plays a significant role in urban management and land cover classification. Due to the rich spatial information in RSI, existing convolutional neural network (CNN)-based methods cannot segment the images accurately and lose some edge information of objects. In addition, recent studies have shown that leveraging additional 3D geometric data together with 2D appearance helps distinguish pixel categories. However, most such methods require height maps as additional inputs, which severely limits their applications. To alleviate these issues, we propose a height-aware multi-path parallel network (HA-MPPNet). The network first obtains multi-level semantic features while maintaining the spatial resolution in each path to preserve detailed image information. Afterward, gated high-low-level feature fusion is utilized to complement the lack of low-level semantics. We then design a height-feature decoding branch that learns height features under the supervision of digital surface model (DSM) images and uses the learned embeddings to improve the semantic context through height-feature-guided propagation. Note that our model is end-to-end and does not need a DSM image as additional input after training. Our method outperformed other state-of-the-art semantic segmentation methods on publicly available remote sensing image datasets.
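Gated high-low-level fusion is typically a sigmoid gate, computed from the high-level stream, that decides per position how much low-level detail to admit. A toy NumPy sketch (our interpretation of the abstract, not the paper's exact formulation):

```python
import numpy as np

def gated_fusion(high, low):
    """Gated fusion of feature maps: a sigmoid gate derived from the
    high-level features modulates the low-level contribution,
    out = high + gate * low."""
    gate = 1.0 / (1.0 + np.exp(-high))  # sigmoid gate in (0, 1)
    return high + gate * low
```

The gate lets the network suppress low-level noise in homogeneous regions while admitting fine detail near edges, which is where the low-level features matter.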


2022 ◽  
Author(s):  
Yuehua Zhao ◽  
Ma Jie ◽  
Chong Nannan ◽  
Wen Junjie

Abstract Real-time large-scale point cloud segmentation is an important but challenging task for practical applications like autonomous driving. Existing real-time methods have achieved acceptable performance by aggregating local information. However, most of them exploit local spatial information or local semantic information independently, and few consider the complementarity of the two. In this paper, we propose a model named Spatial-Semantic Incorporation Network (SSI-Net) for real-time large-scale point cloud segmentation. A Spatial-Semantic Cross-correction (SSC) module is introduced in SSI-Net as a basic unit. Through the SSC module, high-quality contextual features can be learned by correcting and updating semantic features using spatial cues, and vice versa. Adopting the plug-and-play SSC module, we design SSI-Net as an encoder-decoder architecture. To ensure efficiency, it also adopts a random-sampling-based hierarchical network structure. Extensive experiments on several prevalent datasets demonstrate that our method achieves state-of-the-art performance.
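The cross-correction idea, each stream refined by a projection of the other, can be sketched abstractly as follows (our illustration only; the projection weights `w_s` and `w_m` and the linear form are assumptions, not the paper's layers):

```python
import numpy as np

def ssc_cross_correction(spatial, semantic, w_s, w_m):
    """Sketch of one spatial-semantic cross-correction step for N points with
    D-dimensional features: spatial cues correct and update the semantic
    features, then the updated semantics correct the spatial features."""
    semantic_new = semantic + spatial @ w_s       # spatial cues refine semantics
    spatial_new = spatial + semantic_new @ w_m    # and vice versa
    return spatial_new, semantic_new
```

The residual form keeps each stream's original information intact while the cross terms carry the complementary cues between them.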


Sensors ◽  
2021 ◽  
Vol 21 (5) ◽  
pp. 1873
Author(s):  
Xiao Xiao ◽  
Fan Yang ◽  
Amir Sadovnik

Blur detection, which aims to separate the blurred and clear regions of an image, is widely used in many important computer vision tasks such as object detection, semantic segmentation, and face recognition, and has attracted increasing attention from researchers and industry in recent years. To improve the quality of the separation, many researchers have spent enormous effort on extracting features from images at various scales. However, how to extract blur features and fuse them synchronously remains a big challenge. In this paper, we regard blur detection as an image segmentation problem. Inspired by the success of the U-net architecture for image segmentation, we propose a multi-scale dilated convolutional neural network called MSDU-net. In this model, we design a group of multi-scale feature extractors with dilated convolutions to extract texture information at different scales at the same time. The U-shaped architecture of MSDU-net fuses the different-scale texture features with the generated semantic features to support the image segmentation task. We conduct extensive experiments on two classic public benchmark datasets and show that MSDU-net outperforms other state-of-the-art blur detection approaches.
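Dilated convolution enlarges the receptive field by spacing kernel taps `dilation` samples apart, at no extra parameter cost; running the same kernel at several dilations gives the multi-scale extraction described above. A 1D NumPy sketch (our illustration, not the paper's code):

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation=1):
    """'Same'-padded 1D dilated convolution: kernel taps are spaced
    `dilation` samples apart, so a k-tap kernel covers a span of
    1 + (k - 1) * dilation input samples."""
    k = len(kernel)
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, pad)
    return np.array([sum(kernel[t] * xp[i + t * dilation] for t in range(k))
                     for i in range(len(x))])
```

Applying the same kernel with dilations 1, 2, and 4, and stacking the outputs, yields features at three scales from one set of weights, which is the multi-scale extractor idea in miniature.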


2020 ◽  
Vol 2020 ◽  
pp. 1-14
Author(s):  
Rongsheng Dong ◽  
Lulu Bai ◽  
Fengying Li

Boundary pixel blur and category imbalance are common problems in semantic segmentation of urban remote sensing images. Inspired by DenseU-Net, this paper proposes a new end-to-end network, SiameseDenseU-Net. First, the network uses both true orthophoto (TOP) images and their corresponding normalized digital surface models (nDSM) as input. Deep image features are extracted in parallel by downsampling blocks, and information such as shallow textures and high-level abstract semantic features is fused throughout the connected channels. The features extracted by the two parallel processing chains are then fused. Finally, a softmax layer performs prediction to generate dense label maps. Experiments on the Vaihingen dataset show that SiameseDenseU-Net improves the F1-score by 8.2% and 7.63% compared with the Hourglass-ShapeNetwork (HSN) model and the U-Net model, respectively. Regarding boundary pixels, when using the same focal loss function based on median-frequency-balance weighting, the F1-score of SiameseDenseU-Net on the small-target "car" category improved by 0.92% over the original DenseU-Net. The overall accuracy and the average F1-score also improved to varying degrees. The proposed SiameseDenseU-Net is better at identifying small-target categories and boundary pixels, and it is numerically and visually superior to the contrast model.
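Median frequency balancing, the weighting scheme mentioned above, assigns each class the weight median(frequencies) / own frequency, so rare classes such as "car" count more in the loss. A compact sketch:

```python
import numpy as np

def median_frequency_weights(label_map):
    """Median frequency balancing: weight each class by
    median(class frequencies) / its own frequency, so under-represented
    classes contribute more to the training loss."""
    classes, counts = np.unique(label_map, return_counts=True)
    freq = counts / counts.sum()
    return dict(zip(classes.tolist(), (np.median(freq) / freq).tolist()))
```

For a map that is 90% background and 10% cars, the car class receives weight 5.0 while background receives about 0.56, which is exactly the rebalancing that helps the small-target categories.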

