Dense RGB-D Semantic Mapping with Pixel-Voxel Neural Network

Sensors ◽  
2018 ◽  
Vol 18 (9) ◽  
pp. 3099 ◽  
Author(s):  
Cheng Zhao ◽  
Li Sun ◽  
Pulak Purkait ◽  
Tom Duckett ◽  
Rustam Stolkin

In this paper, a novel Pixel-Voxel network is proposed for dense 3D semantic mapping, which performs dense 3D mapping while simultaneously recognizing and labelling the semantic category of each point in the 3D map. Our approach fully leverages the advantages of the different modalities: the PixelNet learns high-level contextual information from 2D RGB images, and the VoxelNet learns 3D geometrical shapes from the 3D point cloud. Unlike existing architectures that fuse score maps from different modalities with equal weights, we propose a softmax weighted fusion stack that adaptively learns the varying contributions of PixelNet and VoxelNet and fuses the score maps according to their respective confidence levels. Our approach achieved competitive results on both the SUN RGB-D and NYU V2 benchmarks, while the runtime of the proposed system is boosted to around 13 Hz, enabling near-real-time performance on an eight-core i7 PC with a single Titan X GPU.
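A minimal sketch of such a confidence-weighted fusion (NumPy, with fixed rather than learned fusion logits for illustration; the function names are mine, not the paper's):

```python
import numpy as np

def softmax(x, axis=0):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def softmax_weighted_fusion(pixel_scores, voxel_scores, fusion_logits):
    # fusion_logits would be learned end-to-end in the real network;
    # softmax turns them into convex weights, so the fused map stays a
    # weighted average of the two modality score maps
    w = softmax(np.asarray(fusion_logits, dtype=float))
    return w[0] * pixel_scores + w[1] * voxel_scores
```

With equal logits both modalities contribute equally; as one logit grows, the fused map approaches that modality's score map alone.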

2021 ◽  
Vol 11 (4) ◽  
pp. 1953
Author(s):  
Francisco Martín ◽  
Fernando González ◽  
José Miguel Guerrero ◽  
Manuel Fernández ◽  
Jonatan Ginés

The perception and identification of visual stimuli from the environment is a fundamental capacity of autonomous mobile robots. Current deep learning techniques make it possible to identify and segment objects of interest in an image. This paper presents a novel algorithm to segment an object's space from a deep segmentation of an image taken by a 3D camera. The proposed approach solves the boundary pixel problem that appears when segmented pixels are mapped directly to their correspondences in the point cloud. We validate our approach against baseline approaches on real images taken by a 3D camera, showing that our method outperforms them in accuracy and reliability. As an application of the proposed algorithm, we present a semantic mapping approach for a mobile robot in indoor environments.
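The boundary pixel problem stems from depth discontinuities at object silhouettes: a pixel on an object's edge may carry the depth of the background behind it, so a direct pixel-to-point mapping pulls spurious 3D points into the segment. A minimal, hedged sketch of one way to suppress such pixels (a median-depth filter of my own devising, not the paper's algorithm):

```python
import numpy as np

def filter_boundary_pixels(mask, depth, tol=0.2):
    """Keep only segmented pixels whose depth lies within tol (metres)
    of the segment's median depth; pixels that fell onto the background
    at the silhouette are discarded. Illustrative only."""
    seg_depths = depth[mask]
    med = np.median(seg_depths)
    return mask & (np.abs(depth - med) <= tol)
```

A robust statistic such as the median is deliberately used here, since boundary outliers would skew a mean-based threshold.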


2018 ◽  
Vol 36 (6) ◽  
pp. 1114-1134 ◽  
Author(s):  
Xiufeng Cheng ◽  
Jinqing Yang ◽  
Lixin Xia

Purpose – This paper aims to propose an extensible, service-oriented framework for context-aware data acquisition, description, interpretation and reasoning, which facilitates the development of mobile applications that provide a context-awareness service.
Design/methodology/approach – First, the authors propose the context data reasoning framework (CDRFM) for generating service-oriented contextual information. They then used this framework to composite mobile sensor data into low-level contextual information. Finally, the authors inferred high-level contextual information from the formatted low-level contextual information using particular inference rules.
Findings – The authors take "user behavior patterns" as an exemplary context-information generation schema in their experimental study. The results reveal that service optimization can be guided by the implicit, high-level context information inside user behavior logs, and they confirm the validity of the authors' framework.
Research limitations/implications – Further research will add a greater variety of sensor data. Furthermore, to validate the effectiveness of the framework, more reasoning rules need to be evaluated; the authors may therefore implement more algorithms in the framework to acquire more comprehensive context information.
Practical implications – CDRFM expands the context-awareness frameworks of previous research and unifies the procedures of acquiring, describing, modeling, reasoning about and discovering implicit context information for mobile service providers.
Social implications – The framework supports the service-oriented context-awareness function in application design and related development in the commercial mobile software industry.
Originality/value – Extant research on context awareness has rarely considered the generation of contextual information for service providers. The CDRFM can be used to generate valuable contextual information by implementing more reasoning rules.


2019 ◽  
Author(s):  
Lore Goetschalckx ◽  
Johan Wagemans

This is a preprint. Please find the published, peer-reviewed version of the paper here: https://peerj.com/articles/8169/. Images differ in their memorability in consistent ways across observers. What makes an image memorable is not fully understood to date. Most current insight is in terms of high-level semantic aspects related to the content. However, research still shows consistent differences within semantic categories, suggesting a role for factors at other levels of processing in the visual hierarchy. To aid investigations into this role, as well as contributions to the understanding of image memorability more generally, we present MemCat. MemCat is a category-based image set consisting of 10K images representing five broader, memorability-relevant categories (animal, food, landscape, sports, and vehicle), further divided into subcategories (e.g., bear). The images were sampled from existing source image sets that offer bounding-box annotations or more detailed segmentation masks. We collected memorability scores for all 10K images, each score based on the responses of, on average, 99 participants in a repeat-detection memory task. Replicating previous research, the collected memorability scores show high levels of consistency across observers. Currently, MemCat is the second-largest memorability image set and the largest offering a category-based structure. MemCat can be used to study the factors underlying the variability in image memorability, including the variability within semantic categories. In addition, it offers a new benchmark dataset for the automatic prediction of memorability scores (e.g., with convolutional neural networks). Finally, MemCat allows the study of neural and behavioral correlates of memorability while controlling for semantic category.


Author(s):  
F. Politz ◽  
M. Sester

Over the past years, the algorithms for dense image matching (DIM) to obtain point clouds from aerial images have improved significantly. Consequently, DIM point clouds are now a good alternative to the established Airborne Laser Scanning (ALS) point clouds for remote sensing applications. In order to derive high-level products such as digital terrain models or city models, each point within a point cloud must be assigned a class label. Usually, ALS and DIM point clouds are labelled with different classifiers due to their varying characteristics. In this work, we explore both point cloud types with a fully convolutional encoder-decoder network, which learns to classify ALS as well as DIM point clouds. As input, we project the point clouds onto a 2D image raster plane and calculate the minimal, average and maximal height value for each raster cell. The network then differentiates between the classes ground, non-ground, building and no data. We test our network in six training setups using only one point cloud type, both point cloud types, as well as several transfer-learning approaches. We quantitatively and qualitatively compare all results and discuss the advantages and disadvantages of each setup. The best network achieves an overall accuracy of 96% on an ALS and 83% on a DIM test set.
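The min/avg/max height rasterization described above can be sketched as follows (NumPy, with a hypothetical helper name and a simple per-point loop; a production version would vectorize the binning):

```python
import numpy as np

def rasterize_heights(points, cell_size=1.0):
    """Project an (N, 3) point cloud onto a 2D raster and return three
    channels: per-cell minimum, average and maximum z (height)."""
    # map x, y to integer cell indices, shifted so indices start at 0
    ij = np.floor(points[:, :2] / cell_size).astype(int)
    ij -= ij.min(axis=0)
    w, h = ij.max(axis=0) + 1
    zmin = np.full((w, h), np.inf)
    zmax = np.full((w, h), -np.inf)
    zsum = np.zeros((w, h))
    cnt = np.zeros((w, h))
    for (i, j), z in zip(ij, points[:, 2]):
        zmin[i, j] = min(zmin[i, j], z)
        zmax[i, j] = max(zmax[i, j], z)
        zsum[i, j] += z
        cnt[i, j] += 1
    # average only where cells contain points; empty cells stay 0
    zavg = np.divide(zsum, cnt, out=np.zeros_like(zsum), where=cnt > 0)
    return zmin, zavg, zmax
```

Cells left at ±inf/0 correspond to the network's "no data" class.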


2020 ◽  
Vol 11 (87) ◽  
Author(s):  
Natalia Glinka ◽  
Tetiana Zykova
The article addresses a current problem of linguistic science: the mechanisms by which a political text realizes its manipulative function, which involves studying the features and potential of its means of expression and their effective impact on mass consciousness. Studying the political text as a complex, multi-vector phenomenon makes it possible to identify effective means of communicative influence on recipients, an important factor in the development of communication technologies and in strengthening the manipulative function of political speeches. Today the political text attracts close attention from scholars in various fields of knowledge, such as political science, economics, psychology and linguistics, since political communicative behavior is characterized by a set of language and speech means, including expressive language units. The expressiveness of a political text is an important semantic category that every experienced politician takes into account. Hence the growing interest in the communicative aspect of language and in the interpretation of expressive, word-forming and syntactic means in translated text. In translation studies, this direction of modern linguistics examines the mechanisms for reproducing the expressive potential of the source language by appropriate means in translation, which requires a comprehensive study of political texts at the semantic, expressive and pragmatic levels combined. There is a need both to clarify general theoretical knowledge and to examine practical views on reproducing the communicative and pragmatic aspects of a political text in the language of translation.
An accurate and complete translation of political texts of various genres, taking into account their linguistic and cultural peculiarities, requires of the translator not only a high level of language proficiency but also deep background knowledge, including information about the source-language country. It is noted that the transfer of means of expression into Ukrainian is carried out with the help of various stylistic, lexical and grammatical transformations.


Sensors ◽  
2019 ◽  
Vol 19 (11) ◽  
pp. 2553 ◽  
Author(s):  
Jingwen Cui ◽  
Jianping Zhang ◽  
Guiling Sun ◽  
Bowen Zheng

Based on computer vision technology, this paper proposes a method for identifying and locating crops so that they can be successfully grasped during automatic picking. The method innovatively combines the YOLOv3 algorithm under the DarkNet framework with point cloud image coordinate matching. First, RGB (red, green, blue) images and depth images are obtained using a Kinect v2 depth camera. Second, the YOLOv3 algorithm is used to identify the various types of target crops in the RGB images and to determine the feature points of the target crops. Finally, the 3D coordinates of the feature points are located on the point cloud images. Compared with other methods, this crop identification method has high accuracy and small positioning error, laying a good foundation for the subsequent harvesting of crops with mechanical arms.
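Locating a detected feature point in 3D from an RGB-D pair reduces to pinhole back-projection with the depth camera's intrinsics. A minimal sketch (the function name and any intrinsic values are placeholders, not the Kinect v2's actual calibration):

```python
def pixel_to_3d(u, v, z, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth z (metres) into camera
    coordinates using the pinhole model; fx, fy are focal lengths and
    (cx, cy) the principal point, all in pixels."""
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return (x, y, z)
```

A pixel at the principal point maps straight down the optical axis to (0, 0, z); in practice the depth image must also be registered to the RGB frame before this lookup.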


Sensors ◽  
2020 ◽  
Vol 20 (22) ◽  
pp. 6570
Author(s):  
Chang Sun ◽  
Yibo Ai ◽  
Sheng Wang ◽  
Weidong Zhang

Detecting and classifying real-life small traffic signs from large input images is difficult due to their occupying fewer pixels relative to larger targets. To address this challenge, we proposed a deep-learning-based model (Dense-RefineDet) that applies a single-shot, object-detection framework (RefineDet) to maintain a suitable accuracy–speed trade-off. We constructed a dense connection-related transfer-connection block to combine high-level feature layers with low-level feature layers to optimize the use of the higher layers to obtain additional contextual information. Additionally, we presented an anchor-design method to provide suitable anchors for detecting small traffic signs. Experiments using the Tsinghua-Tencent 100K dataset demonstrated that Dense-RefineDet achieved competitive accuracy at high-speed detection (0.13 s/frame) of small-, medium-, and large-scale traffic signs (recall: 84.3%, 95.2%, and 92.6%; precision: 83.9%, 95.6%, and 94.0%). Moreover, experiments using the Caltech pedestrian dataset indicated that the miss rate of Dense-RefineDet was 54.03% (pedestrian height > 20 pixels), which outperformed other state-of-the-art methods.


Sensors ◽  
2019 ◽  
Vol 19 (19) ◽  
pp. 4093 ◽  
Author(s):  
Jun Xu ◽  
Yanxin Ma ◽  
Songhua He ◽  
Jiahua Zhu

Three-dimensional (3D) object detection is an important research topic in 3D computer vision, with significant applications in fields such as automated driving, robotics, and human–computer interaction. However, low precision remains an urgent problem in 3D object detection. To address it, we present a framework for 3D object detection in point clouds. Specifically, a designed backbone network fuses low-level features with high-level features, making full use of the advantages of both kinds of information. Moreover, the two-dimensional (2D) Generalized Intersection over Union is extended to 3D as part of the loss function in our framework. Experiments on Car, Cyclist, and Pedestrian detection were conducted on the KITTI benchmark, and the results in terms of average precision (AP) show the effectiveness of the proposed network.
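For axis-aligned boxes, extending Generalized IoU from 2D to 3D simply replaces areas with volumes, with the penalty term taken from the smallest enclosing box. A minimal sketch for the axis-aligned case (the paper's formulation may additionally handle rotated boxes):

```python
def _vol(b):
    # volume of an axis-aligned box (xmin, ymin, zmin, xmax, ymax, zmax);
    # zero if the box is empty or inverted
    return (max(0.0, b[3] - b[0]) * max(0.0, b[4] - b[1])
            * max(0.0, b[5] - b[2]))

def giou_3d(a, b):
    # intersection box: element-wise max of mins, min of maxes
    inter = (max(a[0], b[0]), max(a[1], b[1]), max(a[2], b[2]),
             min(a[3], b[3]), min(a[4], b[4]), min(a[5], b[5]))
    vi = _vol(inter)
    union = _vol(a) + _vol(b) - vi
    # smallest box enclosing both a and b
    hull = (min(a[0], b[0]), min(a[1], b[1]), min(a[2], b[2]),
            max(a[3], b[3]), max(a[4], b[4]), max(a[5], b[5]))
    vh = _vol(hull)
    return vi / union - (vh - union) / vh
```

Unlike plain IoU, GIoU decreases toward −1 as disjoint boxes move apart, which keeps the loss gradient informative even when predictions have zero overlap with the ground truth.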


2019 ◽  
Vol 34 (5) ◽  
pp. 619-651
Author(s):  
Laura R. Johnson ◽  
Christopher F. Drescher ◽  
Sophia H. Assenga ◽  
Rachel J. Marsh

Street-connected adolescents in sub-Saharan Africa have been neglected in scholarly research. Extant literature is largely problem focused. This study describes strengths and assets among street-connected youth in Tanzania, using a participatory, mixed methods approach. Adolescents ( N = 38, 13-17 years) in a rehabilitation center for street youth in Northern Tanzania completed a Swahili version of the Developmental Assets Profile (DAP). They engaged in participatory activities designed to capture multiple perspectives and promote maximal engagement. A subsample of youth ( n = 8) took part in photovoice to elucidate contextual details. Although exploratory, we expected (a) participants would have lower scores on the external versus internal domain of the DAP; (b) qualitative methods would support the DAP and provide complementary, contextual information; and (c) participatory methods would be important for providing varied perspectives and engaging youth in the research process. Results revealed a moderately high level of assets, with strengths in constructive use of time and commitment to school. External assets were higher than internal assets; however, different assets were emphasized across different methods. Overall, results supported the DAP framework. The participatory approaches effectively engaged youth and illuminated the culture and context of their development.


Symmetry ◽  
2019 ◽  
Vol 12 (1) ◽  
pp. 28 ◽  
Author(s):  
Chao Wang

In order to improve the accuracy of intrinsic symmetry detection on semantic models, a skeleton-based method for detecting the high-level intrinsic self-symmetry of a semantic model is proposed. Semantic analysis of the model set is realized through uniform segmentation of models within the same style, component correspondence between models of different styles, and shape-content clustering. Based on the clustering results, for a given three-dimensional (3D) point cloud model, skeleton point pairs reflecting the symmetry between model surface points are obtained from the curve skeleton by an election method, and this symmetry is then extended to the model's surface vertices according to those skeleton point pairs. With the help of the skeleton, the symmetry of the point cloud model is obtained, and its symmetric regions are then extracted via the symmetric correspondence matrix and a spectral method, realizing intrinsic symmetry detection for the model. The experimental results show that the proposed method offers lower runtime, high accuracy, and high reliability.

