Point Cloud Instance Segmentation of Indoor Scenes Using Learned Pairwise Patch Relations

IEEE Access ◽  
2021 ◽  
Vol 9 ◽  
pp. 15891-15901
Author(s):  
Lijie Yu ◽  
Yuliang Sun ◽  
Xudong Zhang ◽  
Yongwei Miao ◽  
Xiuli Zhang
Author(s):  
Zhiyong Gao ◽  
Jianhong Xiang

Background: When objects are detected directly from a 3D point cloud, the natural 3D patterns and invariances of the 3D data are often obscured. Objective: In this work, we study 3D object detection from discrete, unordered, and sparse 3D point clouds. Methods: The CNN is composed of the frustum sequence module, the 3D instance segmentation module S-NET, the 3D point cloud transformation module T-NET, and the 3D bounding box estimation module E-NET. The search space of the object is determined by the frustum sequence module. The instance segmentation of the point cloud is performed by the 3D instance segmentation module. The 3D coordinates of the object are confirmed by the transformation module and the 3D bounding box estimation module. Results: Evaluated on the KITTI benchmark dataset, our method outperforms the state of the art by remarkable margins while retaining real-time capability. Conclusion: We achieve real-time 3D object detection by proposing an improved convolutional neural network (CNN) based on image-driven point clouds.
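The abstract does not say how the frustum sequence module restricts the search space. As a rough illustration only, the sketch below assumes the common frustum idea: a 2D image box plus pinhole intrinsics selects the 3D points whose projection falls inside the box. The function name `points_in_frustum`, the intrinsics `fx, fy, cx, cy`, and the sample values are all hypothetical and are not taken from the paper.

```python
def points_in_frustum(points, box, fx, fy, cx, cy):
    """Keep points (camera coordinates, z forward) whose pinhole
    projection lies inside the 2D box (u_min, v_min, u_max, v_max)."""
    u_min, v_min, u_max, v_max = box
    kept = []
    for x, y, z in points:
        if z <= 0:
            # Points behind the camera cannot project into the image.
            continue
        u = fx * x / z + cx
        v = fy * y / z + cy
        if u_min <= u <= u_max and v_min <= v <= v_max:
            kept.append((x, y, z))
    return kept


# Hypothetical example: a 20x20 pixel box around the image center.
sample_points = [
    (0.0, 0.0, 5.0),   # projects to the image center, kept
    (1.0, 0.0, 5.0),   # projects to the right of the box, dropped
    (0.2, 0.2, 4.0),   # projects inside the box, kept
    (0.0, 0.0, -1.0),  # behind the camera, dropped
]
frustum_points = points_in_frustum(
    sample_points, box=(40, 40, 60, 60), fx=100.0, fy=100.0, cx=50.0, cy=50.0
)
```

Only the points that survive this crop would then be passed on to segmentation and box estimation, which is what makes the search space small.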


Author(s):  
H. Huang ◽  
H. Jiang ◽  
C. Brenner ◽  
H. Mayer

We propose a novel method to segment Microsoft™ Kinect data of indoor scenes with an emphasis on freeform objects. We use the full 3D information for the scene parsing and the segmentation of potential objects instead of treating the depth values as an additional channel of the 2D image. The raw RGBD image is first converted to a 3D point cloud with color. We then group the points into patches, which are derived from a 2D superpixel segmentation. With the assumption that every patch in the point cloud represents (a part of) the surface of an underlying solid body, a hypothetical quasi-3D model, the "synthetic volume primitive" (SVP), is constructed by extending the patch with a synthetic extrusion in 3D. The SVPs vote for a common object via intersection. By this means, a freeform object can be "assembled" from an unknown number of SVPs from arbitrary angles. Besides the intersection, two other criteria, i.e., coplanarity and color coherence, are integrated in the global optimization to improve the segmentation. Experiments demonstrate the potential of the proposed method.
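The conversion from an RGBD image to a colored point cloud is not detailed in the abstract. A minimal sketch, assuming a standard pinhole camera model with hypothetical intrinsics `fx, fy, cx, cy` and a depth image given as nested lists, could look like this:

```python
def depth_to_point_cloud(depth, rgb, fx, fy, cx, cy):
    """Back-project a depth image into a list of (x, y, z, color) tuples.

    depth[v][u] is the depth along the optical axis at pixel (u, v);
    non-positive values are treated as missing measurements.
    """
    cloud = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # no valid depth at this pixel
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            cloud.append((x, y, z, rgb[v][u]))
    return cloud


# Hypothetical 2x2 image with one missing depth value.
depth = [[1.0, 0.0],
         [2.0, 2.0]]
rgb = [[(255, 0, 0), (0, 255, 0)],
       [(0, 0, 255), (255, 255, 255)]]
cloud = depth_to_point_cloud(depth, rgb, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

Each resulting point keeps its color, so a later grouping step can still use a 2D superpixel label computed at the same pixel.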


2016 ◽  
Vol 174 ◽  
pp. 1101-1106 ◽  
Author(s):  
Wei Liu ◽  
Shaozi Li ◽  
Donglin Cao ◽  
Songzhi Su ◽  
Rongrong Ji

Author(s):  
Bernardo Lourenço ◽  
Tiago Madeira ◽  
Paulo Dias ◽  
Vitor M. Ferreira Santos ◽  
Miguel Oliveira

Purpose: 2D laser rangefinders (LRFs) are commonly used sensors in the field of robotics, as they provide accurate range measurements with high angular resolution. These sensors can be coupled with mechanical units which, by granting an additional degree of freedom to the movement of the LRF, enable the 3D perception of a scene. To be successful, this reconstruction procedure requires the extrinsic transformation between the LRF and the motorized system to be evaluated with high accuracy. Design/methodology/approach: In this work, a calibration procedure is proposed to evaluate this transformation. The method does not require a predefined marker (commonly used despite its numerous disadvantages), as it uses planar features in the acquired point clouds. Findings: Qualitative inspections show that the proposed method significantly reduces the artifacts that typically appear in point clouds because of inaccurate calibrations. Furthermore, quantitative results and comparisons with a high-resolution 3D scanner demonstrate that the calibrated point cloud represents the geometries present in the scene with much higher accuracy than the uncalibrated point cloud. Practical implications: The last key point of this work is the comparison of two laser scanners: the lemonbot (authors’) and a commercial FARO scanner. Despite being almost ten times cheaper, the authors’ scanner was able to achieve similar results in terms of geometric accuracy. Originality/value: This work describes a novel calibration technique that is easy to implement and is able to achieve accurate results. One of its key features is the use of planes to calibrate the extrinsic transformation.


2021 ◽  
Vol 13 (16) ◽  
pp. 3140
Author(s):  
Liman Liu ◽  
Jinjin Yu ◽  
Longyu Tan ◽  
Wanjuan Su ◽  
Lin Zhao ◽  
...  

To address the generally poor performance of existing semantic segmentation networks for 3D point clouds on small objects, a Spatial Eight-Quadrant Kernel Convolution (SEQKC) algorithm is proposed to enhance the ability of the network to extract fine-grained features from 3D point clouds. As a result, the semantic segmentation accuracy of small objects in indoor scenes can be improved. Specifically, in the spherical space of the point cloud neighborhoods, a kernel point with attached weights is constructed in each octant, the distances between the kernel point and the points in its neighborhood are calculated, and the distances and the kernel points’ weights are used together to weight the point cloud features in the neighborhood space. In this way, the relationships between points are modeled, so that the local fine-grained features of the point clouds can be extracted by the SEQKC. Based on the SEQKC, we design a downsampling module for point clouds and embed it into classical semantic segmentation networks (PointNet++, PointSIFT and PointConv) for semantic segmentation. Experimental results on the benchmark dataset ScanNet V2 show that SEQKC-based PointNet++, PointSIFT and PointConv outperform the original networks by about 1.35–2.12% in terms of MIoU, and they effectively improve the semantic segmentation performance of the networks for small objects in indoor scenes, e.g., the segmentation accuracy of the small object “picture” is improved from 0.70% with PointNet++ to 10.37% with SEQKC-PointNet++.
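The octant-based weighting described above can be sketched roughly as follows. This is not the SEQKC implementation: the kernel point placement (half the radius along each octant diagonal), the linear distance falloff, the restriction of each kernel point to neighbors in its own octant, and the use of scalar features are all assumptions made for illustration, and every function name is hypothetical.

```python
import math


def octant_index(dx, dy, dz):
    """Map an offset from the center point to an octant id in 0..7."""
    return int(dx >= 0) + 2 * int(dy >= 0) + 4 * int(dz >= 0)


def kernel_points(radius):
    """One kernel point per octant (assumed placement: on the octant
    diagonal, at half the neighborhood radius from the center)."""
    s = radius / (2.0 * math.sqrt(3.0))
    points = []
    for i in range(8):
        sx = 1.0 if i & 1 else -1.0
        sy = 1.0 if i & 2 else -1.0
        sz = 1.0 if i & 4 else -1.0
        points.append((sx * s, sy * s, sz * s))
    return points


def octant_weighted_sum(center, neighbors, features, kernel_weights, radius):
    """Weight each scalar neighbor feature by its octant's kernel weight
    and by an assumed linear falloff in distance to that kernel point."""
    kps = kernel_points(radius)
    total = 0.0
    for point, feature in zip(neighbors, features):
        offset = tuple(p - c for p, c in zip(point, center))
        k = octant_index(*offset)
        d = math.dist(offset, kps[k])
        influence = max(0.0, 1.0 - d / radius)  # assumption, not from the paper
        total += kernel_weights[k] * influence * feature
    return total


# Hypothetical check: a neighbor sitting exactly on the kernel point of
# octant 7 contributes its full feature times that kernel's weight.
kp7 = kernel_points(1.0)[7]
result = octant_weighted_sum(
    center=(0.0, 0.0, 0.0),
    neighbors=[kp7],
    features=[2.0],
    kernel_weights=[0.0] * 7 + [0.5],
    radius=1.0,
)
```

In a real network the kernel weights would be learned and the features would be vectors; the sketch only shows how octant membership and distance can jointly modulate a neighbor's contribution.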


2018 ◽  
Vol 8 (1) ◽  
pp. 9 ◽  
Author(s):  
Wenzhong Shi ◽  
Wael Ahmed ◽  
Na Li ◽  
Wenzheng Fan ◽  
Haodong Xiang ◽  
...  

This paper presents a method capable of automatically reconstructing 3D building models with semantic information from the unstructured 3D point cloud of indoor scenes. The method has three main steps: 3D segmentation using a new hybrid algorithm, room layout reconstruction, and wall-surface object reconstruction using an enriched approach. Unlike existing methods, this method aims to detect, cluster, and model complex structures without prior scanner or trajectory information. It also enables the accurate detection of wall-surface “defacements”, such as windows, doors, and virtual openings. Beyond wall-surface apertures, the detection of closed objects, such as doors, is also possible. Hence, the whole 3D modelling process of the indoor scene from a backpack laser scanner (BLS) dataset was achieved and recorded for the first time. This novel method was validated using both synthetic data and real data acquired by a developed BLS system for indoor scenes. On synthetic datasets, our approach achieved a precision of around 94% and a recall of around 97%, while on BLS datasets it achieved a precision of around 95% and a recall of around 89%. The results reveal this novel method to be robust and accurate for 3D indoor modelling.


Author(s):  
Y. A. Lumban-Gaol ◽  
Z. Chen ◽  
M. Smit ◽  
X. Li ◽  
M. A. Erbaşu ◽  
...  

Abstract. Point cloud data have rich semantic representations and can benefit various applications towards a digital twin. However, they are unordered and anisotropically distributed, and are thus unsuitable for a typical Convolutional Neural Network (CNN) to handle. With the advance of deep learning, several neural networks claim to have solved the point cloud semantic segmentation problem. This paper evaluates three different neural networks for semantic segmentation of point clouds, namely PointNet++, PointCNN and DGCNN. A public indoor scene of the Amersfoort railway station is used as the study area. Unlike the typical indoor scenes in currently available datasets, and even more unlike the ubiquitous outdoor ones, the station consists of objects such as entrance gates, ticket machines, couches, and garbage cans. For the experiment, we use subsets of the data, remove the noise, and evaluate the performance of the selected neural networks. The results indicate an overall accuracy of more than 90% for all the networks, but the networks vary in terms of mean class accuracy and mean Intersection over Union (IoU). Misclassification mainly occurs in the couch and garbage can classes. Several factors that may contribute to the errors are analyzed, such as the quality of the data and the proportion of points per class. The adaptability of the networks is also heavily dependent on the training location: the overall characteristics of the train station make a network trained for one location less suitable for another.
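The three metrics named in this abstract can diverge sharply when classes are imbalanced. The sketch below computes them from a confusion matrix using their standard definitions; the function name and the two-class example counts are hypothetical and are not results from the paper.

```python
def segmentation_metrics(conf):
    """Return (overall accuracy, mean class accuracy, mean IoU).

    conf[i][j] is the number of points whose true class is i and whose
    predicted class is j. Classes with no points are skipped in the means.
    """
    n = len(conf)
    total = sum(sum(row) for row in conf)
    correct = sum(conf[i][i] for i in range(n))
    class_accs, ious = [], []
    for i in range(n):
        tp = conf[i][i]
        gt = sum(conf[i])                          # points truly in class i
        pred = sum(conf[r][i] for r in range(n))   # points predicted as class i
        union = gt + pred - tp
        if gt > 0:
            class_accs.append(tp / gt)
        if union > 0:
            ious.append(tp / union)
    return (correct / total,
            sum(class_accs) / len(class_accs),
            sum(ious) / len(ious))


# Hypothetical counts: a dominant class (90 points) and a rare one (10 points).
conf = [[90, 0],
        [8, 2]]
overall_acc, mean_class_acc, mean_iou = segmentation_metrics(conf)
```

With these made-up counts the overall accuracy exceeds 90% while the mean class accuracy is only 0.6, because the rare class is mostly misclassified. This is the kind of gap that per-class metrics expose and overall accuracy hides.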

