Indoor Scene Change Captioning Based on Multimodality Data

Sensors ◽  
2020 ◽  
Vol 20 (17) ◽  
pp. 4761 ◽  
Author(s):  
Yue Qiu ◽  
Yutaka Satoh ◽  
Ryota Suzuki ◽  
Kenji Iwata ◽  
Hirokatsu Kataoka

This study proposes a framework for describing a scene change in natural language based on indoor scene observations made before and after the change. The recognition of scene changes plays an essential role in a variety of real-world applications, such as scene anomaly detection. Most scene understanding research has focused on static scenes, and most existing scene change captioning methods detect changes from single-view RGB images, neglecting the underlying three-dimensional structure. Previous three-dimensional scene change captioning methods use simulated scenes consisting of geometric primitives, making them unsuitable for real-world applications. To solve these problems, we automatically generated large-scale indoor scene change caption datasets. We propose an end-to-end framework for describing scene changes from various input modalities, namely RGB images, depth images, and point cloud data, which are available in most robot applications. We conducted experiments with various input modalities and models and evaluated model performance on datasets of varying complexity. Experimental results show that models combining RGB images and point cloud data as input achieve high performance in sentence generation and caption correctness and are robust in change-type understanding on highly complex datasets. The developed datasets and models contribute to the study of indoor scene change understanding.

2021 ◽  
Vol 87 (7) ◽  
pp. 479-484
Author(s):  
Yu Hou ◽  
Ruifeng Zhai ◽  
Xueyan Li ◽  
Junfeng Song ◽  
Xuehan Ma ◽  
...  

Three-dimensional reconstruction from a single image has excellent future prospects. The use of neural networks for three-dimensional reconstruction has achieved remarkable results. Most current point-cloud-based three-dimensional reconstruction networks are trained on synthetic data sets and do not generalize well. Based on the Karlsruhe Institute of Technology and Toyota Technological Institute at Chicago (KITTI) data set of large-scale scenes, this article proposes a method for processing real data sets. The data set produced in this work can better train our network model and realize point cloud reconstruction from a single picture of the real world. Finally, the constructed point cloud data correspond well to the underlying three-dimensional shapes, and to a certain extent, the proposed method overcomes the uneven distribution of point cloud data obtained by light detection and ranging scanning.


2019 ◽  
Vol 8 (8) ◽  
pp. 343 ◽  
Author(s):  
Li ◽  
Hasegawa ◽  
Nii ◽  
Tanaka

Digital archiving of three-dimensional cultural heritage assets has increased the demand for visualization of large-scale point clouds of cultural heritage assets acquired by laser scanning. We proposed a fused transparent visualization method that visualizes a point cloud of a cultural heritage asset in its environment using a photographic image as the background. We also proposed lightness adjustment and color enhancement methods to deal with the reduced visibility caused by the fused visualization. We applied the proposed method to a laser-scanned point cloud of a highly valued cultural festival float with complex inner and outer structures. Experimental results demonstrate that the proposed method enables high-quality transparent visualization of the cultural asset in its surrounding environment.
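Fused visualization of this kind ultimately reduces to compositing a rendered point-cloud layer over the photographic background. A minimal alpha-compositing sketch, assuming float RGB images in [0, 1]; the `lightness_gain` parameter is an illustrative stand-in for the paper's lightness-adjustment step, not its actual method:

```python
import numpy as np

def fuse_over_background(point_layer, alpha, background, lightness_gain=1.0):
    """Composite a rendered point-cloud image over a photo background.

    point_layer, background: HxWx3 float arrays in [0, 1].
    alpha: HxW per-pixel coverage of the rendered points in [0, 1].
    lightness_gain: simple multiplicative lightness adjustment applied to
    the point layer before blending (illustrative placeholder).
    """
    fg = np.clip(point_layer * lightness_gain, 0.0, 1.0)
    a = alpha[..., None]                    # broadcast alpha over RGB channels
    return fg * a + background * (1.0 - a)  # standard "over" compositing
```

Where the point layer fully covers a pixel (alpha 1) the photo is hidden; where coverage is partial, the background shows through, which is what causes the visibility loss the paper's enhancement methods address.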


Sensors ◽  
2021 ◽  
Vol 21 (3) ◽  
pp. 884
Author(s):  
Chia-Ming Tsai ◽  
Yi-Horng Lai ◽  
Yung-Da Sun ◽  
Yu-Jen Chung ◽  
Jau-Woei Perng

Numerous sensors can obtain images or point cloud data on land; in water, however, the rapid attenuation of electromagnetic signals and the lack of light restrict sensing functions. This study expands the use of two- and three-dimensional detection technologies to underwater applications, namely the detection of abandoned tires. A three-dimensional acoustic sensor, the BV5000, is used to collect underwater point cloud data. Several pre-processing steps are proposed to remove noise and the seabed from the raw data. The point clouds are then processed into two data types: a 2D image and a 3D point cloud. Deep learning methods of the corresponding dimensions are used to train the models. In the two-dimensional method, the point cloud is transformed into a bird's-eye-view image, and the Faster R-CNN and YOLOv3 network architectures are used to detect tires. In the three-dimensional method, the point cloud associated with a tire is cut out from the raw data and used as training data, and the PointNet and PointConv network architectures are used for tire classification. The results show that both approaches provide good accuracy.
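The two-dimensional branch depends on converting the point cloud into a bird's-eye-view image before a detector sees it. A minimal sketch of one common max-height projection; the ranges, resolution, and encoding are illustrative assumptions, not the parameters used in the study:

```python
import numpy as np

def bev_image(points, x_range=(0.0, 20.0), y_range=(-10.0, 10.0), res=0.05):
    """Project Nx3 points (x, y, z) onto a bird's-eye-view grid.

    Each cell stores the maximum height observed in it, rescaled to 0-255
    so the result can feed an image-based detector. All ranges and the
    cell resolution are illustrative values.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    keep = (x >= x_range[0]) & (x < x_range[1]) \
         & (y >= y_range[0]) & (y < y_range[1])
    x, y, z = x[keep], y[keep], z[keep]

    cols = ((x - x_range[0]) / res).astype(int)
    rows = ((y - y_range[0]) / res).astype(int)

    h = int((y_range[1] - y_range[0]) / res)
    w = int((x_range[1] - x_range[0]) / res)
    img = np.zeros((h, w), dtype=np.float32)
    np.maximum.at(img, (rows, cols), z)  # keep the highest point per cell

    top = img.max() if img.max() > 0 else 1.0
    return (img / top * 255).astype(np.uint8)  # 8-bit grayscale BEV image
```

A grayscale height map is only one possible encoding; multi-channel variants (height, intensity, density) are also widely used for BEV detection.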


2013 ◽  
Vol 796 ◽  
pp. 513-518
Author(s):  
Rong Jin ◽  
Bing Fei Gu ◽  
Guo Lian Liu

In this paper, 110 female undergraduates at Soochow University are measured using a 3D non-contact measurement system and manual measurement. The 3D point cloud data of the human body are taken as the research object using reverse-engineering software, and secondary development of the point cloud data is performed after optimizing it. In accordance with the definitions of the chest width point and other feature points, and within the operability of the three-dimensional point cloud data, the width, thickness, and length dimensions of the curve through the chest width point are measured. Body types are classified using the ratio between the thickness and the width of the curve as the classification index. The generation rules of the chest curve are then determined for each type using linear regression. A human arm model can thus be established automatically by computer, so that the individual model of the female upper-body mannequin can be improved effectively.
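The classification index described above can be sketched as a simple threshold rule on the thickness-to-width ratio. The two cut-off values and the three type labels below are illustrative placeholders; the paper derives its actual categories from the 110 measured subjects:

```python
def classify_chest_type(thickness_cm, width_cm, thresholds=(0.60, 0.70)):
    """Classify body type by the thickness-to-width ratio of the chest curve.

    thresholds: two illustrative cut-offs splitting subjects into three
    hypothetical categories ("flat", "medium", "round"); the study's own
    cut-offs and labels would replace these.
    """
    ratio = thickness_cm / width_cm
    if ratio < thresholds[0]:
        return "flat"
    elif ratio < thresholds[1]:
        return "medium"
    return "round"
```

Per-type linear regression of the chest-curve generation rules would then be fitted separately within each resulting group.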


Author(s):  
Romina Dastoorian ◽  
Ahmad E. Elhabashy ◽  
Wenmeng Tian ◽  
Lee J. Wells ◽  
Jaime A. Camelio

With the latest advancements in three-dimensional (3D) measurement technologies, obtaining 3D point cloud data for inspection purposes in manufacturing is becoming more common. While 3D point cloud data allows for better inspection capabilities, their analysis is typically challenging. Especially with unstructured 3D point cloud data, containing coordinates at random locations, the challenges increase with higher levels of noise and larger volumes of data. Hence, the objective of this paper is to extend the previously developed Adaptive Generalized Likelihood Ratio (AGLR) approach to handle unstructured 3D point cloud data used for automated surface defect inspection in manufacturing. More specifically, the AGLR approach was implemented in a practical case study to inspect twenty-seven samples, each with a unique fault. These faults were designed to cover an array of possible faults having three different sizes, three different magnitudes, and located in three different locations. The results show that the AGLR approach can indeed differentiate between non-faulty and a varying range of faulty surfaces while being able to pinpoint the fault location. This work also serves as a validation for the previously developed AGLR approach in a practical scenario.


Author(s):  
Y. Hori ◽  
T. Ogawa

The implementation of laser scanning in the field of archaeology provides an entirely new dimension in research and surveying. It allows us to digitally recreate individual objects, or entire cities, using millions of three-dimensional points grouped together in what are referred to as "point clouds". The visualization of the point cloud data used in the final reports of archaeologists and architects is usually produced as a JPG or TIFF file. Beyond visualization, the re-examination of older data and new surveys of Roman construction using remote-sensing technology for precise, detailed measurements yield information that may lead to revised drawings of ancient buildings, which had previously been adduced as evidence without any consideration of their degree of accuracy, and can ultimately enable new research on these buildings. We used laser scanners in the field because of their speed, comprehensive coverage, accuracy, and flexibility of data manipulation. We therefore "skipped" much of the post-processing and focused on the images created from the metadata, simply aligned using a tool that extends an automatic feature-matching algorithm and a popular renderer that provides graphic results.


2016 ◽  
Vol 31 (9) ◽  
pp. 889-896
Author(s):  
马鑫 MA Xin ◽  
魏仲慧 WEI Zhong-hui ◽  
何昕 HE Xin ◽  
于国栋 YU Guo-dong
