Real-Time Height Measurement for Moving Pedestrians

Single Depth View Based Real-Time Reconstruction of Hand-Object Interactions

ACM Transactions on Graphics ◽

10.1145/3451341 ◽

2021 ◽

Vol 40 (3) ◽

pp. 1-12

Author(s):

Hao Zhang ◽

Yuxiao Zhou ◽

Yifei Tian ◽

Jun-Hai Yong ◽

Feng Xu

Keyword(s):

Real Time ◽

Synthetic Data ◽

Real Data ◽

Depth Image ◽

Real Time System ◽

The Real ◽

Time Performance ◽

Contact Constraint ◽

Object Shapes ◽

Object Interactions

Reconstructing hand-object interactions is a challenging task due to strong occlusions and complex motions. This article proposes a real-time system that uses a single depth stream to simultaneously reconstruct hand poses, object shape, and rigid/non-rigid motions. To achieve this, we first train a joint learning network to segment the hand and object in a depth image and to predict the 3D keypoints of the hand. Because the two tasks share most layers, computation is saved, which helps maintain real-time performance. A hybrid dataset is constructed to train the network with real data (to learn real-world distributions) and synthetic data (to cover variations of objects, motions, and viewpoints). Next, the depth of the two targets and the keypoints are used in a uniform optimization to reconstruct the interacting motions. Benefiting from a novel tangential contact constraint, the system not only resolves the remaining ambiguities but also maintains real-time performance. Experiments show that our system handles different hand and object shapes, various interactive motions, and moving cameras.


RobotP: A Benchmark Dataset for 6D Object Pose Estimation

Sensors ◽

10.3390/s21041299 ◽

2021 ◽

Vol 21 (4) ◽

pp. 1299

Author(s):

Honglin Yuan ◽

Tim Hoogenkamp ◽

Remco C. Veltkamp

Keyword(s):

Pose Estimation ◽

Ground Truth ◽

3D Models ◽

Depth Image ◽

Great Success ◽

Estimation Algorithms ◽

Depth Images ◽

Object Pose Estimation ◽

Image Pairs ◽

Bounding Boxes

Deep learning has achieved great success on robotic vision tasks. However, compared with other vision-based tasks, it is difficult to collect a representative and sufficiently large training set for six-dimensional (6D) object pose estimation, due to the inherent difficulty of data collection. In this paper, we propose the RobotP dataset, which consists of commonly used objects, for benchmarking 6D object pose estimation. To create the dataset, we apply a 3D reconstruction pipeline to produce high-quality depth images, ground truth poses, and 3D models for well-selected objects. Subsequently, based on the generated data, we produce object segmentation masks and two-dimensional (2D) bounding boxes automatically. To further enrich the data, we synthesize a large number of photo-realistic color-and-depth image pairs with ground truth 6D poses. Our dataset is freely distributed to research groups through the Shape Retrieval Challenge benchmark on 6D pose estimation. Based on our benchmark, different learning-based approaches are trained and tested on the unified dataset. The evaluation results indicate that there is considerable room for improvement in 6D object pose estimation, particularly for objects with dark colors, and that photo-realistic images help increase the performance of pose estimation algorithms.
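The abstract says that 2D bounding boxes are produced automatically from the generated data but does not say how. One common way to do this, shown here as a minimal sketch and not as the RobotP pipeline itself, is to take the extent of the non-zero pixels of each segmentation mask. The function name `mask_to_bbox` and the `(u_min, v_min, u_max, v_max)` convention are assumptions made for illustration.

```python
import numpy as np


def mask_to_bbox(mask):
    """Return the tight box (u_min, v_min, u_max, v_max) around a binary mask.

    Hypothetical helper: pixel coordinates are inclusive, u is the column
    index and v is the row index. Returns None for an empty mask.
    """
    mask = np.asarray(mask, dtype=bool)
    rows = np.flatnonzero(mask.any(axis=1))  # rows containing object pixels
    cols = np.flatnonzero(mask.any(axis=0))  # columns containing object pixels
    if rows.size == 0:
        return None
    return int(cols[0]), int(rows[0]), int(cols[-1]), int(rows[-1])


# Usage: a 2x4 object placed inside a 6x8 image.
example_mask = np.zeros((6, 8), dtype=bool)
example_mask[2:4, 3:7] = True
example_box = mask_to_bbox(example_mask)
```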


Three-Filters-to-Normal: An Accurate and Ultrafast Surface Normal Estimator

10.36227/techrxiv.12362249.v2 ◽

2020 ◽

Author(s):

Rui Fan ◽

Hengli Wang ◽

Bohuan Xue ◽

Huaiyang Huang ◽

Yuan Wang ◽

...

Keyword(s):

Large Scale ◽

Median Filter ◽

Ground Truth ◽

Depth Image ◽

Range Data ◽

3 Dimensional ◽

Depth Images ◽

Mesh Model ◽

Surface Normal ◽

Synthetic Datasets

Over the past decade, significant efforts have been made to improve the trade-off between speed and accuracy of surface normal estimators (SNEs). This paper introduces an accurate and ultrafast SNE for structured range data. The proposed approach computes surface normals by performing just three filtering operations on an inverse depth image or a disparity image: two image gradient filters (in the horizontal and vertical directions, respectively) and a mean/median filter. Despite the simplicity of the method, no similar method exists in the literature. In our experiments, we created three large-scale synthetic datasets (easy, medium and hard) using 24 3-dimensional (3D) mesh models. Each mesh model is used to generate 1800 to 2500 pairs of 480×640-pixel depth images and the corresponding surface normal ground truth from different views. The average angular errors on the easy, medium and hard datasets are 1.6 degrees, 5.6 degrees and 15.3 degrees, respectively. Our C++ and CUDA implementations achieve processing speeds of over 260 Hz and 21 kHz, respectively. Our proposed SNE achieves better overall performance than all other existing computer vision-based SNEs. Our datasets and source code are publicly available at: sites.google.com/view/3f2n.
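The three filtering operations can be sketched in a few lines of NumPy. This is an illustrative reading of the abstract combined with standard pinhole-camera geometry, not the authors' released C++/CUDA code. The intrinsics `fx`, `fy`, `uo`, `vo`, the central-difference gradient, and the 4-neighbour median are all assumptions. For a locally planar surface, the gradients of the inverse depth give the x and y components of the normal up to a common scale, and the third filter picks the z component that makes the normal perpendicular to the vectors towards neighbouring 3D points.

```python
import numpy as np


def _shift(a, dv, du):
    """Return b with b[v, u] = a[v + dv, u + du]; out-of-image entries are NaN."""
    h, w = a.shape
    out = np.full((h, w), np.nan)
    out[max(-dv, 0):h + min(-dv, 0), max(-du, 0):w + min(-du, 0)] = \
        a[max(dv, 0):h + min(dv, 0), max(du, 0):w + min(du, 0)]
    return out


def three_filters_to_normal(depth, fx, fy, uo, vo):
    """Estimate unit normals (H x W x 3) from a depth image; sign is not oriented."""
    depth = np.asarray(depth, dtype=np.float64)
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w].astype(np.float64)
    x = depth * (u - uo) / fx  # back-projected X (pinhole model, assumed)
    y = depth * (v - vo) / fy  # back-projected Y

    # Filters 1 and 2: vertical and horizontal gradients of the inverse depth.
    gv, gu = np.gradient(1.0 / depth)
    nx, ny = fx * gu, fy * gv

    # Filter 3: each neighbour proposes the n_z that makes the normal
    # perpendicular to the 3D offset towards it; take the median proposal.
    candidates = []
    for dv, du in ((0, 1), (0, -1), (1, 0), (-1, 0)):
        dx = _shift(x, dv, du) - x
        dy = _shift(y, dv, du) - y
        dz = _shift(depth, dv, du) - depth
        with np.errstate(divide="ignore", invalid="ignore"):
            cand = -(nx * dx + ny * dy) / dz
        cand[~np.isfinite(cand)] = np.nan  # drop borders and zero depth steps
        candidates.append(cand)
    nz = np.nanmedian(np.stack(candidates), axis=0)

    n = np.stack([nx, ny, nz], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)
```

In this sketch, a pixel whose four neighbours all lie at exactly the same depth (a fronto-parallel patch) has no valid candidate and yields NaN; a practical implementation would need to treat that case separately.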


Three-Filters-to-Normal: An Accurate and Ultrafast Surface Normal Estimator

10.36227/techrxiv.12362249.v1 ◽

2020 ◽

Author(s):

Rui Fan ◽

Hengli Wang ◽

Bohuan Xue ◽

Huaiyang Huang ◽

Yuan Wang ◽

...

Keyword(s):

Large Scale ◽

Median Filter ◽

Ground Truth ◽

Depth Image ◽

Range Data ◽

3 Dimensional ◽

Depth Images ◽

Mesh Model ◽

Surface Normal ◽

Synthetic Datasets

Over the past decade, significant efforts have been made to improve the trade-off between speed and accuracy of surface normal estimators (SNEs). This paper introduces an accurate and ultrafast SNE for structured range data. The proposed approach computes surface normals by performing just three filtering operations on an inverse depth image or a disparity image: two image gradient filters (in the horizontal and vertical directions, respectively) and a mean/median filter. Despite the simplicity of the method, no similar method exists in the literature. In our experiments, we created three large-scale synthetic datasets (easy, medium and hard) using 24 3-dimensional (3D) mesh models. Each mesh model is used to generate 1800 to 2500 pairs of 480×640-pixel depth images and the corresponding surface normal ground truth from different views. The average angular errors on the easy, medium and hard datasets are 1.6 degrees, 5.6 degrees and 15.3 degrees, respectively. Our C++ and CUDA implementations achieve processing speeds of over 260 Hz and 21 kHz, respectively. Our proposed SNE achieves better overall performance than all other existing computer vision-based SNEs. Our datasets and source code are publicly available at: sites.google.com/view/3f2n.


Real-Time Assembly Recognition Based on the Similarity Between Spatial Increment and Part Model

Volume 2B: Advanced Manufacturing ◽

10.1115/imece2019-11160 ◽

2019 ◽

Author(s):

Jiazhen Pang ◽

Yuan Li ◽

Jie Zhang ◽

Jianfeng Yu

Keyword(s):

Real Time ◽

Weak Link ◽

Intelligent Manufacturing ◽

Digital Model ◽

Recognition Method ◽

Camera System ◽

The Real ◽

Depth Images ◽

Engineering Significance ◽

Part Model

Manual work is a weak link in intelligent manufacturing, yet it plays an important role in highly customized, multi-variety assembly. Assisted by intelligent assembly technology such as augmented reality, a manual worker can be integrated into the cyber-physical system to improve efficiency and reduce errors, which is of great engineering significance for assembly in Industry 4.0. Assembly recognition is the initial part of progress analysis, and assembly passes through predictable stages that can be matched with the digital model to constrain recognition. Therefore, this paper proposes a real-time assembly recognition method based on the similarity between spatial increment information and the part model. First, depth images from a multi-camera system are used to capture the assembly scene. Then, by comparison with the previous assembly scene, the spatial increment information is used to quantitatively represent the assembled part. The spatial increment information and the digital model are both described by distance distributions. Finally, using the Earth mover's distance algorithm, the spatial increment information is matched against the part models to identify which part has been assembled, realizing real-time assembly recognition. In the case study, the assembly process of a 3D-printed assembly corresponding to the digital model was used to verify the feasibility of the method.
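The matching step can be illustrated with a small sketch. The abstract does not define the distance distribution precisely, so the code below assumes it is a normalized histogram of pairwise point distances, and it uses the closed-form Earth mover's distance for one-dimensional histograms with shared bins (the sum of absolute differences between the cumulative distributions, times the bin width). All function names, the bin count, and `max_dist` are hypothetical.

```python
import numpy as np


def distance_distribution(points, bins, max_dist):
    """Normalized histogram of pairwise distances (an assumed descriptor)."""
    diff = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diff, axis=-1)
    upper = np.triu_indices(len(points), k=1)  # count each pair once
    # Distances beyond max_dist fall outside the range and are ignored.
    hist, _ = np.histogram(dists[upper], bins=bins, range=(0.0, max_dist))
    return hist / hist.sum()


def emd_1d(p, q, bin_width):
    """Earth mover's distance between two histograms on identical bins."""
    return float(np.abs(np.cumsum(p - q)).sum() * bin_width)


def recognize_part(increment_points, part_models, bins=32, max_dist=1.0):
    """Return the name of the part model closest to the spatial increment."""
    width = max_dist / bins
    query = distance_distribution(increment_points, bins, max_dist)
    scores = {
        name: emd_1d(query, distance_distribution(pts, bins, max_dist), width)
        for name, pts in part_models.items()
    }
    return min(scores, key=scores.get), scores


# Usage with synthetic point sets: a compact block and an elongated bar.
rng = np.random.default_rng(0)
models = {
    "block": rng.uniform(0.0, 0.2, (200, 3)),
    "bar": rng.uniform(0.0, 1.0, (200, 3)) * np.array([0.8, 0.05, 0.05]),
}
increment = rng.uniform(0.0, 0.2, (150, 3))  # newly added block-like points
best_name, all_scores = recognize_part(increment, models)
```

A pairwise-distance histogram is invariant to rotation and translation, which is one reason such a descriptor is convenient when the pose of the newly assembled part is unknown.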


Image Noise Preprocessing of Interactive Projection System Based on Switching Filtering Scheme

Complexity ◽

10.1155/2018/1258306 ◽

2018 ◽

Vol 2018 ◽

pp. 1-10 ◽

Cited By ~ 4

Author(s):

Lei Yu

Keyword(s):

Real Time ◽

Image Filtering ◽

Depth Image ◽

Projection System ◽

The Real ◽

Time Performance ◽

Arbitrary Switching ◽

System Method ◽

Good Target ◽

Switching Filter

Interactive projection systems based on depth images are usually disturbed by mixed noise. Generally, several filtering methods are combined to address this problem. Although such a hybrid filter can preserve image accuracy, the algorithm is complex and time-consuming, which degrades the real-time performance of the interactive projection system. In this paper, the switching system method is introduced into filtering for the first time, and an arbitrary switching filter algorithm is proposed and applied to a depth image filtering system based on the Kinect sensor. The experimental results demonstrate that the proposed switching filter algorithm not only removes the noise effectively but also preserves real-time performance and achieves good target tracking performance, which makes it applicable to a variety of image filtering systems.


A Mobile Augmented Reality System for the Real-Time Visualization of Pipes in Point Cloud Data with a Depth Sensor

Electronics ◽

10.3390/electronics9050836 ◽

2020 ◽

Vol 9 (5) ◽

pp. 836 ◽

Cited By ~ 1

Author(s):

Young-Hoon Jin ◽

In-Tae Hwang ◽

Won-Hyung Lee

Keyword(s):

Augmented Reality ◽

Real Time ◽

Point Cloud ◽

Depth Image ◽

Visualization Method ◽

Point Cloud Data ◽

Matching Method ◽

The Real ◽

Cloud Data ◽

Real Time Visualization

Augmented reality (AR) is a useful visualization technology that displays information by adding virtual images to the real world. In AR systems that require three-dimensional information, point cloud data is easy to use after real-time acquisition; however, it is difficult to measure and visualize objects in real time because of the large amount of data and the matching process involved. In this paper, we explore a method of estimating pipes from point cloud data and visualizing them in real time through augmented reality devices. In general, pipe estimation in a point cloud uses a Hough transform and is preceded by preprocessing steps such as noise filtering, normal estimation, or segmentation. However, this approach is slow because of the large amount of computation. Therefore, real-time visualization on augmented reality devices requires a fast cylinder matching method using random sample consensus (RANSAC). We propose parallel processing, multiple frames, adjustable scale, and error correction for real-time visualization. The real-time visualization method obtains a depth image from the sensor and builds a uniform point cloud using a voxel grid algorithm. The resulting data is then analyzed with the fast cylinder matching method using RANSAC. With the spread of various AR devices, the method is expected to be used at plant sites to identify problems, such as the sagging of pipes, through real-time measurements.
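The voxel grid step mentioned in the abstract can be sketched as follows. This is a generic centroid-per-voxel downsampling in NumPy, offered as an assumption about what "a uniform point cloud using a voxel grid algorithm" means. The function name and `voxel_size` parameter are hypothetical, and the RANSAC cylinder matching stage is not shown.

```python
import numpy as np


def voxel_grid_downsample(points, voxel_size):
    """Replace all points that fall in the same cubic voxel by their centroid."""
    points = np.asarray(points, dtype=np.float64)
    keys = np.floor(points / voxel_size).astype(np.int64)  # integer voxel index
    _, inverse, counts = np.unique(
        keys, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.reshape(-1)  # guard against shape differences across NumPy versions
    sums = np.zeros((counts.size, 3))
    np.add.at(sums, inverse, points)  # accumulate the points of each voxel
    return sums / counts[:, None]


# Usage: two points share a 1 m voxel, a third lies in the neighbouring voxel.
cloud = np.array([[0.1, 0.1, 0.1], [0.2, 0.2, 0.2], [1.5, 0.1, 0.1]])
reduced = voxel_grid_downsample(cloud, voxel_size=1.0)
```

Downsampling of this kind bounds the number of points per unit volume, which keeps the cost of the later model-fitting stage predictable regardless of how dense the raw depth data is.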


Real-Time Multiobject Tracking Based on Multiway Concurrency

Sensors ◽

10.3390/s21030685 ◽

2021 ◽

Vol 21 (3) ◽

pp. 685

Author(s):

Xuan Gong ◽

Zichun Le ◽

Yukun Wu ◽

Hui Wang

Keyword(s):

Real Time ◽

System Level ◽

Depth Image ◽

Tracking Accuracy ◽

Video Streams ◽

The Real ◽

Practical Applications ◽

Time Performance ◽

Tracking By Detection ◽

Multiobject Tracking

This paper explores a pragmatic approach to studying the real-time performance of a multiway concurrent multiobject tracking (MOT) system. At present, most research focuses on tracking in single image sequences, but in practical applications MOT systems need to process multiway video streams in parallel. There have been few studies on the real-time performance of multiway concurrent MOT systems. In this paper, we propose a new MOT framework for the multiway concurrency scenario based on a tracking-by-detection (TBD) model. The new framework focuses mainly on concurrency and real-time operation under limited computing and storage resources, while also considering algorithm performance. For the former, three aspects were studied: (1) Expanding the width and depth of the tracking-by-detection model. In terms of width, the MOT system can process multiway video sequences at the same time; in terms of depth, image collectors and bounding box collectors were introduced to support batch processing. (2) Considering real-time performance and multiway concurrency, we propose a real-time MOT algorithm based on directly driven detection. (3) System-level optimization: we also use the inference optimization features of NVIDIA TensorRT to accelerate the deep neural network (DNN) in the tracking algorithm. To offset the cost to algorithm performance, a negative-sample (false-detection) filter was designed to maintain tracking accuracy. Meanwhile, the factors that affect the system's real-time performance and concurrency were studied. The experimental results show that our method performs well when processing multiple concurrent real-time video streams.
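The abstract does not describe how the image collectors are built. As a rough illustration of the idea only, the sketch below shows a hypothetical `ImageCollector` that gathers frames arriving from several video streams and releases them in fixed-size batches, so that a detector can run once per batch instead of once per frame. The class name, its methods, and the batching policy are all assumptions.

```python
from collections import deque


class ImageCollector:
    """Hypothetical collector that groups frames from many streams into batches."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self._pending = deque()  # (stream_id, frame) pairs in arrival order

    def put(self, stream_id, frame):
        """Register one frame from the given stream."""
        self._pending.append((stream_id, frame))

    def next_batch(self):
        """Return a full batch, or None if not enough frames have arrived."""
        if len(self._pending) < self.batch_size:
            return None
        return [self._pending.popleft() for _ in range(self.batch_size)]


# Usage: three frames from two streams, batches of two.
collector = ImageCollector(batch_size=2)
collector.put(0, "stream0-frame0")
collector.put(1, "stream1-frame0")
collector.put(0, "stream0-frame1")
first_batch = collector.next_batch()
second_batch = collector.next_batch()
```

Keeping the stream identifier next to each frame lets the detections from a batch be routed back to the tracker of the stream they came from.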


Three-Filters-to-Normal: An Accurate and Ultrafast Surface Normal Estimator

10.36227/techrxiv.12362249.v3 ◽

2021 ◽

Author(s):

Rui Fan ◽

Hengli Wang ◽

Bohuan Xue ◽

Huaiyang Huang ◽

Yuan Wang ◽

...

Keyword(s):

Large Scale ◽

Median Filter ◽

Ground Truth ◽

Sensor Data ◽

Depth Image ◽

Depth Images ◽

Surface Normal ◽

Synthetic Datasets ◽

Normal Maps ◽

Mesh Models

This paper proposes three-filters-to-normal (3F2N), an accurate and ultrafast surface normal estimator (SNE) designed for structured range sensor data, e.g., depth/disparity images. 3F2N SNE computes surface normals by performing just three filtering operations (two image gradient filters, in the horizontal and vertical directions, respectively, and a mean/median filter) on an inverse depth image or a disparity image. Despite the simplicity of 3F2N SNE, no similar method exists in the literature. To evaluate the performance of our proposed SNE, we created three large-scale synthetic datasets (easy, medium and hard) using 24 3D mesh models, each of which is used to generate 1800 to 2500 pairs of depth images (resolution: 480×640 pixels) and the corresponding ground-truth surface normal maps from different views. 3F2N SNE demonstrates state-of-the-art performance, outperforming all other existing geometry-based SNEs, with average angular errors on the easy, medium and hard datasets of 1.66 degrees, 5.69 degrees and 15.31 degrees, respectively. Furthermore, our C++ and CUDA implementations achieve processing speeds of over 260 Hz and 21 kHz, respectively. Our datasets and source code are publicly available at sites.google.com/view/3f2n.


Texture Synthesis Repair of RealSense D435i Depth Images with Object-Oriented RGB Image Segmentation

Sensors ◽

10.3390/s20236725 ◽

2020 ◽

Vol 20 (23) ◽

pp. 6725

Author(s):

Longyu Zhang ◽

Hao Xia ◽

Yanyou Qiao

Keyword(s):

Texture Synthesis ◽

Low Cost ◽

Similarity Index ◽

Object Oriented ◽

Ground Truth ◽

Depth Image ◽

Depth Camera ◽

Distance Information ◽

Depth Images ◽

Rgb Image

A depth camera is a kind of sensor that can directly collect distance information between an object and the camera. The RealSense D435i is a low-cost depth camera that is currently in widespread use. When collecting data, an RGB image and a depth image are acquired simultaneously. The quality of the RGB image is good, whereas the depth image typically has many holes. In many applications that use depth images, these holes can lead to serious problems. This study proposes a repair method for depth images. The depth image is repaired using a texture synthesis algorithm guided by the RGB image, which is segmented through a multi-scale object-oriented method. An object difference parameter is added to the process of selecting the best sample block. Compared with previous methods, the experimental results show that the proposed method avoids erroneous filling of holes, the edges of the filled holes are consistent with the edges in the RGB images, and the repair accuracy is better. The root mean square error, peak signal-to-noise ratio, and structural similarity index measure between the repaired depth images and the ground-truth image were better than those obtained by two other methods. We believe that repairing the depth image can improve the results of depth image applications.
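Two of the three evaluation metrics named in the abstract are simple enough to sketch directly. The code below uses the standard definitions of root mean square error and peak signal-to-noise ratio. The optional `valid` mask and the choice of `peak` value are assumptions, since the abstract does not say which pixels are scored or which peak is used. The structural similarity index measure is not included.

```python
import numpy as np


def rmse(repaired, ground_truth, valid=None):
    """Root mean square error, optionally restricted to a boolean pixel mask."""
    diff = np.asarray(repaired, dtype=np.float64) - np.asarray(ground_truth, dtype=np.float64)
    if valid is not None:
        diff = diff[np.asarray(valid, dtype=bool)]
    return float(np.sqrt(np.mean(diff ** 2)))


def psnr(repaired, ground_truth, peak, valid=None):
    """Peak signal-to-noise ratio in dB: 20 * log10(peak / RMSE)."""
    error = rmse(repaired, ground_truth, valid)
    if error == 0.0:
        return float("inf")  # identical images
    return float(20.0 * np.log10(peak / error))


# Usage: a repaired depth map that is off by 2 units everywhere.
truth = np.full((4, 4), 100.0)
repaired_depth = truth + 2.0
example_rmse = rmse(repaired_depth, truth)
example_psnr = psnr(repaired_depth, truth, peak=255.0)
```

Lower RMSE and higher PSNR both indicate a repaired image that is closer to the ground truth.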
