GesID: 3D Gesture Authentication Based on Depth Camera and One-Class Classification

Sensors ◽  
2018 ◽  
Vol 18 (10) ◽  
pp. 3265 ◽  
Author(s):  
Xuan Wang ◽  
Jiro Tanaka

Biometric authentication is popular in authentication systems, and gesture, as a carrier of behavior characteristics, has the advantages of being difficult to imitate and containing abundant information. This research aims to use three-dimensional (3D) depth information of gesture movement to perform authentication with less user effort. We propose an approach based on depth cameras that satisfies three requirements: it can authenticate from a single, customized gesture; it achieves high accuracy without an excessive number of gestures for training; and it continues learning the gesture during use of the system. To satisfy these requirements respectively, we use a sparse autoencoder to memorize the single gesture; we employ data augmentation to address the problem of insufficient data; and we use incremental learning so that the system memorizes the gesture incrementally over time. An experiment performed on different gestures in different user situations demonstrates the accuracy of one-class classification (OCC) and proves the effectiveness and reliability of the approach. Gesture authentication based on 3D depth cameras can thus be achieved with reduced user effort.
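
As an illustration of the one-class idea behind this approach, the following sketch (an assumption, not the authors' implementation; all names and dimensions are hypothetical) trains a sparse autoencoder on enrollment samples of a single gesture and accepts a new sample only when its reconstruction error stays below a threshold:

```python
# A minimal sketch of one-class gesture verification with a sparse autoencoder.
# Assumes each gesture is already flattened into a fixed-length feature vector.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, input_dim=300, hidden_dim=64, sparsity_weight=1e-3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.Sigmoid())
        self.decoder = nn.Linear(hidden_dim, input_dim)
        self.sparsity_weight = sparsity_weight

    def forward(self, x):
        h = self.encoder(x)
        return self.decoder(h), h

def training_loss(model, x):
    # Reconstruction error plus an L1 penalty that keeps hidden activations sparse.
    x_hat, h = model(x)
    return nn.functional.mse_loss(x_hat, x) + model.sparsity_weight * h.abs().mean()

def is_genuine(model, x, threshold):
    # One-class decision: accept if reconstruction error is below the enrollment threshold.
    with torch.no_grad():
        x_hat, _ = model(x)
        return nn.functional.mse_loss(x_hat, x).item() < threshold
```

Under this reading, data augmentation and incremental learning would amount to perturbing the enrollment vectors and periodically fine-tuning the same model on newly accepted samples.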

Sensors ◽  
2019 ◽  
Vol 19 (13) ◽  
pp. 3008 ◽  
Author(s):  
Zhe Liu ◽  
Zhaozong Meng ◽  
Nan Gao ◽  
Zonghua Zhang

Depth cameras play a vital role in three-dimensional (3D) shape reconstruction, machine vision, augmented/virtual reality and other visual information-related fields. However, a single depth camera cannot obtain complete information about an object by itself due to the limitation of the camera’s field of view. Multiple depth cameras can solve this problem by acquiring depth information from different viewpoints. In order to do so, they need to be calibrated so that the complete 3D information can be obtained accurately. However, traditional chessboard-based planar targets are not well suited for calibrating the relative orientations between multiple depth cameras, because the coordinates of the different depth cameras need to be unified into a single coordinate system, while cameras arranged at a specific angle to one another share only a very small overlapping field of view. In this paper, we propose a 3D-target-based multiple depth camera calibration method. Each plane of the 3D target is used to calibrate an independent depth camera. All planes of the 3D target are unified into a single coordinate system, which means the feature points on the calibration planes are also in one unified coordinate system. Using this 3D target, multiple depth cameras can be calibrated simultaneously. In this paper, a method of precise calibration using lidar is also proposed. This method is not only applicable to the 3D target designed for the purposes of this paper, but can also be applied to any 3D calibration object consisting of planar chessboards. It significantly reduces the calibration error compared with traditional camera calibration methods. In addition, in order to reduce the influence of the depth camera’s infrared transmitter and improve calibration accuracy, the calibration process of the depth camera is optimized. A series of calibration experiments were carried out, and the experimental results demonstrate the reliability and effectiveness of the proposed method.
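
To illustrate the coordinate-unification step described above, the sketch below (a simplified assumption, not the paper's calibration pipeline) composes homogeneous transforms: once each camera's pose with respect to the unified 3D-target frame is known, the relative pose between any two cameras follows directly:

```python
# A minimal sketch: derive the camera-to-camera transform from two target-to-camera poses.
import numpy as np

def relative_pose(T_target_to_cam_a: np.ndarray, T_target_to_cam_b: np.ndarray) -> np.ndarray:
    """Return the 4x4 transform mapping points from camera A's frame to camera B's frame."""
    # p_b = T_target_to_cam_b @ inv(T_target_to_cam_a) @ p_a
    return T_target_to_cam_b @ np.linalg.inv(T_target_to_cam_a)

# Hypothetical example: camera B sits 0.5 m to the side of camera A.
T_a = np.eye(4)
T_b = np.eye(4)
T_b[:3, 3] = [0.5, 0.0, 0.0]
print(relative_pose(T_a, T_b))
```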


Author(s):  
J. Kim ◽  
J. Y. Jun ◽  
M. Hong ◽  
H. Shim ◽  
J. Ahn

Abstract. In the past few decades, a number of scholars have studied painting classification based on image processing or computer vision technologies. Further, as machine learning technology rapidly developed, painting classification using machine learning has been carried out. However, due to the lack of information about brushstrokes in a photograph, typical models cannot use more precise information about the painter's painting style. We hypothesized that visualized depth information of brushstrokes is effective in improving the accuracy of a machine learning model for painting classification. This study proposes a new data utilization approach in machine learning with Reflectance Transformation Imaging (RTI) images, which maximizes the visualization of the three-dimensional shape of brushstrokes. A certain artist's unique brushstrokes can be revealed in RTI images, which are difficult to obtain with regular photographs. If these new types of images are used as training data for the machine learning model, classification is conducted using not only the shape of the color but also the depth information. We used a Convolutional Neural Network (CNN), a model optimized for image classification, with the VGG-16, ResNet-50, and DenseNet-121 architectures. We conducted a two-stage experiment using the works of two Korean artists. In the first experiment, we obtained a key part of the painting from RTI data and photographic data. In the second experiment, on the second artist's work, a larger quantity of data was acquired, and the whole artwork was captured. The results showed that the RTI-trained model achieved higher accuracy than the non-RTI-trained model. In this paper, we propose a method which uses machine learning and RTI technology to analyze and classify paintings more precisely to verify our hypothesis.
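
As a rough sketch of the classification setup (assumed; the data pipeline and training loop are omitted and this is not the authors' code), a pre-trained backbone such as VGG-16 can be fine-tuned on either RTI renderings or plain photographs so the two data types are compared under the same model:

```python
# A minimal sketch: adapt a pre-trained VGG-16 to a two-artist classification task.
import torch.nn as nn
from torchvision import models

def build_painting_classifier(num_classes: int = 2) -> nn.Module:
    model = models.vgg16(weights=models.VGG16_Weights.DEFAULT)
    # Replace the final fully connected layer so the network predicts the artist label.
    model.classifier[6] = nn.Linear(model.classifier[6].in_features, num_classes)
    return model
```

The same construction applies to ResNet-50 and DenseNet-121 by swapping the backbone and its final layer.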


Information ◽  
2022 ◽  
Vol 13 (1) ◽  
pp. 18
Author(s):  
Yu-Cheng Fan ◽  
Sheng-Bi Wang

With the advancement of artificial intelligence, deep learning technology is applied in many fields. The autonomous car system is one of the most important application areas of artificial intelligence. LiDAR (Light Detection and Ranging) is one of the most critical components of self-driving cars. LiDAR can quickly scan the environment to obtain a large amount of high-precision three-dimensional depth information, which self-driving cars use to reconstruct the three-dimensional environment. The autonomous car system can identify various situations in the vicinity through the information provided by LiDAR and choose a safer route. This paper presents a decoder for the data packets of the Velodyne HDL-64 LiDAR. The decoder we designed converts the information in the original data packets into X, Y, and Z point cloud data so that the autonomous vehicle can use the decoded information to reconstruct the three-dimensional environment and perform object detection and object classification. To evaluate the performance of the proposed LiDAR decoder, we use standard original packets for the experimental comparison, all taken from the Map GMU (George Mason University). The average decoding time of a frame is 7.678 milliseconds. Compared to other methods, the proposed LiDAR decoder has higher decoding speed and efficiency.
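
The core of such a decoder is converting each return from the packet's polar form (distance, azimuth of the rotating head, fixed elevation of the firing laser) into Cartesian coordinates. The sketch below shows only that conversion under common Velodyne conventions (an assumption; packet parsing, per-laser calibration offsets, and timing corrections are omitted):

```python
# A minimal sketch: convert one decoded LiDAR return into an (x, y, z) point.
import numpy as np

def polar_to_xyz(distance_m: float, azimuth_deg: float, elevation_deg: float):
    a = np.radians(azimuth_deg)    # horizontal rotation angle of the sensor head
    w = np.radians(elevation_deg)  # vertical angle of the firing laser
    x = distance_m * np.cos(w) * np.sin(a)
    y = distance_m * np.cos(w) * np.cos(a)
    z = distance_m * np.sin(w)
    return x, y, z

print(polar_to_xyz(10.0, 45.0, -8.0))  # hypothetical return
```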


2021 ◽  
Vol 11 (12) ◽  
pp. 5383
Author(s):  
Huachen Gao ◽  
Xiaoyu Liu ◽  
Meixia Qu ◽  
Shijie Huang

In recent studies, self-supervised learning methods have been explored for monocular depth estimation. They minimize the image reconstruction loss instead of using depth information as a supervised signal. However, existing methods usually assume that corresponding points in different views have the same color, which leads to unreliable unsupervised signals and ultimately harms the reconstruction loss during training. Meanwhile, in low-texture regions, the model is unable to predict the disparity value of pixels correctly because of the small number of extracted features. To solve these issues, we propose a network, PDANet, that integrates perceptual consistency and data augmentation consistency, which are more reliable unsupervised signals, into a regular unsupervised depth estimation model. Specifically, we apply a reliable data augmentation mechanism to minimize the loss between the disparity maps generated from the original image and from the augmented image, which enhances the robustness of the prediction to color fluctuation. At the same time, we aggregate the features of different layers extracted by a pre-trained VGG16 network to explore higher-level perceptual differences between the input image and the generated one. Ablation studies demonstrate the effectiveness of each component, and PDANet shows high-quality depth estimation results on the KITTI benchmark, improving the absolute relative error of the state-of-the-art method from 0.114 to 0.084.
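
The perceptual-consistency term can be pictured as a multi-layer VGG16 feature comparison between the input view and the view re-synthesized from the predicted disparity. The sketch below is an assumed, generic form of such a loss (the layer choice and weighting are illustrative, not PDANet's exact configuration):

```python
# A minimal sketch of a VGG16-based perceptual loss over several feature layers.
import torch.nn as nn
from torchvision import models

class PerceptualLoss(nn.Module):
    def __init__(self, layer_ids=(3, 8, 15)):  # relu1_2, relu2_2, relu3_3 (assumed choice)
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.eval()
        for p in vgg.parameters():
            p.requires_grad_(False)
        self.vgg = vgg
        self.layer_ids = set(layer_ids)

    def forward(self, pred, target):
        loss, x, y = 0.0, pred, target
        for i, layer in enumerate(self.vgg):
            x, y = layer(x), layer(y)
            if i in self.layer_ids:
                loss = loss + nn.functional.l1_loss(x, y)
        return loss
```

The data-augmentation consistency term would analogously compare the disparity maps predicted from the original and the augmented image.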


Author(s):  
HyeonJung Park ◽  
Youngki Lee ◽  
JeongGil Ko

In this work we present SUGO, a depth video-based system for translating sign language to text using a smartphone's front camera. While exploiting depth-only videos offers benefits such as being less privacy-invasive than using RGB videos, it introduces new challenges, including low video resolution and the sensor's sensitivity to user motion. We overcome these challenges by diversifying our sign language video dataset via data augmentation so that it is robust to various usage scenarios, and by designing a set of schemes that emphasize human gestures in the input images for effective sign detection. The inference engine of SUGO is based on a 3-dimensional convolutional neural network (3DCNN) that classifies a sequence of video frames as a pre-trained word. Furthermore, the overall operations are designed to be light-weight so that sign language translation takes place in real time using only the resources available on a smartphone, with no help from cloud servers or external sensing components. Specifically, to train and test SUGO, we collect sign language data from 20 individuals for 50 Korean Sign Language words, summing up to a dataset of ~5,000 sign gestures, and collect additional in-the-wild data to evaluate the performance of SUGO in real-world usage scenarios with different lighting conditions and daily activities. Comprehensively, our extensive evaluations show that SUGO can properly classify sign words with an accuracy of up to 91% and also suggest that the system is suitable (in terms of resource usage, latency, and environmental robustness) to enable a fully mobile solution for sign language translation.
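
To make the 3DCNN inference concrete, here is a deliberately tiny 3D convolutional classifier over a short clip of depth frames (a sketch under assumed input sizes and layer widths, not SUGO's actual network):

```python
# A minimal sketch: classify a 16-frame depth clip into one of 50 sign-word classes.
import torch
import torch.nn as nn

class Tiny3DCNN(nn.Module):
    def __init__(self, num_words: int = 50):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
        )
        self.classifier = nn.Linear(32 * 4 * 16 * 16, num_words)

    def forward(self, clip):  # clip: (batch, 1, frames=16, height=64, width=64)
        x = self.features(clip)
        return self.classifier(x.flatten(1))

logits = Tiny3DCNN()(torch.randn(2, 1, 16, 64, 64))  # two hypothetical depth clips
print(logits.shape)  # torch.Size([2, 50])
```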


2020 ◽  
Vol 6 (2) ◽  
pp. eaay6036 ◽  
Author(s):  
R. C. Feord ◽  
M. E. Sumner ◽  
S. Pusdekar ◽  
L. Kalra ◽  
P. T. Gonzalez-Bellido ◽  
...  

The camera-type eyes of vertebrates and cephalopods exhibit remarkable convergence, but it is currently unknown whether the mechanisms for visual information processing in these brains, which exhibit wildly disparate architecture, are also shared. To investigate stereopsis in a cephalopod species, we affixed “anaglyph” glasses to cuttlefish and used a three-dimensional perception paradigm. We show that (i) cuttlefish have also evolved stereopsis (i.e., the ability to extract depth information from the disparity between left and right visual fields); (ii) when stereopsis information is intact, the time and distance covered before striking at a target are shorter; and (iii) stereopsis in cuttlefish works differently from that in vertebrates, as cuttlefish can extract stereopsis cues from anticorrelated stimuli. These findings demonstrate that although there is convergent evolution in depth computation, cuttlefish stereopsis is likely afforded by a different algorithm than in humans, and not just a different implementation.


2013 ◽  
Vol 319 ◽  
pp. 343-347
Author(s):  
Ru Ting Xia ◽  
Xiao Yan Zhou

This research aimed to reveal characteristics of visual attention in low-vision drivers. Near and far stimuli were presented by means of a three-dimensional (3D) attention measurement system that simulated a traffic environment. We measured the reaction time of subjects while attention shifted under three kinds of simulated peripheral environment illuminance (daylight, twilight and dawn conditions). Subjects were required to judge whether the target was presented nearer than the fixation point or farther than it. The results showed that the peripheral environment illuminance had an evident influence on the reaction time of drivers: reaction times were slower in the dawn and twilight conditions than in the daylight condition, and the distribution of attention favored nearer space over farther space; that is, shifts of attention in 3D space had an anisotropic characteristic in depth. The results suggested that (1) visual attention can be examined with a precueing paradigm and stimulus controls that include depth information, and (2) the anisotropic characteristic of attention shifting depends on the distance attention moves, and it was more pronounced in the dawn condition than in the daylight and twilight conditions.


10.29007/72d4 ◽  
2018 ◽  
Author(s):  
He Liu ◽  
Edouard Auvinet ◽  
Joshua Giles ◽  
Ferdinando Rodriguez Y Baena

Computer Aided Surgery (CAS) is helpful, but it clutters an already overcrowded operating theatre and tends to disrupt the workflow of conventional surgery. In order to provide seamless computer assistance with improved immersion and a more natural surgical workflow, we propose an augmented-reality-based navigation system for CAS. Here, we choose to focus on the proximal femoral anatomy, which we register to a plan by processing depth information of the surgical site captured by a commercial depth camera. Intra-operative three-dimensional surgical guidance is then provided to the surgeon through a commercial augmented reality headset, to drill a pilot hole in the femoral head, so that the user can perform the operation without additional physical guides. The user can interact intuitively with the system by simple gestures and voice commands, resulting in a more natural workflow. To assess the surgical accuracy of the proposed setup, 30 experiments of pilot hole drilling were performed on femur phantoms. The position and the orientation of the drilled guide holes were measured and compared with the preoperative plan, and the mean errors were within 2 mm and 2°, results which are in line with commercial computer-assisted orthopedic systems today.
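
The registration of the captured depth data to the pre-operative plan can be illustrated by a generic rigid least-squares fit between corresponding 3D points (a sketch of a standard SVD-based Kabsch step, not necessarily the system's actual registration pipeline):

```python
# A minimal sketch: rigid alignment of depth-camera points to planned model points.
import numpy as np

def rigid_fit(src: np.ndarray, dst: np.ndarray):
    """Return rotation R and translation t minimising ||R @ src_i + t - dst_i|| over pairs."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # guard against reflections
    R = Vt.T @ D @ U.T
    t = dst_c - R @ src_c
    return R, t
```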


2021 ◽  
Author(s):  
Wei Li ◽  
Yangyong Cao ◽  
Kun Yu ◽  
Yibo Cai ◽  
Feng Huang ◽  
...  

Abstract Background: The COVID-19 disease is putting unprecedented pressure on the global healthcare system. CT examination, as an auxiliary confirmatory diagnostic method, can help clinicians quickly locate COVID-19 lesions once screening by PCR test has been performed. Furthermore, lesion subtype classification plays a critical role in the consequent treatment decision. Identifying the subtypes of lesions accurately can help doctors discover changes in lesions in time and better assess the severity of COVID-19. Method: The four most typical lesion subtypes of COVID-19 are discussed in this paper: ground-glass opacity (GGO), cord, solid and subsolid. A computer-aided diagnosis approach for lesion subtypes is proposed. The radiomics data of lesions are segmented from the CT images of COVID-19 patients with diagnosis and lesion annotations by radiologists. Three-dimensional texture descriptors are then applied to the volume data of the lesions, together with shape and first-order features. The massive feature data are selected by a hybrid adaptive selection algorithm, and a classification model is trained at the same time. The classifier is used to predict lesion subtypes as supporting decision information for radiologists. Results: 3734 lesions were extracted from a dataset of 319 patients, and 189 radiomics features were finally obtained. The random forest classifier is trained with data augmentation because the number of lesions of the different subtypes is imbalanced in the initial dataset. The experimental results show that the accuracy for the four subtypes of lesions is (0.9306, 0.9684, 0.9958, and 0.9430), the recall is (0.9552, 0.9158, 0.9580 and 0.8075) and the F-score is (0.9384, 0.9237, 0.9547, and 0.8442). Conclusion: The method is evaluated in extensive experiments. The results show that the 3D radiomics features chosen by the hybrid adaptive selection algorithm can better express the high-level information of the lesion data. The classification model achieves good performance and is compared with state-of-the-art COVID-19 models, which can help clinicians more accurately identify the subtypes of COVID-19 lesions and provide help for further research.
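
A simplified view of the classification stage (an assumed workflow with synthetic placeholder data, not the paper's code) is a random forest trained on the selected radiomics feature vectors, with class weighting standing in here for the paper's augmentation of under-represented subtypes:

```python
# A minimal sketch: four-class lesion-subtype classification from radiomics features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(3734, 189))   # placeholder for 189 selected radiomics features per lesion
y = rng.integers(0, 4, size=3734)  # 0: GGO, 1: cord, 2: solid, 3: subsolid (placeholder labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=300, class_weight="balanced", random_state=0)
clf.fit(X_tr, y_tr)
print(clf.score(X_te, y_te))
```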


2022 ◽  
Vol 13 (1) ◽  
pp. 1-20
Author(s):  
Shui-Hua Wang ◽  
Xin Zhang ◽  
Yu-Dong Zhang

(Aim) COVID-19 had caused more than 2.28 million deaths as of 4 February 2021, and it is still spreading across the world. This study proposed a novel artificial intelligence model to diagnose COVID-19 based on chest CT images. (Methods) First, the two-dimensional fractional Fourier entropy was used to extract features. Second, a custom deep stacked sparse autoencoder (DSSAE) model was created to serve as the classifier. Third, an improved multiple-way data augmentation was proposed to resist overfitting. (Results) Our DSSAE model obtains a micro-averaged F1 score of 92.32% in handling a four-class problem (COVID-19, community-acquired pneumonia, secondary pulmonary tuberculosis, and healthy control). (Conclusion) Our method outperforms 10 state-of-the-art approaches.
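
A generic form of a stacked sparse autoencoder classifier (a sketch with assumed layer sizes, not the proposed DSSAE, and with the fractional Fourier entropy feature extraction omitted) looks as follows:

```python
# A minimal sketch: stacked encoding layers followed by a classification head for the
# four chest-CT classes; softmax is applied implicitly by the cross-entropy loss.
import torch.nn as nn

class StackedSAEClassifier(nn.Module):
    def __init__(self, in_dim: int = 64, num_classes: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 32), nn.Sigmoid(),  # first sparse encoding layer
            nn.Linear(32, 16), nn.Sigmoid(),      # second sparse encoding layer
        )
        self.head = nn.Linear(16, num_classes)

    def forward(self, x):
        return self.head(self.encoder(x))
```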

