Recognition of Symbolic Gestures Using Depth Information

2018 ◽  
Vol 2018 ◽  
pp. 1-13 ◽  
Author(s):  
Hasan Mahmud ◽  
Md. Kamrul Hasan ◽  
Abdullah-Al-Tariq ◽  
Md. Hasanul Kabir ◽  
M. A. Mottalib

Symbolic gestures are hand postures with conventionalized meanings. They are static gestures that can be performed without voice in very complex environments containing variations in rotation and scale, and they may be produced under different illumination conditions or against occluding backgrounds. Any hand gesture recognition system should find sufficiently discriminative features, such as hand-finger contextual information. However, existing approaches make only limited use of the depth information of hand fingers, which represents finger shapes, when extracting discriminative finger features. If finger-bending information (e.g., a finger overlapping the palm), extracted from the depth map, is used as a local feature, static gestures that vary only slightly become distinguishable. Our work corroborates this idea: we generate depth silhouettes with contrast variation to obtain more discriminative keypoints, which in turn improves recognition accuracy to 96.84%. We apply the Scale-Invariant Feature Transform (SIFT) algorithm, which takes the generated depth silhouettes as input and produces robust feature descriptors as output. These features, after conversion into unified-dimensional feature vectors, are fed into a multiclass Support Vector Machine (SVM) classifier to measure accuracy. We tested our approach on a standard dataset containing 10 symbolic gestures representing the 10 numeric symbols (0-9), and then verified and compared our results across depth images, binary images, and images consisting of hand-finger edge information generated from the same dataset. Our results show higher accuracy when applying SIFT features to depth images. Accurate recognition of numeric symbols performed through hand gestures has a significant impact on Human-Computer Interaction (HCI) applications including augmented reality, virtual reality, and other fields.
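The abstract does not say how the variable number of SIFT descriptors per image becomes a "unified dimensional feature vector" for the SVM. One simple possibility, sketched below in Python/NumPy (an assumption for illustration, not the authors' method), is mean/std pooling of the descriptor set:

```python
import numpy as np

def pool_descriptors(descriptors, dim=128):
    """Collapse a variable-size set of keypoint descriptors (n x dim)
    into one fixed-length vector by mean/std pooling, so images with
    different keypoint counts yield vectors of equal dimension."""
    d = np.asarray(descriptors, dtype=float).reshape(-1, dim)
    if d.shape[0] == 0:                      # no keypoints detected
        return np.zeros(2 * dim)
    return np.concatenate([d.mean(axis=0), d.std(axis=0)])

# Two images with different keypoint counts map to same-length vectors.
rng = np.random.default_rng(0)
v1 = pool_descriptors(rng.random((17, 128)))
v2 = pool_descriptors(rng.random((5, 128)))
assert v1.shape == v2.shape == (256,)
```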

2017 ◽  
Vol 131 (13) ◽  
pp. 1465-1481 ◽  
Author(s):  
Víctor González-Castro ◽  
María del C. Valdés Hernández ◽  
Francesca M. Chappell ◽  
Paul A. Armitage ◽  
Stephen Makin ◽  
...  

In the brain, enlarged perivascular spaces (PVS) relate to cerebral small vessel disease (SVD), poor cognition, inflammation and hypertension. We propose a fully automatic scheme that uses a support vector machine (SVM) to classify the burden of PVS in the basal ganglia (BG) region as low or high. We assess the performance of three types of descriptors extracted from the BG region in T2-weighted MRI images: (i) statistics obtained from the wavelet transform's coefficients, (ii) local binary patterns and (iii) bag-of-visual-words (BoW) descriptors characterizing local keypoints obtained from a dense grid with scale-invariant feature transform (SIFT) characteristics. With the latter, the SVM classifier achieved the best accuracy (81.16%). The output of the classifier using the BoW descriptors was compared with visual ratings by an experienced neuroradiologist (Observer 1) and by a trained image analyst (Observer 2). The agreement and cross-correlation between the classifier and Observer 2 (κ = 0.67 (0.58–0.76)) were slightly higher than between the classifier and Observer 1 (κ = 0.62 (0.53–0.72)), and comparable to that between the two observers (κ = 0.68 (0.61–0.75)). Finally, three logistic regression models using clinical variables as independent variables and each of the PVS ratings as the dependent variable were built to assess how clinically meaningful the classifier's predictions were. The goodness-of-fit of the model for the classifier was good (area under the curve (AUC) values: 0.93 (model 1), 0.90 (model 2) and 0.92 (model 3)) and slightly better (AUC values 0.02 units higher) than that of the model for Observer 2. These results suggest that, although it can be improved, an automatic classifier assessing PVS burden from brain MRI can provide clinically meaningful results close to those of a trained observer.
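A bag-of-visual-words descriptor of the kind described in (iii) reduces a set of dense-SIFT keypoint descriptors to a histogram over a learned codebook of "visual words". A minimal NumPy sketch (the toy codebook and descriptors here are illustrative values, not the paper's):

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Assign each local descriptor to its nearest visual word and
    return the L1-normalized word-count histogram."""
    d = np.asarray(descriptors, float)          # (n, dim)
    c = np.asarray(codebook, float)             # (k, dim)
    # squared Euclidean distance from every descriptor to every word
    dists = ((d[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(c)).astype(float)
    return hist / hist.sum()

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
desc = np.array([[0.1, 0.0], [0.9, 1.1], [1.0, 0.9], [0.0, 0.2]])
h = bow_histogram(desc, codebook)
assert np.allclose(h, [0.5, 0.5])   # two descriptors land on each word
```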


2020 ◽  
Author(s):  
Thamba Meshach W ◽  
Hemajothi S ◽  
Mary Anita E A

Human affect recognition (HAR) using facial expression images and the electrocardiogram (ECG) signal plays an important role in predicting human intention, and improves performance in applications such as security systems, learning technologies and health-care systems. The primary goal of our work is to recognize individual affect states automatically using a multilayered binary structured support vector machine (MBSVM), which classifies the input into one of four affect classes: relaxed, happy, sad and angry. Classification is performed by a support vector machine (SVM) classifier designed to operate in multilayer mode. The classifier is trained using the 8-fold cross-validation method, which improves its learning and thus its efficiency. Recognition accuracy is enhanced, and the drawback of "facial mimicry" overcome, by using hybrid features extracted from both facial images (visual features) and the physiological ECG signal (signal features). The reliability of the input database is improved by acquiring the face images and ECG signals experimentally and by inducing emotions through image stimuli. The performance of the affect recognition system is evaluated using the confusion matrix, obtaining a classification accuracy of 96.88%.
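The 8-fold cross-validation used for training partitions the data into eight disjoint folds, each serving once as the validation set. A minimal index-splitting sketch (illustrative, not the authors' code):

```python
import numpy as np

def k_fold_indices(n_samples, k=8, seed=0):
    """Shuffle sample indices and split them into k disjoint folds;
    each fold serves once as the validation set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val

splits = list(k_fold_indices(80, k=8))
assert len(splits) == 8
assert all(len(v) == 10 and len(t) == 70 for t, v in splits)
# train and validation never overlap within a split
assert all(set(t).isdisjoint(v) for t, v in splits)
```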


2019 ◽  
Vol 16 (04) ◽  
pp. 1941002 ◽  
Author(s):  
Jing Li ◽  
Yang Mi ◽  
Gongfa Li ◽  
Zhaojie Ju

Facial expression recognition has been widely used in human-computer interaction (HCI) systems. Over the years, researchers have proposed different feature descriptors, implemented different classification methods, and carried out a number of experiments on various datasets for automatic facial expression recognition. However, most of them used 2D static images or 2D video sequences. The main limitations of 2D-based analysis are problems associated with variations in pose and illumination, which reduce recognition accuracy. An alternative is therefore to incorporate depth information acquired by a 3D sensor, because it is invariant to both pose and illumination. In this paper, we present a two-stream convolutional neural network (CNN)-based facial expression recognition system and test it on our own RGB-D facial expression dataset, collected with a Microsoft Kinect for XBOX in posed (unspontaneous) scenarios, since the Kinect is an inexpensive and portable device that captures both RGB and depth information. Our fully annotated dataset includes seven expressions (neutral, sadness, disgust, fear, happiness, anger, and surprise) for 15 subjects (9 males and 6 females) aged 20 to 25. The two individual CNNs are identical in architecture but do not share parameters. To combine the detection results produced by the two CNNs, we propose a late-fusion approach. The experimental results demonstrate that the proposed two-stream network using RGB-D images is superior to using only RGB images or only depth images.


2018 ◽  
Vol 28 (08) ◽  
pp. 1850010 ◽  
Author(s):  
Qi Yuan ◽  
Weidong Zhou ◽  
Fangzhou Xu ◽  
Yan Leng ◽  
Dongmei Wei

The automatic identification of epileptic electroencephalogram (EEG) signals can assist doctors in diagnosing epilepsy and provide higher security and quality of life for people with epilepsy. Feature extraction of EEG signals determines the performance of the whole recognition system. In this paper, a novel method using the local binary pattern (LBP) based on the wavelet transform (WT) is proposed to characterize the behavior of EEG activities. First, the WT is employed for time-frequency decomposition of the EEG signals. The "uniform" LBP operator is then applied to the wavelet-based time-frequency representation, and the generated histogram is regarded as the EEG feature vector quantifying the textural information of the wavelet coefficients. The LBP features coupled with a support vector machine (SVM) classifier yield satisfactory recognition accuracies of 98.88% for interictal versus ictal EEG classification and 98.92% for normal, interictal and ictal EEG classification on the publicly available EEG dataset. Moreover, numerical results on another large EEG dataset demonstrate that the proposed method can also effectively detect seizure events from multi-channel raw EEG data. Compared with the standard LBP, the "uniform" LBP produces a much shorter histogram, which greatly reduces the computational burden of classification and enables real-time detection of ictal EEG signals.
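The "uniform" LBP operator keeps a separate histogram bin for each 8-bit pattern with at most two circular 0/1 transitions (58 such patterns) and pools all other patterns into one shared bin, giving the shortened 59-bin histogram the abstract refers to. A minimal NumPy sketch of the operator on a 2-D array (applied here to a toy array rather than a wavelet time-frequency map):

```python
import numpy as np

def uniform_lbp_hist(img):
    """8-neighbour LBP: threshold each pixel's ring against the centre,
    map 'uniform' patterns (<= 2 bit transitions) to their own bins and
    all non-uniform patterns to one shared bin -> 59-bin histogram."""
    img = np.asarray(img, float)
    # neighbour offsets in circular order around the centre pixel
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = img.shape
    centre = img[1:h-1, 1:w-1]
    codes = np.zeros((h - 2, w - 2), dtype=int)
    for bit, (dy, dx) in enumerate(offs):
        neigh = img[1+dy:h-1+dy, 1+dx:w-1+dx]
        codes |= (neigh >= centre).astype(int) << bit

    def transitions(c):
        bits = [(c >> i) & 1 for i in range(8)]
        return sum(bits[i] != bits[(i + 1) % 8] for i in range(8))

    uniform = [c for c in range(256) if transitions(c) <= 2]
    lut = np.full(256, len(uniform))      # non-uniform codes -> last bin
    for i, c in enumerate(uniform):
        lut[c] = i
    return np.bincount(lut[codes].ravel(), minlength=len(uniform) + 1)

hist = uniform_lbp_hist(np.arange(25).reshape(5, 5))
assert hist.shape == (59,)                # 58 uniform bins + 1 shared
assert hist.sum() == 9                    # one code per interior pixel
```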


Author(s):  
Raveendra K ◽  
Ravi J

Face biometric systems are among the successful applications of image processing. Person recognition using the face is a challenging task, since it involves identifying a 3D object from its 2D projection. Feature extraction plays a very important role in face recognition, and extracting features in both the spatial and frequency domains has advantages over features obtained from a single domain alone. The proposed work performs spatial-domain feature extraction using the Asymmetric Region Local Binary Pattern (ARLBP) and frequency-domain feature extraction using the Fast Discrete Curvelet Transform (FDCT). The obtained features are fused by concatenation and compared with the trained feature set using different distance metrics and a Support Vector Machine (SVM) classifier. Experiments were conducted on different face databases. The proposed work yields 95.48% accuracy for FERET, 92.18% for L-space k, 76.55% for JAFFE and 81.44% for the NIR database using the SVM classifier. The results show that the proposed system provides a better recognition rate with the SVM classifier than with the other distance metrics. Further, the work is compared with existing methods for performance evaluation.
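Fusion by concatenation followed by distance-based matching can be sketched as follows (NumPy; the per-domain L2 normalization is an added assumption, so that neither domain dominates the distance, and the toy vectors are not real ARLBP/FDCT features):

```python
import numpy as np

def fuse(spatial_feat, freq_feat):
    """Concatenate spatial-domain and frequency-domain feature vectors,
    L2-normalising each first so neither domain dominates."""
    def norm(v):
        v = np.asarray(v, float)
        n = np.linalg.norm(v)
        return v / n if n else v
    return np.concatenate([norm(spatial_feat), norm(freq_feat)])

def nearest(query, gallery):
    """Index of the gallery template closest in Euclidean distance."""
    d = [np.linalg.norm(query - g) for g in gallery]
    return int(np.argmin(d))

gallery = [fuse([1, 0], [0, 1]), fuse([0, 1], [1, 0])]
query = fuse([0.9, 0.1], [0.1, 0.9])
assert nearest(query, gallery) == 0   # matches the first template
```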


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Zhi Yao ◽  
Hailing Sun ◽  
Guofu Zhou

Facial video big sensor data (BSD) is core data for wireless sensor network industry applications and technology research. It plays an important role in many industries, such as urban safety management, autonomous driving, contactless attendance, and venue management. Building security applications and intelligent algorithm models for video big sensor data based on facial expression recognition has become a hot and difficult topic in related fields. This paper focuses on experimental analysis of the extended Cohn-Kanade (CK+) dataset, which contains frontal poses with high clarity. First, face alignment and selection of the peak image are used to preprocess each expression sequence. Then, the output vectors from convolutional network 1 and the β-VAE are concatenated proportionally and input to a support vector machine (SVM) classifier to complete facial expression recognition. The number of expression sequences used in training was 2417, and the number used in testing was 519; the testing accuracy of the proposed model on the CK+ dataset reaches 99.615%.
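Connecting the two output vectors "proportionally" presumably means weighting each stream before concatenation; the sketch below assumes a single mixing weight alpha (a hypothetical parameter, not taken from the paper):

```python
import numpy as np

def proportional_concat(cnn_feat, vae_feat, alpha=0.7):
    """Scale the two feature vectors by alpha and (1 - alpha) before
    concatenation, weighting the CNN stream against the beta-VAE
    stream; the fused vector then goes to the SVM."""
    a = alpha * np.asarray(cnn_feat, float)
    b = (1.0 - alpha) * np.asarray(vae_feat, float)
    return np.concatenate([a, b])

v = proportional_concat([1.0, 2.0], [4.0], alpha=0.5)
assert np.allclose(v, [0.5, 1.0, 2.0])
assert v.shape == (3,)   # dimensions add, values are rescaled
```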


2018 ◽  
Vol 6 (3) ◽  
pp. 134-146
Author(s):  
Daniil Igorevich Mikhalchenko ◽  
Arseniy Ivin ◽  
Dmitrii Malov

Purpose: Single-image depth prediction allows depth information to be extracted from an ordinary 2D image without special sensors such as laser sensors or stereo cameras. The purpose of this paper is to solve the problem of obtaining depth information from a 2D image by applying deep neural networks (DNNs).
Design/methodology/approach: Several experiments and topologies are presented: a DNN that uses three inputs (a sequence of 2D images from a video stream) and a DNN that uses only one input. However, no data set exists that contains a video stream and corresponding depth maps for every frame, so a technique for creating data sets using the Blender software is presented in this work.
Findings: Beyond the insufficient amount of available data sets, the problem of overfitting was encountered. Although the created models work on the data sets, they are overfitted and cannot predict correct depth maps for random images that were not included in the data sets.
Originality/value: Existing techniques of depth image creation are tested using DNNs.


Author(s):  
Zhizhi Guo ◽  
Qianxiang Zhou ◽  
Zhongqi Liu ◽  
Chunhui Liu

Locating the center of the pupils is the most important foundation and core component of gaze tracking. The accuracy of gaze tracking largely depends on image quality, but additional constraints and a large amount of computation make gaze tracking impractical on high-resolution images. Although some eye-gaze trackers achieve accurate results, improving the accuracy of pupil features on low-resolution images and correctly recognizing closed-eye images remain open tasks in gaze estimation. Our aim is accurate localization of the pupil center in low-resolution images. To this end, we propose a simple but effective method that accurately locates the pupil center in real time. The method first obtains an initial eye center using an improved scale-invariant feature transform (SIFT) descriptor and a support vector machine (SVM) classifier, and then obtains the final pupil center through a size-variable rectangular correction block. Compared with reported state-of-the-art methods, the experimental results demonstrate that our system achieves more accurate results on low-resolution images. On top of that, our approach is robust to closed-eye images, which some other methods fail to recognize.
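The correction step searches a rectangular block around the coarse SIFT/SVM estimate; the abstract gives no details of the refinement rule, so the sketch below assumes a darkness-weighted centroid inside the block (pupils are dark), purely as an illustration:

```python
import numpy as np

def refine_center(gray, coarse_yx, half_h=2, half_w=3):
    """Inside a rectangular block around the coarse estimate, take the
    intensity-weighted centroid of darkness as the refined centre."""
    y0, x0 = coarse_yx
    h, w = gray.shape
    ys = slice(max(0, y0 - half_h), min(h, y0 + half_h + 1))
    xs = slice(max(0, x0 - half_w), min(w, x0 + half_w + 1))
    block = np.asarray(gray, float)[ys, xs]
    weight = block.max() - block          # darker pixels weigh more
    if weight.sum() == 0:                 # flat block: keep the guess
        return coarse_yx
    yy, xx = np.mgrid[ys, xs]
    cy = (weight * yy).sum() / weight.sum()
    cx = (weight * xx).sum() / weight.sum()
    return int(round(cy)), int(round(cx))

img = np.full((9, 9), 200.0)
img[5, 6] = 10.0                          # dark "pupil" pixel
assert refine_center(img, (4, 5)) == (5, 6)
```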


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Haixia Yang ◽  
Zhaohui Ji ◽  
Jun Sun ◽  
Fanan Xing ◽  
Yixian Shen ◽  
...  

Human gestures are considered one of the important modes of human-computer interaction. With the fast development of wireless technology in urban Internet of Things (IoT) environments, Wi-Fi can not only provide high-speed network communication but also has great potential in the field of environmental perception. This paper proposes a gesture recognition system based on the channel state information (CSI) within the physical layer of Wi-Fi transmission. To address noise interference and phase offset in the CSI, we adopt a model based on the CSI quotient. The amplitude and phase curves of the CSI are then smoothed with a Savitzky-Golay filter, and a one-dimensional convolutional neural network (1D-CNN) extracts the gesture features. Finally, a support vector machine (SVM) classifier recognizes the gestures. The experimental results show that our system achieves a recognition rate of about 90% for three common gestures: pushing forward, left stroke, and waving. The effects of different human orientations and model parameters on the recognition results are analyzed as well.
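The CSI quotient, i.e. the ratio of CSI between two receive antennas, cancels the random phase offset common to both, after which the quotient amplitude can be smoothed with a Savitzky-Golay filter. A self-contained sketch that also derives the filter coefficients by least squares (window and polynomial order are illustrative choices, not the paper's):

```python
import numpy as np

def savgol_coeffs(window, order):
    """Savitzky-Golay smoothing coefficients: fit a polynomial of the
    given order over the window and evaluate the fit at the centre."""
    m = window // 2
    A = np.vander(np.arange(-m, m + 1), order + 1, increasing=True)
    # row 0 of the pseudoinverse gives the constant term = centre value
    return np.linalg.pinv(A)[0]

def csi_quotient_amplitude(csi_a, csi_b, window=5, order=2):
    """Quotient between two antennas cancels the shared random phase;
    smooth the resulting amplitude with the S-G filter."""
    q = np.abs(np.asarray(csi_a) / np.asarray(csi_b))
    c = savgol_coeffs(window, order)
    return np.convolve(q, c[::-1], mode="same")

# A shared random phase e^{j*phi} divides out of the quotient exactly.
t = np.linspace(0, 1, 50)
phase = np.exp(1j * 2 * np.pi * np.random.rand(50))
a = (2.0 + 0.1 * np.sin(2 * np.pi * 5 * t)) * phase
b = 1.0 * phase
amp = csi_quotient_amplitude(a, b)
assert amp.shape == (50,)
# away from the edges the smoothed amplitude stays near its mean of 2
assert np.all(np.abs(amp[5:-5] - 2.0) < 0.2)
```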


Symmetry ◽  
2019 ◽  
Vol 11 (4) ◽  
pp. 540 ◽  
Author(s):  
Qiang Guo ◽  
Xin Yu ◽  
Guoqing Ruan

Low Probability of Intercept (LPI) radar waveform recognition is not only an important branch of the electronic reconnaissance field but also an important means of obtaining non-cooperative radar information. To address the problems of low LPI radar waveform recognition rates, difficult feature extraction, and the large number of samples required, an automatic classification and recognition system based on the Choi-Williams distribution (CWD) and deep convolutional neural network transfer learning is proposed in this paper. First, the system performs a CWD time-frequency transform on the LPI radar waveform to obtain a 2-D time-frequency image, which is then preprocessed. Next, the preprocessed image is passed to a pre-trained deep convolutional network (Inception-v3 or ResNet-152) for feature extraction. Finally, the extracted features are sent to a Support Vector Machine (SVM) classifier to realize offline training and online recognition of radar waveforms. Simulation results show that, at an SNR of −2 dB, the overall recognition rate for the eight LPI radar signals (LFM, BPSK, Costas, Frank, and T1–T4) reaches 97.8% for the ResNet-152-SVM system and 96.2% for the Inception-v3-SVM system.
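The CWD itself is an involved quadratic time-frequency transform; as a lightweight stand-in (plainly substituted, not the paper's transform), the sketch below builds a 2-D time-frequency image with a short-time Fourier transform and shows that an LFM chirp traces a rising frequency ridge in it:

```python
import numpy as np

def tf_image(signal, win=32, hop=8):
    """Short-time Fourier magnitude as a simple 2-D time-frequency
    image (a lightweight stand-in for the CWD used in the paper)."""
    sig = np.asarray(signal, float)
    window = np.hanning(win)
    frames = [sig[i:i + win] * window
              for i in range(0, len(sig) - win + 1, hop)]
    return np.abs(np.fft.rfft(frames, axis=1)).T   # (freq, time)

# An LFM (linear chirp) sweeps upward in the time-frequency image.
n = 256
t = np.arange(n)
chirp = np.cos(2 * np.pi * (0.05 + 0.15 * t / n) * t)
img = tf_image(chirp)
assert img.shape == (17, 29)        # 17 freq bins x 29 time frames
ridge = img.argmax(axis=0)          # dominant frequency bin per frame
assert ridge[-1] > ridge[0]         # frequency increases over time
```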

