Non-Touch Sign Word Recognition Based on Dynamic Hand Gesture Using Hybrid Segmentation and CNN Feature Fusion

2019 ◽  
Vol 9 (18) ◽  
pp. 3790 ◽  
Author(s):  
Md Abdur Rahim ◽  
Md Rashedul Islam ◽  
Jungpil Shin

Hand gesture-based sign language recognition is a promising application of human–computer interaction (HCI), in which deaf and hard-of-hearing people and their family members communicate with the help of a computer device. To help the deaf community, this paper presents a non-touch sign word recognition system that translates the gesture of a sign word into text. However, an uncontrolled environment, varying illumination, and partial occlusion can greatly affect the reliability of hand gesture recognition. To address this, a hybrid segmentation technique combining YCbCr and SkinMask segmentation is developed to identify the hand, and features are extracted using feature fusion in a convolutional neural network (CNN). The YCbCr stage performs color-space conversion, binarization, erosion, and finally hole filling to obtain the segmented images; SkinMask images are obtained by matching the color of the hand. Finally, a multiclass SVM classifier classifies the hand gestures of a sign word. Twenty common sign words are evaluated in real time, and the test results confirm that this system not only obtains better-segmented images but also achieves a higher recognition rate than conventional methods.
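
As an illustration of the YCbCr stage described above (conversion, binarization, erosion, hole filling), the following OpenCV sketch reproduces the pipeline; the Cb/Cr thresholds are common skin-tone ranges, not the paper's exact values.

```python
import cv2
import numpy as np

def ycbcr_hand_mask(bgr_image):
    """YCbCr segmentation sketch: color conversion -> binarization ->
    erosion -> hole filling. Thresholds are typical skin-tone ranges,
    not the values used in the paper."""
    ycrcb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YCrCb)
    # Binarize: keep pixels whose chrominance falls in a skin-like range
    # (OpenCV channel order is Y, Cr, Cb).
    mask = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))
    # Erode to remove small noise specks.
    mask = cv2.erode(mask, np.ones((3, 3), np.uint8), iterations=2)
    # Fill holes: flood-fill the background from a corner (assumed to be
    # background), then OR the inverse back into the mask.
    flood = mask.copy()
    h, w = mask.shape
    cv2.floodFill(flood, np.zeros((h + 2, w + 2), np.uint8), (0, 0), 255)
    return mask | cv2.bitwise_not(flood)
```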

Author(s):  
Pradip Ramanbhai Patel ◽  
Narendra Patel

Sign Language Recognition (SLR) is emerging as a current area of research in the field of machine learning. An SLR system recognizes gestures of a sign language and converts them into text or voice, making communication possible between deaf and hearing people. Acceptable performance of such a system demands invariance of the output with respect to certain transformations of the input. In this paper, we introduce a real-time hand gesture recognition system for Indian Sign Language (ISL). To obtain very high recognition accuracy, we propose a hybrid feature vector that combines shape-oriented features (Fourier descriptors) with region-oriented features (Hu moments and Zernike moments). A Support Vector Machine (SVM) classifier is trained on the feature vectors of the training images. Experiments show that the proposed hybrid feature vector enhances the performance of the system by compactly encoding invariance with respect to transformations such as scaling, translation, and rotation. Being invariant to these transformations, the system is easy to use and achieves a recognition rate of 95.79%.
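
The hybrid feature vector can be sketched as follows, computing Fourier descriptors from the hand contour and Hu and Zernike moments from the silhouette; the descriptor count and Zernike radius are illustrative choices, and `mahotas` is one library that provides Zernike moments.

```python
import cv2
import numpy as np
import mahotas  # provides Zernike moments

def hybrid_features(binary_hand, n_fourier=10):
    """Hybrid feature vector from a binary hand silhouette: Fourier
    descriptors (shape-oriented) + Hu and Zernike moments
    (region-oriented). Sizes and the Zernike radius are illustrative,
    not the paper's exact settings."""
    contours, _ = cv2.findContours(binary_hand, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    contour = max(contours, key=cv2.contourArea).squeeze()
    # Fourier descriptors: FFT of the complex boundary, normalized for
    # translation (drop the DC term) and scale (divide by |first harmonic|).
    boundary = contour[:, 0] + 1j * contour[:, 1]
    fd = np.fft.fft(boundary)
    fd = np.abs(fd[1:n_fourier + 1]) / (np.abs(fd[1]) + 1e-8)
    # Hu moments, log-scaled so their magnitudes are comparable.
    hu = cv2.HuMoments(cv2.moments(binary_hand)).flatten()
    hu = -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)
    # Zernike moments over a disc covering the silhouette.
    radius = binary_hand.shape[0] // 2
    zernike = mahotas.features.zernike_moments(binary_hand, radius)
    return np.concatenate([fd, hu, zernike])
```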


Author(s):  
Sukhendra Singh ◽  
G. N. Rathna ◽  
Vivek Singhal

Introduction: Sign language is the only way for speech-impaired people to communicate, but it is not known to most hearing people, which creates a communication barrier. In this paper, we present a solution that captures hand gestures with a Kinect camera and classifies each gesture into its correct symbol. Method: We used a Kinect camera rather than an ordinary web camera because an ordinary camera does not capture the 3D orientation or depth of the scene, whereas the Kinect captures 3D images, which makes classification more accurate. Result: The Kinect produces different images for the hand gestures '2' and 'V', and similarly for '1' and 'I', whereas a normal web camera cannot distinguish between them. We used hand gestures from Indian sign language; our dataset contained 46,339 RGB images and 46,339 depth images. 80% of the images were used for training and the remaining 20% for testing. In total, 36 hand gestures were considered: 26 for the alphabets A-Z and 10 for the digits 0-9. Conclusion: Along with a real-time implementation, we compare the performance of various machine learning models and find that a CNN on depth images gives the most accurate performance. All results were obtained on a PYNQ Z2 board.
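
A minimal Keras sketch of the depth-image CNN under the abstract's 80/20 split and 36 classes follows; the input resolution and layer widths are assumptions, as the abstract does not specify the network.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow import keras
from tensorflow.keras import layers

def build_depth_cnn(input_shape=(64, 64, 1), n_classes=36):
    """Small CNN for depth images; 36 classes = 26 letters + 10 digits.
    Input size and layer widths are illustrative assumptions."""
    return keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])

def train(depth_images, labels):
    # 80% of the images for training, 20% for testing, as in the abstract.
    x_tr, x_te, y_tr, y_te = train_test_split(
        depth_images, labels, test_size=0.2, stratify=labels)
    model = build_depth_cnn()
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_tr, y_tr, epochs=10, validation_data=(x_te, y_te))
    return model
```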


The hand gesture detection problem is one of the most prominent problems in machine learning and computer vision. Many machine learning techniques have been employed to solve hand gesture recognition, with applications in sign language recognition, virtual reality, human-machine interaction, autonomous vehicles, driver assistance systems, etc. In this paper, the goal is to design a system that correctly identifies hand gestures from a dataset of hundreds of hand gesture images. To achieve this, a decision-fusion system based on transfer learning architectures is proposed. Two pretrained models, MobileNet and Inception V3, are used for this purpose. To find the region of interest (ROI) in the image, the YOLO (You Only Look Once) architecture is used, which also decides the type of model. Edge-map images and spatial images are trained using two separate versions of the MobileNet-based transfer learning architecture, and the final probabilities are combined to decide the hand sign of the image. Simulation results based on classification accuracy indicate the superiority of this approach over previously researched approaches.
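
The decision-fusion step reduces to combining the per-class probabilities of the edge-map and spatial-image models; the sketch below uses an equal-weight average, which is an assumption since the abstract does not state the combination rule.

```python
import numpy as np

def fuse_decisions(p_spatial, p_edge, w=0.5):
    """Late (decision-level) fusion: combine the class-probability
    vectors of the spatial-image and edge-map models. Equal weighting
    is an assumption, not the paper's stated rule."""
    fused = w * p_spatial + (1.0 - w) * p_edge
    return int(np.argmax(fused)), fused

# Example: two 5-class softmax outputs whose individual argmaxes
# disagree; the fused score settles on class 2.
p1 = np.array([0.10, 0.40, 0.35, 0.10, 0.05])  # spatial model
p2 = np.array([0.05, 0.30, 0.45, 0.15, 0.05])  # edge-map model
label, probs = fuse_decisions(p1, p2)
print(label, probs)  # -> 2, the class with the highest averaged score
```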


2020 ◽  
pp. 1-14
Author(s):  
Qiuhong Tian ◽  
Jiaxin Bao ◽  
Huimin Yang ◽  
Yingrou Chen ◽  
Qiaoli Zhuang

BACKGROUND: For a traditional vision-based static sign language recognition (SLR) system, arm segmentation is a major factor restricting the accuracy of SLR. OBJECTIVE: To achieve accurate arm segmentation for differently bent arm shapes, we designed a segmentation method for a static SLR system based on image processing combined with morphological reconstruction. METHODS: First, skin segmentation was performed in the YCbCr color space to extract skin-like regions from a complex background. Then, an area operator and the location of the mass center were used to discard false skin-like regions and obtain the valid hand-arm region. Subsequently, the transverse distance was calculated to distinguish different bent arm shapes, and the proposed segmentation method extracted the hand region from the different types of hand-arm images. Finally, geometric features in the spatial domain were extracted and the sign language image was identified using a support vector machine (SVM) model. Experiments were conducted to determine the feasibility of the method and to compare its performance with that of neural network and Euclidean distance matching methods. RESULTS: The results demonstrate that the proposed method can effectively segment skin-like regions from complex backgrounds as well as handle different bent arm shapes, thereby improving the recognition rate of the SLR system.
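
The area-operator step of the METHODS section can be sketched with connected-component analysis; the minimum-area threshold below is an illustrative assumption.

```python
import cv2
import numpy as np

def filter_skin_regions(skin_mask, min_area=2000):
    """Area operator on a YCbCr skin mask: label connected components,
    drop blobs below min_area (illustrative threshold), and return the
    surviving regions together with their mass centers, which the
    method uses to validate the hand-arm region."""
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(skin_mask)
    out = np.zeros_like(skin_mask)
    centers = []
    for i in range(1, n):  # label 0 is the background
        if stats[i, cv2.CC_STAT_AREA] >= min_area:
            out[labels == i] = 255
            centers.append(tuple(centroids[i]))
    return out, centers
```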


Author(s):  
Najla Ilyana A.M ◽  
Nor Aini Z ◽  
Sharvin R. ◽  
Norlaili M.S

<span lang="EN-MY">Electromyography (EMG) is the measure of electrical activity produced by skeletal muscle. It is useful in prosthetic and rehabilitation technology as well as ability to handle electronic devices and robotics. If the EMG signal from the body especially hand movement can be apprehended, better value for people all around the world can be provided. Furthermore, it can be used to control smart-phone and be integrated with wearable technology. Another interesting application of this technology is in sign language recognition which is able to assist many disabled people in their daily lives. In this paper, hand gesture signals are acquired, extracted, analysed and classified. The EMG data from hand gesture which are rock, paper and scissors managed to be extracted. We use time domain feature to classified using Principal Component Analysis and regression tree. The result was highly accurate with 72.59% and 80.85% for PCA and regression tree respectively.</span>


Author(s):  
Julakanti Likhitha Reddy ◽  
Bhavya Mallela ◽  
Lakshmi Lavanya Bannaravuri ◽  
Kotha Mohan Krishna

Interacting with the world using expressions or body movements is comparatively more effective than speaking alone, and gesture recognition can be a better way to convey meaningful information. Communication through gestures has been widely used by humans to express their thoughts and feelings. Gestures can be performed with any body part, such as the head, face, hands, and arms, but most predominantly the hand. Hand gesture recognition has been widely adopted for numerous applications such as human-computer interaction, robotics, and sign language recognition. This paper focuses on a bare-hand gesture recognition system, proposing a database-driven scheme based on a skin color model and thresholding, along with effective template matching, which can be used for human-robot interaction and similar applications. Initially, the hand region is segmented by applying a skin color model in the YCbCr color space, where Y represents luminance and Cb and Cr represent chrominance. In the next stage, Otsu thresholding is applied to separate the foreground from the background. Finally, a template-based matching technique is developed using Principal Component Analysis (PCA), with k-nearest neighbour (KNN) and Support Vector Machine (SVM) classifiers for recognition. KNN is used for statistical estimation and pattern recognition, while SVM can be used for classification or regression problems.
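
The two segmentation stages (a YCbCr skin model followed by Otsu thresholding) can be sketched with OpenCV; the Cb/Cr ranges are common skin-tone values, not necessarily the paper's.

```python
import cv2

def segment_hand(bgr_image):
    """Two-stage segmentation as described above: YCbCr skin-color
    masking (Cb/Cr ranges are common values, not the paper's), then
    Otsu thresholding on the masked luminance to split foreground
    from background automatically."""
    ycrcb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YCrCb)
    y, cr, cb = cv2.split(ycrcb)  # OpenCV order: Y, Cr, Cb
    skin = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))
    masked_y = cv2.bitwise_and(y, y, mask=skin)
    # Otsu picks the binarization threshold that minimizes intra-class
    # variance, so no manual threshold is needed.
    _, binary = cv2.threshold(masked_y, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary
```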


2021 ◽  
Vol 5 (2 (113)) ◽  
pp. 44-54
Author(s):  
Chingiz Kenshimov ◽  
Samat Mukhanov ◽  
Timur Merembayev ◽  
Didar Yedilkhan

For people with disabilities, sign language is the most important means of communication; therefore, more and more authors and scientists around the world are proposing intelligent hand gesture recognition systems. Such a system is aimed not only at those who wish to understand a sign language but also at those who wish to speak using gesture recognition software. In this paper, a new benchmark dataset for Kazakh fingerspelling, suitable for training deep neural networks, is introduced. The dataset contains more than 10,122 gesture samples for 42 letters. The alphabet has its own peculiarities, as some characters are shown in motion, which may influence sign recognition. The paper describes research, analysis, comparison, and testing of the LeNet, AlexNet, ResNet, and EffectiveNet (EfficientNetB7) methods. The EffectiveNet architecture is state-of-the-art (SOTA) and newer than the other architectures under consideration. On this dataset, we show that the LeNet and EffectiveNet networks outperform the other competing algorithms; moreover, EffectiveNet can achieve state-of-the-art performance on other hand gesture datasets. The architecture and operating principle of these algorithms reflect the effectiveness of their application to sign language recognition. The CNN models are evaluated using accuracy and a penalty matrix. During training, LeNet and EffectiveNet showed better results: their accuracy and loss curves followed similar and close trends. The results of EffectiveNet were explained with the SHapley Additive exPlanations (SHAP) framework, which explored the model to detect complex relationships between features in the images. Focusing on the SHAP tool may help to further improve the accuracy of the model.
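
The SHAP analysis mentioned above can be reproduced roughly as follows for a trained Keras CNN; the explainer choice and background-sample size are assumptions, not the authors' exact configuration.

```python
import numpy as np
import shap  # SHapley Additive exPlanations

# model: a trained Keras CNN; x_train / x_test: image tensors of shape
# (N, H, W, C). These names are placeholders for whatever the training
# pipeline produced.
def explain_predictions(model, x_train, x_test, n_background=100):
    """Attribute the CNN's gesture predictions to image regions with
    SHAP, as the paper does to inspect feature relationships. The
    background-sample size is an illustrative choice."""
    idx = np.random.choice(len(x_train), n_background, replace=False)
    explainer = shap.GradientExplainer(model, x_train[idx])
    shap_values = explainer.shap_values(x_test[:5])
    # Red regions push the prediction toward a class, blue away from it.
    shap.image_plot(shap_values, x_test[:5])
```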


2021 ◽  
Author(s):  
Qing Han ◽  
Zhanlu Huangfu ◽  
Weidong Min ◽  
Yanqiu Liao

Abstract: Most existing deep learning-based dynamic sign language recognition methods directly use video sequences based on RGB information, or whole sequences instead of only the subsequence that represents the change of gesture. These characteristics lead to inaccurate extraction of hand gesture features and poor recognition accuracy for complex gestures. To solve these problems, this paper proposes a new dynamic hand gesture recognition method based on key skeleton information, which combines a residual convolutional neural network with a long short-term memory recurrent network and is called the KLSTM-3D residual network (K3D ResNet). In K3D ResNet, the spatiotemporal complexity of network computation is reduced by extracting representative skeleton frames of the gesture change. Spatiotemporal features are then extracted from the skeleton keyframe sequence, and an intermediate score corresponding to each action in the video sequence is established after feature analysis. Finally, the classification of video sequences accurately identifies the sign language. Experiments were performed on the DHG14/28 and SHREC'17 Track datasets. The verification accuracy on the DEVISIGN D dataset reached 88.6%, and the accuracy of the combination of RGB and skeleton information reached 93.2%.
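
The core idea, a residual CNN encoding each skeleton keyframe followed by an LSTM over the sequence, can be sketched in PyTorch; ResNet-18 and the hidden size are illustrative stand-ins, not the paper's K3D ResNet.

```python
import torch
import torch.nn as nn
from torchvision import models

class CNNLSTMGestureNet(nn.Module):
    """Sketch of the CNN+LSTM idea: a residual CNN encodes each
    skeleton keyframe, an LSTM models the temporal sequence, and a
    linear head scores each sign class. ResNet-18 and the hidden
    size are assumptions, not the paper's architecture."""
    def __init__(self, n_classes, hidden=256):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()      # 512-dim per-frame features
        self.cnn = backbone
        self.lstm = nn.LSTM(512, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, clips):            # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1))   # (B*T, 512)
        feats = feats.view(b, t, -1)             # (B, T, 512)
        _, (h_n, _) = self.lstm(feats)
        return self.head(h_n[-1])        # class scores per clip
```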


Sensors ◽  
2020 ◽  
Vol 20 (21) ◽  
pp. 6256
Author(s):  
Boon Giin Lee ◽  
Teak-Wei Chong ◽  
Wan-Young Chung

Sign language was designed to allow hearing-impaired people to interact with others. Nonetheless, knowledge of sign language is uncommon in society, which leads to a communication barrier with the hearing-impaired community. Many studies of sign language recognition utilizing computer vision (CV) have been conducted worldwide to reduce such barriers. However, the CV approach is restricted by the visual angle and is highly affected by environmental factors. In addition, CV usually involves machine learning, which requires the collaboration of a team of experts and the use of high-cost hardware; this increases the application cost in real-world situations. Thus, this study aims to design and implement a smart wearable American Sign Language (ASL) interpretation system using deep learning, applying sensor fusion to "fuse" six inertial measurement units (IMUs). The IMUs are attached to all five fingertips and the back of the hand to recognize sign language gestures, so the proposed method is not restricted by the field of view. The study reveals that this model achieves an average recognition rate of 99.81% for dynamic ASL gestures. Moreover, the proposed ASL recognition system can be further integrated with ICT and IoT technology to provide a feasible solution for assisting hearing-impaired people in communicating with others and improving their quality of life.
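
At the data level, fusing six IMUs amounts to concatenating their synchronized channels into one sequence before the classifier; the sketch below assumes nine channels per IMU and an LSTM head, neither of which is specified in the abstract.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Six IMUs (five fingertips + back of the hand), each assumed to stream
# 9 channels (3-axis accelerometer, gyroscope, magnetometer).
N_IMUS, CHANNELS, TIMESTEPS = 6, 9, 100

def fuse_imu_streams(imu_windows):
    """Fuse per-IMU windows, each of shape (N, TIMESTEPS, CHANNELS),
    into one tensor of shape (N, TIMESTEPS, N_IMUS * CHANNELS)."""
    return np.concatenate(imu_windows, axis=-1)

def build_gesture_model(n_classes):
    # An LSTM over the fused channel sequence captures the temporal
    # dynamics of each gesture.
    return keras.Sequential([
        keras.Input(shape=(TIMESTEPS, N_IMUS * CHANNELS)),
        layers.LSTM(128),
        layers.Dense(n_classes, activation="softmax"),
    ])
```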


Sensors ◽  
2019 ◽  
Vol 19 (23) ◽  
pp. 5282 ◽  
Author(s):  
Adam Ahmed Qaid MOHAMMED ◽  
Jiancheng Lv ◽  
MD. Sajjatul Islam

Recent research on hand detection and gesture recognition has attracted increasing interest due to its broad range of potential applications, such as human-computer interaction, sign language recognition, hand action analysis, driver hand behavior monitoring, and virtual reality. In recent years, several approaches have been proposed with the aim of developing a robust algorithm which functions in complex and cluttered environments. Although several researchers have addressed this challenging problem, a robust system is still elusive. Therefore, we propose a deep learning-based architecture to jointly detect and classify hand gestures. In the proposed architecture, the whole image is passed through a one-stage dense object detector to extract hand regions, which, in turn, pass through a lightweight convolutional neural network (CNN) for hand gesture recognition. To evaluate our approach, we conducted extensive experiments on four publicly available datasets for hand detection, including the Oxford, 5-signers, EgoHands, and Indian classical dance (ICD) datasets, along with two hand gesture datasets with different gesture vocabularies for hand gesture recognition, namely, the LaRED and TinyHands datasets. Here, experimental results demonstrate that the proposed architecture is efficient and robust. In addition, it outperforms other approaches in both the hand detection and gesture classification tasks.
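
The joint detect-then-classify architecture can be sketched as a simple pipeline; `detector` and `classifier` below are placeholders for trained models, and their interfaces are assumptions.

```python
import cv2
import numpy as np

def detect_and_classify(image, detector, classifier, crop_size=96):
    """Two-stage pipeline as described above: a one-stage dense
    detector proposes hand boxes, then a lightweight CNN labels each
    crop. `detector` and `classifier` are placeholders for trained
    models; the box format and call signatures are assumptions."""
    results = []
    for (x, y, w, h, score) in detector(image):   # assumed box format
        crop = image[y:y + h, x:x + w]
        crop = cv2.resize(crop, (crop_size, crop_size))
        probs = classifier(crop[np.newaxis] / 255.0)  # batch of one
        results.append(((x, y, w, h), int(np.argmax(probs)), score))
    return results
```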

