Automatic Lip Reading System Based on a Fusion Lightweight Neural Network with Raspberry Pi

2019 ◽  
Vol 9 (24) ◽  
pp. 5432
Author(s):  
Jing Wen ◽  
Yuanyao Lu

Virtual Reality (VR) is a kind of interactive experience technology in which human vision, hearing, expression, voice and even touch can be added to the interaction between humans and machines. Lip-reading recognition is a new technology in the field of human-computer interaction with broad development prospects. It is particularly important in noisy environments and for the hearing-impaired population, as it uses visual information from a video to compensate for missing voice information. This visual language also benefits from Augmented Reality (AR). The purpose is to establish an efficient and convenient way of communication. However, the traditional lip-reading recognition system places high demands on the running speed and performance of the equipment because of its long recognition process and large number of parameters, so it is difficult to meet the requirements of practical application. In this paper, a mobile lip-reading recognition system based on Raspberry Pi is implemented for the first time, and the recognition application reaches the latest level of our research. Our mobile lip-reading recognition system can be divided into three stages. First, we extract key frames from our own independent database and then use a multi-task cascaded convolutional network (MTCNN) to align the face, so as to improve the accuracy of lip extraction. In the second stage, we use MobileNets to extract lip image features and long short-term memory (LSTM) to extract sequence information between key frames. Finally, we compare three lip-reading models: (1) a fusion model of Bi-LSTM and AlexNet; (2) a fusion model with an attention mechanism; (3) our proposed hybrid network model combining LSTM and MobileNets. The results show that our model has fewer parameters and lower complexity, and its accuracy on the test dataset is 86.5%. Therefore, our mobile lip-reading system is simpler and smaller than other PC-platform systems and saves computing resources and memory space.
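As a rough illustration of the MobileNets-plus-LSTM hybrid described above, the following PyTorch sketch runs a MobileNetV2 backbone over each key frame and feeds the per-frame features to an LSTM; the layer sizes, 16-frame clips and 10-class vocabulary are illustrative assumptions, not values taken from the paper.

```python
# A minimal sketch (PyTorch) of a MobileNet + LSTM lip-reading hybrid.
# Hidden size, frame count and class count are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models

class MobileNetLSTM(nn.Module):
    def __init__(self, num_classes=10, hidden_size=256):
        super().__init__()
        # MobileNetV2 backbone extracts a feature vector per key frame.
        backbone = models.mobilenet_v2(weights=None)
        self.features = backbone.features
        self.pool = nn.AdaptiveAvgPool2d(1)
        # LSTM models temporal dependencies across the key-frame sequence.
        self.lstm = nn.LSTM(1280, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):                        # x: (batch, time, 3, H, W)
        b, t = x.shape[:2]
        f = self.features(x.flatten(0, 1))       # per-frame CNN features
        f = self.pool(f).flatten(1).view(b, t, -1)
        out, _ = self.lstm(f)                    # sequence information
        return self.fc(out[:, -1])               # classify from last step

frames = torch.randn(2, 16, 3, 112, 112)         # 16 key frames per clip
logits = MobileNetLSTM()(frames)                 # (2, 10) class scores
```

The appeal of this pairing on a Raspberry Pi is that MobileNetV2's depthwise-separable convolutions keep the per-frame feature extractor small, while a single LSTM layer carries the temporal context.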

Author(s):  
Le Hoai Bac ◽  
To Hoai Viet ◽  
Nguyen Ngoc Thao

Lip reading is an active field that receives much attention from computer scientists. Its applications take part not only in science, such as speech recognition systems, but also in social activities, such as teaching pronunciation to deaf children in order to recover their speaking ability. In this paper, we aim to solve a narrower problem, lip tracking, which is an essential step to provide visual lip data for a lip-reading system. Inspired by the idea of AVCSR, which combines visual features with audio features to increase accuracy in noisy environments, we use the AdaBoost algorithm and a Kalman filter for the face and lip detectors. Our results show that the system can detect and track mouth motion in real time. This is a critical point that encourages more research in the visual tracking and voice recognition fields.
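A hedged sketch of such a detect-and-track loop follows, using OpenCV's stock Haar cascade (an AdaBoost-trained detector) and cv2.KalmanFilter; the constant-velocity state model and the lower-third-of-face mouth heuristic are our assumptions, not the paper's exact detectors.

```python
# Rough sketch: AdaBoost (Haar cascade) face detection plus a Kalman
# filter smoothing the mouth position across frames.
import cv2
import numpy as np

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

# 4 state variables (x, y, dx, dy), 2 measurements (x, y).
kf = cv2.KalmanFilter(4, 2)
kf.transitionMatrix = np.array(
    [[1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1]], np.float32)
kf.measurementMatrix = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], np.float32)
kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-3

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    prediction = kf.predict()                   # predicted mouth position
    px, py = int(prediction[0, 0]), int(prediction[1, 0])
    cv2.circle(frame, (px, py), 4, (0, 0, 255), -1)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.1, 5)
    for (x, y, w, h) in faces[:1]:
        # Take the lower third of the face box as a crude mouth region.
        cx, cy = x + w / 2, y + h * 5 / 6
        kf.correct(np.array([[cx], [cy]], np.float32))
        cv2.rectangle(frame, (x, y + 2 * h // 3), (x + w, y + h),
                      (0, 255, 0), 2)
    cv2.imshow("lip tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
```

The Kalman prediction step keeps the mouth estimate available on frames where the detector momentarily fails, which is what makes real-time tracking robust.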


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Yuanyao Lu ◽  
Qi Xiao ◽  
Haiyang Jiang

In recent years, deep learning has already been applied to English lip-reading. However, Chinese lip-reading research started later, lacks relevant datasets, and its recognition accuracy is not yet ideal. Therefore, this paper proposes a new hybrid neural network model to establish a Chinese lip-reading system. In this paper, we integrate the attention mechanism into both the CNN and the RNN. Specifically, we add the convolutional block attention module (CBAM) to the ResNet50 neural network, which enhances its ability to capture the small differences among the mouth patterns of similarly pronounced words in Chinese, improving the performance of feature extraction in the convolution process. We also add a temporal attention mechanism to the GRU neural network, which helps to extract the features among consecutive lip motion images. Considering the effects of the moments before and after on the current moment in the lip-reading process, we assign more weight to the key frames, which makes the features more representative. We further validate our model through experiments on our self-built dataset. Our experiments show that using the convolutional block attention module (CBAM) in the Chinese lip-reading model can accurately recognize the Chinese numbers 0–9 and some frequently used Chinese words. Compared with other lip-reading systems, our system has better performance and higher recognition accuracy.
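For readers unfamiliar with CBAM, the following compact PyTorch sketch shows the channel and spatial attention gates it applies to a feature map; the reduction ratio and 7x7 kernel follow the common defaults from the original CBAM work rather than values given in this abstract.

```python
# Compact sketch of a convolutional block attention module (CBAM):
# a channel gate followed by a spatial gate over a CNN feature map.
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        # Channel attention: shared MLP over avg- and max-pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels))
        # Spatial attention: conv over channel-wise avg and max maps.
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)   # channel gate
        s = torch.cat([x.mean(1, keepdim=True),
                       x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(s))             # spatial gate

feat = torch.randn(4, 256, 14, 14)    # e.g. a ResNet50 stage output
out = CBAM(256)(feat)                 # same shape, attention-refined
```

Because the module preserves the feature map's shape, it can be dropped after any ResNet50 stage, which is how it sharpens the network's sensitivity to small mouth-shape differences.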


2019 ◽  
Vol 9 (8) ◽  
pp. 1599 ◽  
Author(s):  
Yuanyao Lu ◽  
Hongbo Li

With the improvement of computer performance, virtual reality (VR), as a new way of visual operation and interaction, gives automatic lip-reading technology based on visual features broad development prospects. In an immersive VR environment, the user's state can be successfully captured through lip movements, thereby analyzing the user's real-time thinking. Due to complex image processing, hard-to-train classifiers and long recognition processes, it is difficult for the traditional lip-reading recognition system to meet the requirements of practical applications. In this paper, a convolutional neural network (CNN) used for image feature extraction is combined with a recurrent neural network (RNN) based on an attention mechanism for automatic lip-reading recognition. Our proposed method for automatic lip-reading recognition can be divided into three steps. Firstly, we extract keyframes from our own independently established database (English pronunciation of the numbers zero to nine by three males and three females). Then, we use the Visual Geometry Group (VGG) network to extract the lip image features. We find that the image feature extraction results are fault-tolerant and effective. Finally, we compare two lip-reading models: (1) a fusion model with an attention mechanism and (2) a fusion model of two networks. The results show that the accuracy of the proposed model is 88.2% on the test dataset versus 84.9% for the contrastive model. Therefore, our proposed method is superior to traditional lip-reading recognition methods and general neural networks.
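The attention-based fusion of RNN time steps described above can be sketched as follows in PyTorch; the GRU choice, the 4096-dimensional VGG features and the 10-digit output are our illustrative assumptions about the setup, not specifics confirmed by the abstract.

```python
# Minimal sketch: attention weights over RNN hidden states, so that
# key frames contribute more to the final lip-reading decision.
import torch
import torch.nn as nn

class AttentionRNN(nn.Module):
    def __init__(self, feat_dim=4096, hidden=256, num_classes=10):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.score = nn.Linear(hidden, 1)        # scores each time step
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, x):                        # x: (batch, time, feat_dim)
        h, _ = self.rnn(x)                       # (batch, time, hidden)
        w = torch.softmax(self.score(h), dim=1)  # attention weights over time
        ctx = (w * h).sum(dim=1)                 # weighted sum of key frames
        return self.fc(ctx)

vgg_feats = torch.randn(2, 12, 4096)   # VGG features per key frame (assumed)
logits = AttentionRNN()(vgg_feats)     # (2, 10) digit scores
```

The attention-weighted sum replaces the usual "take the last hidden state" readout, letting the classifier focus on the frames where the mouth shape is most informative.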


2020 ◽  
Vol 67 (1) ◽  
pp. 133-141
Author(s):  
Dmitriy O. Khort ◽  
Aleksei I. Kutyrev ◽  
Igor G. Smirnov ◽  
Rostislav A. Filippov ◽  
Roman V. Vershinin

Technological capabilities of agricultural units cannot be optimally used without extensive automation of production processes and the use of advanced computer control systems. (Research purpose) To develop an algorithm for recognizing the coordinates of the location and ripeness of garden strawberries under different lighting conditions and to describe the technological process of harvesting them in field conditions using a robotic actuator mounted on a self-propelled platform. (Materials and methods) The authors have developed a self-propelled platform with an automatic actuator for harvesting garden strawberries, which includes an actuator with six degrees of freedom, a coaxial gripper, MG966R servos, a PCA9685 controller, a Logitech HD C270 computer vision camera, a single-board Raspberry Pi 3 Model B+ computer, VL53L0X laser sensors, a SZBK07 300 W voltage regulator, and a Hubsan X4 Pro H109S Li-polymer battery. (Results and discussion) Using the Python programming language 3.7.2, the authors developed a control algorithm for the automatic actuator, including operations to determine the X and Y coordinates of berries and their degree of maturity, as well as to calculate the distance to the berries. It was found that the effectiveness of detecting berries, their area and their boundaries with a camera and the OpenCV library reaches 94.6 percent at an illumination of 300 lux. With an increase in the robotic platform speed to 1.5 kilometres per hour, the average area of the recognized berries, compared to the real area of the berries, decreased by 9 percent to 95.1 square centimetres at an illumination of 300 lux, by 17.8 percent to 88 square centimetres at 200 lux, and by 36.4 percent to 76 square centimetres at 100 lux. (Conclusions) The authors have provided a rationale for the technological process and developed an algorithm for harvesting garden strawberries using a robotic actuator mounted on a self-propelled platform. It has been proved that lighting conditions have a significant impact on the determination of the area, boundaries and ripeness of berries using a computer vision camera.
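An illustrative sketch of the OpenCV detection stage is given below: ripe (red) regions are segmented in HSV space and their centroids and areas computed from contours. The HSV thresholds, the minimum-blob cutoff and the pixel-to-centimetre scale are assumptions for illustration, not the paper's calibrated values.

```python
# Illustrative ripe-berry detection: HSV thresholding, contour extraction,
# and per-berry centroid and area estimation with OpenCV.
import cv2
import numpy as np

PIXELS_PER_CM = 20.0                    # hypothetical camera calibration

frame = cv2.imread("strawberries.jpg")  # hypothetical input image
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
# Red hue wraps around 0 in HSV, so combine two ranges.
mask = cv2.inRange(hsv, (0, 120, 70), (10, 255, 255)) | \
       cv2.inRange(hsv, (170, 120, 70), (180, 255, 255))
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    area_px = cv2.contourArea(c)
    if area_px < 500:                   # ignore small noise blobs
        continue
    m = cv2.moments(c)
    cx, cy = int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])
    area_cm2 = area_px / PIXELS_PER_CM ** 2
    print(f"berry at ({cx}, {cy}), area {area_cm2:.1f} cm^2")
```

The paper's finding that recognized berry area shrinks at lower illumination is consistent with this pipeline: darker pixels fall below the saturation and value thresholds, eroding the segmented mask at the berry boundary.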


2021 ◽  
Vol 11 (22) ◽  
pp. 10540
Author(s):  
Navjot Rathour ◽  
Zeba Khanam ◽  
Anita Gehlot ◽  
Rajesh Singh ◽  
Mamoon Rashid ◽  
...  

There is significant interest in facial emotion recognition in the fields of human-computer interaction and the social sciences. With the advancements in artificial intelligence (AI), the field of human behavioral prediction and analysis, especially human emotion, has evolved significantly. The most standard methods of emotion recognition are currently deployed as models on remote servers. We believe that reducing the distance between the input device and the server model can lead to better efficiency and effectiveness in real-life applications. For this purpose, computational methodologies such as edge computing can be beneficial. They can also enable time-critical applications in sensitive fields. In this study, we propose a Raspberry Pi-based standalone edge device that can detect facial emotions in real time. Although this edge device can be used in a variety of applications where human facial emotions play an important role, this article is mainly crafted using a dataset of employees working in organizations. The Raspberry Pi-based standalone edge device has been implemented using the Mini-Xception deep network because of its computational efficiency and shorter runtime compared to other networks. This device achieves 100% accuracy in detecting faces in real time and 68% emotion recognition accuracy, which is higher than the accuracy reported in the state of the art on the FER 2013 dataset. Future work will implement the deep network on a Raspberry Pi with an Intel Movidius neural compute stick to reduce the processing time and achieve a quick real-time implementation of the facial emotion recognition system.
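A hedged sketch of such an edge inference loop follows: Haar-cascade face detection feeds grayscale crops to a Mini-Xception classifier. The weight file name, the 64x64 input size and the FER 2013 label order are assumptions based on common Mini-Xception setups, not details given in the abstract.

```python
# Sketch of the edge device's loop: detect faces, classify the emotion
# of each face crop with a pre-trained Mini-Xception model.
import cv2
import numpy as np
from tensorflow.keras.models import load_model

EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
model = load_model("mini_xception_fer2013.h5")   # hypothetical weights file

cap = cv2.VideoCapture(0)                        # Raspberry Pi camera
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in detector.detectMultiScale(gray, 1.3, 5):
        face = cv2.resize(gray[y:y + h, x:x + w], (64, 64)) / 255.0
        probs = model.predict(face.reshape(1, 64, 64, 1), verbose=0)[0]
        label = EMOTIONS[int(np.argmax(probs))]
        cv2.putText(frame, label, (x, y - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
    cv2.imshow("emotion", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
```

Running the whole loop on-device, rather than streaming frames to a remote server, is exactly the latency and privacy argument the study makes for edge deployment.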


2018 ◽  
Vol 37 (2) ◽  
pp. 159 ◽  
Author(s):  
Fatemeh Vakhshiteh ◽  
Farshad Almasganj ◽  
Ahmad Nickabadi

Lip-reading is typically known as visually interpreting the speaker's lip movements during speaking. Experiments over many years have revealed that speech intelligibility increases if visual facial information becomes available, and this effect becomes more apparent in noisy environments. Taking steps toward automating this process raises challenges such as the coarticulation phenomenon, the choice of visual units, feature diversity and inter-speaker dependency. While efforts have been made to overcome these challenges, a flawless lip-reading system is still under investigation. This paper searches for a lip-reading model with an efficiently developed incorporation and arrangement of processing blocks to extract highly discriminative visual features. Here, the application of a properly structured Deep Belief Network (DBN)-based recognizer is highlighted. Multi-speaker (MS) and speaker-independent (SI) tasks are performed over the CUAVE database, and phone recognition rates (PRRs) of 77.65% and 73.40% are achieved, respectively. The best word recognition rates (WRRs) achieved in the MS and SI tasks are 80.25% and 76.91%, respectively. The resulting accuracies demonstrate that the proposed method outperforms the conventional Hidden Markov Model (HMM) and competes well with state-of-the-art visual speech recognition works.
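As a loose, toy-scale stand-in for the DBN-based recognizer, the scikit-learn sketch below stacks two restricted Boltzmann machines for greedy layer-wise pretraining and puts a logistic classifier on top; the data, layer sizes and label set are invented for illustration and do not reflect the paper's CUAVE setup.

```python
# Toy DBN-style recognizer: stacked Bernoulli RBMs (greedy unsupervised
# feature learning) followed by a supervised logistic-regression head.
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
X = rng.random((500, 100))        # stand-in visual feature vectors in [0, 1]
y = rng.integers(0, 10, 500)      # stand-in phone/word labels

dbn = Pipeline([
    ("rbm1", BernoulliRBM(n_components=256, learning_rate=0.05, n_iter=20)),
    ("rbm2", BernoulliRBM(n_components=128, learning_rate=0.05, n_iter=20)),
    ("clf", LogisticRegression(max_iter=500)),
])
dbn.fit(X, y)    # RBMs learn features layer by layer, then the head is fit
print("train accuracy:", dbn.score(X, y))
```

The layer-wise pretraining is what historically let DBNs extract discriminative features from limited labeled data, the property the paper exploits against the HMM baseline.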


2020 ◽  
Vol 9 (1) ◽  
pp. 1022-1027

Driving a vehicle has become a tedious job nowadays due to heavy traffic, so focus on driving is of utmost importance. This creates scope for automation in automobiles to minimize human intervention in controlling dashboard functions such as headlamps, indicators, power windows and the wiper system; this paper makes a small effort toward distraction-free driving with a voice-controlled dashboard. The system proposed in this paper works on speech commands from the user (driver or passenger). Since the speech recognition system acts as the human-machine interface (HMI), the system uses both speaker recognition and speech recognition to recognize the command and to verify that it comes from an authenticated user (driver or passenger). The system performs feature extraction, extracting speech features such as Mel-frequency cepstral coefficients (MFCC), power spectral density (PSD), pitch and the spectrogram. For feature matching, the system uses the Vector Quantization Linde-Buzo-Gray (VQ-LBG) algorithm, which uses the Euclidean distance to measure the distance between a test feature and a codebook feature. Based on the recognized speech command, the controller (Raspberry Pi 3B) activates the device driver for the motor or solenoid valve, depending on the function. This system is mainly aimed at low-noise environments, as most speech recognition systems suffer when noise is introduced. Room acoustics also matter a great deal, as the recognition rate differs depending on the acoustics. Over several testing and simulation trials, the system achieved a speech recognition rate of 76.13%. This system encourages automation of the vehicle dashboard, making driving distraction-free.
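A simplified sketch of the MFCC extraction and codebook-matching stage follows; it uses librosa for MFCCs and k-means centroids as a stand-in for the LBG splitting procedure, with the file names, codebook size and single-speaker enrollment being assumptions for illustration.

```python
# Simplified speaker matching in the spirit of VQ-LBG: MFCC frames are
# compared against per-speaker codebooks by average Euclidean distortion.
import numpy as np
import librosa
from scipy.cluster.vq import kmeans2

def mfcc_features(path):
    y, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T   # (frames, 13)

def train_codebook(path, size=32):
    # k-means centroids stand in for the LBG codebook-splitting procedure.
    feats = mfcc_features(path)
    centroids, _ = kmeans2(feats, size, minit="++", seed=0)
    return centroids

def distortion(feats, codebook):
    # Average distance from each frame to its nearest codeword.
    d = np.linalg.norm(feats[:, None, :] - codebook[None, :, :], axis=2)
    return d.min(axis=1).mean()

# Hypothetical enrollment and test recordings.
codebooks = {"driver": train_codebook("driver_enroll.wav")}
test = mfcc_features("command.wav")
speaker = min(codebooks, key=lambda s: distortion(test, codebooks[s]))
print("closest speaker:", speaker)
```

The lowest-distortion codebook identifies the speaker; only if that speaker is an authenticated user would the recognized command be passed on to the Raspberry Pi's device drivers.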


2010 ◽  
Vol 17B (3) ◽  
pp. 227-232
Author(s):  
Young-Un Kim ◽  
Sun-Kyung Kang ◽  
Sung-Tae Jung
