Contour mapping for speaker-independent lip reading system

Author(s):  
Souheil Fenghour ◽  
Daqing Chen ◽  
Perry Xiao
2018 ◽  
Vol 37 (2) ◽  
pp. 159

Author(s):  
Fatemeh Vakhshiteh ◽  
Farshad Almasganj ◽  
Ahmad Nickabadi

Lip-reading is commonly understood as visually interpreting a speaker's lip movements during speech. Experiments over many years have shown that speech intelligibility increases when visual facial information is available, and this effect becomes more pronounced in noisy environments. Automating this process raises several challenges, such as the coarticulation phenomenon, the choice of visual units, and the diversity of features and their inter-speaker dependency. While efforts have been made to overcome these challenges, a flawless lip-reading system is still under investigation. This paper searches for a lip-reading model with an efficiently developed incorporation and arrangement of processing blocks to extract highly discriminative visual features. Here, the application of a properly structured Deep Belief Network (DBN)-based recognizer is highlighted. Multi-speaker (MS) and speaker-independent (SI) tasks are performed on the CUAVE database, and phone recognition rates (PRRs) of 77.65% and 73.40% are achieved, respectively. The best word recognition rates (WRRs) achieved in the MS and SI tasks are 80.25% and 76.91%, respectively. The resulting accuracies demonstrate that the proposed method outperforms the conventional Hidden Markov Model (HMM) and competes well with state-of-the-art visual speech recognition work.
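The building block of a DBN-based recognizer like the one described above is the Restricted Boltzmann Machine (RBM), trained layer by layer with contrastive divergence. The following is a minimal sketch of one RBM layer trained with CD-1; all dimensions, hyper-parameters, and the toy binary "lip feature" data are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """One Restricted Boltzmann Machine layer (binary units)."""
    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = rng.normal(0.0, 0.01, (n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible bias
        self.b_h = np.zeros(n_hidden)    # hidden bias
        self.lr = lr

    def train_step(self, v0):
        # One step of contrastive divergence (CD-1).
        p_h0 = sigmoid(v0 @ self.W + self.b_h)
        h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
        p_v1 = sigmoid(h0 @ self.W.T + self.b_v)       # reconstruction
        p_h1 = sigmoid(p_v1 @ self.W + self.b_h)
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / n
        self.b_v += self.lr * (v0 - p_v1).mean(axis=0)
        self.b_h += self.lr * (p_h0 - p_h1).mean(axis=0)
        return np.mean((v0 - p_v1) ** 2)               # reconstruction error

# Toy binary "lip feature" vectors standing in for real visual features.
data = (rng.random((64, 20)) < 0.3).astype(float)
rbm = RBM(n_visible=20, n_hidden=8)
errs = [rbm.train_step(data) for _ in range(200)]
print(f"reconstruction error: {errs[0]:.3f} -> {errs[-1]:.3f}")
```

In a full DBN, several such layers are stacked (each trained on the hidden activations of the previous one) and then fine-tuned discriminatively, e.g. against phone or word labels.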


2010 ◽  
Vol 17B (3) ◽  
pp. 227-232
Author(s):  
Young-Un Kim ◽  
Sun-Kyung Kang ◽  
Sung-Tae Jung

2011 ◽  
Vol 44 (3) ◽  
pp. 559-571 ◽  
Author(s):  
Jongju Shin ◽  
Jin Lee ◽  
Daijin Kim

2014 ◽  
Vol 1079-1080 ◽  
pp. 820-823
Author(s):  
Li Guo Zheng ◽  
Mei Li Zhu ◽  
Qing Qing Wang

This paper proposes a novel lip feature extraction algorithm intended to improve the efficiency and robustness of lip-reading systems. First, the Lip Gray Energy Image (LGEI) is used to smooth noise and improve the noise resistance of the system. Second, the Discrete Wavelet Transform (DWT) is used to extract salient visual speech information from the lips by decorrelating spectral information. Last, lip features are obtained by downsampling the data from the second step; this resampling effectively reduces the amount of computation. Experimental results show that the method is highly discriminative, accurate and computationally efficient, with a precision rate reaching 96%.
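The three-step pipeline above can be sketched roughly as follows: average the lip-ROI frames into a gray "energy" image, apply one level of a 2-D Haar wavelet transform (a simple stand-in for the paper's DWT step), and downsample the low-frequency band into a compact feature vector. The frame sizes, the Haar basis choice, and the `keep=16` feature length are assumptions for demonstration only.

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2-D Haar DWT; returns the approximation (LL) band."""
    # Pairwise averages along columns, then along rows.
    lo_cols = (img[:, 0::2] + img[:, 1::2]) / np.sqrt(2)
    ll = (lo_cols[0::2, :] + lo_cols[1::2, :]) / np.sqrt(2)
    return ll

def lip_features(frames, keep=16):
    # Step 1: Lip Gray Energy Image -- the per-pixel mean over the frame
    # sequence smooths frame-level noise.
    lgei = frames.mean(axis=0)
    # Step 2: keep the salient low-frequency content of the LGEI.
    ll = haar_dwt2(lgei)
    # Step 3: crude downsampling to a fixed-length feature vector.
    flat = ll.ravel()
    step = max(1, flat.size // keep)
    return flat[::step][:keep]

rng = np.random.default_rng(1)
frames = rng.random((10, 32, 32))   # toy 10-frame lip-ROI sequence
feat = lip_features(frames)
print(feat.shape)
```

A production system would retain more than the LL band (the detail bands carry edge information around the lip contour) and feed the features to a classifier; this sketch only shows the data flow of the three steps.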


Author(s):  
D. Ivanko ◽  
D. Ryumin

Abstract. Visual information plays a key role in automatic speech recognition (ASR) when the audio is corrupted by background noise, or even inaccessible. Speech recognition using visual information is called lip-reading. The initial idea of visual speech recognition comes from human experience: we are able to recognize spoken words by observing a speaker's face, with limited or no access to the sound of the voice. Based on the conducted experimental evaluations, as well as on an analysis of the research field, we propose a novel task-oriented approach to practical lip-reading system implementation. Its main purpose is to serve as a roadmap for researchers who need to build a reliable visual speech recognition system for their task. To a rough approximation, the task of lip-reading can be divided into two parts, depending on the complexity of the problem: first, recognizing isolated words, numbers or small phrases (e.g. telephone numbers with a strict grammar, or keywords); and second, recognizing continuous speech (phrases or sentences). All these stages are disclosed in detail in this paper. Based on the proposed approach, we implemented from scratch automatic visual speech recognition systems with three different architectures: GMM-CHMM, DNN-HMM and purely end-to-end. A description of the methodology, tools, step-by-step development and all necessary parameters is disclosed in detail in the current paper. It is worth noting that such systems have been created for Russian speech recognition for the first time.
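The simpler of the two problem classes, isolated-word recognition, is classically handled by scoring the observation sequence against one small HMM per keyword and picking the best-scoring word, which is the core of the GMM-CHMM and DNN-HMM families mentioned above. The following is a toy sketch of that decoding step with discrete-output HMMs; the two word models, the "viseme" codes and all probabilities are invented for illustration, not taken from the paper's systems.

```python
import numpy as np

def viterbi_loglik(obs, log_pi, log_A, log_B):
    """Max-path log-likelihood of obs under a discrete-output HMM."""
    delta = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        # delta[s, None] + log_A broadcasts over destination states s'.
        delta = np.max(delta[:, None] + log_A, axis=0) + log_B[:, o]
    return delta.max()

def hmm(A, B):
    """Pack a left-to-right HMM (start in state 0) as log-parameters."""
    pi = np.zeros(A.shape[0])
    pi[0] = 1.0
    eps = 1e-12   # avoid log(0)
    return np.log(pi + eps), np.log(A + eps), np.log(B + eps)

# Two toy 2-state word models over 3 quantized visual "viseme" codes.
word_models = {
    "yes": hmm(np.array([[0.6, 0.4], [0.0, 1.0]]),
               np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]])),
    "no":  hmm(np.array([[0.6, 0.4], [0.0, 1.0]]),
               np.array([[0.1, 0.1, 0.8], [0.1, 0.8, 0.1]])),
}

obs = [0, 0, 1, 1, 1]   # quantized lip features for one utterance
best = max(word_models, key=lambda w: viterbi_loglik(obs, *word_models[w]))
print(best)
```

Continuous speech, the second problem class, replaces this per-word scoring with decoding over a graph of connected word or phone models under a language model, which is where the complexity gap between the two tasks comes from.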


Author(s):  
Yuanyao Lu ◽  
Jie Yan ◽  
Ke Gu

As a significant component of the Human-Computer Interface (HCI), automatic lip reading is designed to understand the content of speech by interpreting the movements of the lips. Although the performance of an automatic lip reading system is easily affected by challenging conditions such as noise, illumination and low resolution, enormous advancements in the relevant fields, accompanied by growth in computing capability, have improved the robustness of such systems, making them more adaptable to real environments. In this paper, we survey the field and give a detailed discussion of the current state and level of development of automatic lip reading. We place particular emphasis on feature extraction and recognition model algorithms. We also compare and analyze the various visual speech databases in terms of their characteristics and functions in speech recognition systems. In addition, we describe the challenges and offer our insights into future research directions for automatic lip reading.

