Relating dynamic brain states to dynamic machine states: human and machine solutions to the speech recognition problem

2016
Author(s): Cai Wingfield, Li Su, Xunying Liu, Chao Zhang, Phil Woodland, ...

Abstract
There is widespread interest in the relationship between the neurobiological systems supporting human cognition and emerging computational systems capable of emulating these capacities. Human speech comprehension, poorly understood as a neurobiological process, is an important case in point. Automatic Speech Recognition (ASR) systems with near-human levels of performance are now available, which provide a computationally explicit solution for the recognition of words in continuous speech. This research aims to bridge the gap between speech recognition processes in humans and machines, using novel multivariate techniques to compare incremental 'machine states', generated as the ASR analysis progresses over time, to the incremental 'brain states', measured using combined electro- and magneto-encephalography (EMEG), generated as the same inputs are heard by human listeners. This direct comparison of dynamic human and machine internal states, as they respond to the same incrementally delivered sensory input, revealed a significant correspondence between neural response patterns in human superior temporal cortex and the structural properties of ASR-derived phonetic models. Spatially coherent patches in human temporal cortex responded selectively to individual phonetic features defined on the basis of machine-extracted regularities in the speech-to-lexicon mapping process. These results demonstrate the feasibility of relating human and ASR solutions to the problem of speech recognition, and suggest the potential for further studies relating complex neural computations in human speech comprehension to the rapidly evolving ASR systems that address the same problem domain.

Author Summary
The ability to understand spoken language is a defining human capacity. But despite decades of research, there is still no well-specified account of how sound entering the ear is neurally interpreted as a sequence of meaningful words. At the same time, modern computer-based Automatic Speech Recognition (ASR) systems are capable of near-human levels of performance, especially where word identification is concerned. In this research we aim to bridge the gap between human and machine solutions to speech recognition. We use a novel combination of neuroimaging and statistical methods to relate human and machine internal states that are dynamically generated as spoken words are heard by human listeners and analysed by ASR systems. We find that the stable regularities discovered by the ASR process, linking speech input to phonetic labels, can be significantly related to the regularities extracted in the human brain. Both systems may have in common a representation of these regularities in terms of articulatory phonetic features, consistent with an analysis process which recovers the articulatory gestures that generated the speech. These results suggest a possible partnership between human- and machine-based research, which may both deliver a better understanding of how the human brain provides such a robust solution to speech understanding and generate insights that enhance the performance of future ASR systems.
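The dynamic state comparison described above is, in essence, a pattern-similarity analysis. The following Python sketch shows one minimal way such a comparison could be set up: representational dissimilarity matrices (RDMs) are built from machine states and brain states at each incremental time window and then correlated. The array shapes, the random placeholder data, and the function names are illustrative assumptions, not the authors' actual pipeline.

```python
# A minimal sketch of the dynamic-state comparison logic: at each time
# window, pattern dissimilarities among stimuli are computed separately for
# ASR-derived states and for EMEG responses, then the two sets of
# dissimilarities are correlated. Shapes and names are illustrative only.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(states: np.ndarray) -> np.ndarray:
    """Condensed vector of pairwise correlation distances (1 - Pearson r)
    between stimulus patterns; states has shape (n_stimuli, n_features)."""
    return pdist(states, metric="correlation")

def windowed_similarity(machine_states, brain_states):
    """Spearman correlation of machine and brain RDMs per time window.

    machine_states, brain_states: lists of (n_stimuli, n_features) arrays,
    one array per incremental time window.
    """
    scores = []
    for m, b in zip(machine_states, brain_states):
        rho, p = spearmanr(rdm(m), rdm(b))
        scores.append((rho, p))
    return scores

# Illustrative use with random data standing in for real recordings:
rng = np.random.default_rng(0)
machine = [rng.standard_normal((40, 64)) for _ in range(10)]   # e.g. phonetic-model states
brain = [rng.standard_normal((40, 300)) for _ in range(10)]    # e.g. EMEG source patterns
print(windowed_similarity(machine, brain)[0])
```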

Author(s): Sharif M Rasel Kabir, Foyzul Hassan, Foysal Ahamed, Khondokar Mamun, Mohammad Nurul Huda, ...

Author(s): Ellen Haas, Robert Shankle, Hattie Murray, Deborah Travers, Ted Wheeler

The purpose of this study was to determine the extent to which automatic speech recognition (ASR) systems and human speech output to ASR systems are affected when users perform complex tasks in a noisy, stressful simulated tank environment. A second objective was to determine whether the type of auditory information input (monaural or spatial) affected user task performance, including the output of speech to ASR systems. Subjects performed simultaneous radio communication and tank driving tasks. Independent variables included auditory display configuration, driving speed, and subject type (soldier or civilian). Dependent variables included 1) the proportion of radio channel changes correctly recognized by the ASR; 2) the proportion and types of speech errors in radio channel changes; 3) the number of accurate responses to the radio messages; and 4) subject ratings of stress and workload. Subjects were six male U.S. Army personnel and six male civilians. Results indicated that ASR and human speech output were not significantly affected by any of the independent variables. A mean of 95.5% of all subject verbal commands was recognized by the ASR system. Spatialized radio messages significantly enhanced the ability of the user to understand speech communications from multiple channels. However, spatial audio information input had no significant effect on user speech output or on ASR performance. In general, ASR and spatial audio displays have potential as technologies of interest in high noise, stressful tank environments in which complex, concurrent tasks are performed.
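As a concrete illustration of the first dependent variable, the short sketch below tabulates the proportion of radio channel changes correctly recognised by an ASR system from matched command/recognition pairs. The Trial structure and the sample data are hypothetical stand-ins, not the study's materials.

```python
# Hypothetical tabulation of ASR command-recognition rate: the proportion of
# trials on which the decoded channel matched the commanded channel.
from dataclasses import dataclass

@dataclass
class Trial:
    commanded_channel: str   # channel the subject was instructed to select
    recognised_channel: str  # channel the ASR system decoded

def recognition_rate(trials: list[Trial]) -> float:
    """Proportion of trials on which the ASR output matched the command."""
    correct = sum(t.commanded_channel == t.recognised_channel for t in trials)
    return correct / len(trials)

# Illustrative data only:
trials = [Trial("alpha", "alpha"), Trial("bravo", "bravo"), Trial("charlie", "delta")]
print(f"{recognition_rate(trials):.1%}")  # prints 66.7%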


2013, Vol 10 (1), pp. 219-230
Author(s): Jovica Tasevski, Milutin Nikolic, Dragisa Miskovic

This paper reports a solution for integrating the industrial robot ABB IRB140 with an automatic speech recognition (ASR) system and a computer vision system. The robot's task is to manipulate objects placed randomly on a pad lying on a table; the computer vision system recognizes their characteristics (shape, dimensions, color, position, and orientation). The ASR system recognizes human speech and converts it into commands, so that the robot can manipulate the objects.
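The integration pattern described here, a spoken command in and a vision-guided manipulation out, can be outlined as a simple control loop. In the sketch below, all interfaces (the DetectedObject fields, the Robot class, and its pick_and_place method) are hypothetical placeholders rather than the actual ABB IRB140 interface used in the paper.

```python
# A minimal sketch of the described integration pattern: a recognised speech
# command is matched against the vision system's detections, and the robot is
# sent a manipulation command. All APIs here are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class DetectedObject:
    shape: str                     # e.g. "cube", "cylinder"
    color: str                     # e.g. "red"
    position: tuple[float, float]  # (x, y) on the pad, in millimetres
    orientation: float             # rotation about the vertical axis, degrees

class Robot:
    """Stand-in for the robot controller interface (hypothetical)."""
    def pick_and_place(self, position, orientation):
        print(f"Picking object at {position}, rotated {orientation} deg")

def handle_command(command: str, detections: list[DetectedObject], robot: Robot) -> bool:
    """Match a spoken command such as 'pick red cube' against vision output
    and dispatch the robot; returns False if no matching object is found."""
    tokens = command.lower().split()
    for obj in detections:
        if obj.color in tokens and obj.shape in tokens:
            robot.pick_and_place(obj.position, obj.orientation)
            return True
    return False

# Illustrative use:
detections = [DetectedObject("cube", "red", (120.0, 45.0), 30.0)]
handle_command("pick red cube", detections, Robot())
```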

