An audio-visual corpus for speech perception and automatic speech recognition
2006 ◽ Vol 120 (5) ◽ pp. 2421-2424
Author(s): Martin Cooke ◽ Jon Barker ◽ Stuart Cunningham ◽ Xu Shao
2017
Author(s): Thomas Schatz ◽ Francis Bach ◽ Emmanuel Dupoux

We test the potential of standard Automatic Speech Recognition (ASR) systems trained on large corpora of continuous speech as quantitative models of human speech processing. In human adults, speech perception is attuned to efficiently process native speech sounds, at the expense of difficulties in processing non-native sounds. We use ABX-discriminability measures to test whether ASR models can account for the patterns of confusion between speech sounds observed in humans. We show that ASR models reproduce some well-documented effects in non-native phonetic perception. Beyond the immediate results, our methodology opens up the possibility of a more systematic investigation of phonetic category perception in humans.
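The ABX-discriminability measure mentioned in the abstract can be illustrated with a minimal sketch. In an ABX task, a triplet is formed from two tokens A and X of one phonetic category and a token B of another; the triplet counts as correct when X is closer to A than to B under some distance over the model's representations. The score is the fraction of correct triplets (0.5 is chance, 1.0 is perfect discrimination). The function name, the Euclidean distance, and the toy Gaussian data below are illustrative assumptions, not the authors' exact pipeline:

```python
# Hedged sketch of an ABX discriminability score over two phonetic
# categories, each given as an array of feature vectors (hypothetical data).
import itertools
import numpy as np

def abx_score(cat_a, cat_b, dist=lambda u, v: np.linalg.norm(u - v)):
    """Fraction of (A, B, X) triplets where X (same category as A)
    is closer to A than to B. cat_a, cat_b: (n_tokens, n_features)."""
    correct = total = 0
    # A and X are distinct tokens drawn from the same category.
    for a, x in itertools.permutations(range(len(cat_a)), 2):
        for b in range(len(cat_b)):
            correct += dist(cat_a[a], cat_a[x]) < dist(cat_b[b], cat_a[x])
            total += 1
    return correct / total

# Toy example: two well-separated clusters stand in for two phone categories.
rng = np.random.default_rng(0)
ca = rng.normal(0.0, 0.1, size=(5, 3))
cb = rng.normal(2.0, 0.1, size=(5, 3))
print(abx_score(ca, cb))  # near 1.0 for well-separated categories
```

In practice, representations of variable-length speech tokens are compared with a frame-wise distance aggregated along a dynamic-time-warping alignment rather than a plain vector norm; the fixed-length vectors here keep the sketch short.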

