The relationship between word error rate and perceptual judgment

Seongjin Park; John Culnan

doi:10.1121/1.5147687

On the Relationship Between Bayes Risk and Word Error Rate in ASR

IEEE Transactions on Audio Speech and Language Processing ◽

10.1109/tasl.2010.2091635 ◽

2011 ◽

Vol 19 (5) ◽

pp. 1103-1112 ◽

Cited By ~ 6

Author(s):

Ralf Schluter ◽

Markus Nussbaum-Thom ◽

Hermann Ney

Keyword(s):

Error Rate ◽

Bayes Risk ◽

Word Error Rate ◽

The Relationship


Dynamic Acoustic Unit Augmentation with BPE-Dropout for Low-Resource End-to-End Speech Recognition

Sensors ◽

10.3390/s21093063 ◽

2021 ◽

Vol 21 (9) ◽

pp. 3063

Author(s):

Aleksandr Laptev ◽

Andrei Andrusenko ◽

Ivan Podluzhny ◽

Anton Mitrofanov ◽

Ivan Medennikov ◽

...

Keyword(s):

Speech Recognition ◽

Error Rate ◽

Rapid Development ◽

Computational Cost ◽

Vocabulary Size ◽

Word Error Rate ◽

Low Resource ◽

Steady Improvement ◽

End To End ◽

Asr System

With the rapid development of speech assistants, adapting server-intended automatic speech recognition (ASR) solutions to a direct device has become crucial. For on-device speech recognition tasks, researchers and industry prefer end-to-end ASR systems as they can be made resource-efficient while maintaining a higher quality compared to hybrid systems. However, building end-to-end models requires a significant amount of speech data. Personalization, which is mainly handling out-of-vocabulary (OOV) words, is another challenging task associated with speech assistants. In this work, we consider building an effective end-to-end ASR system in low-resource setups with a high OOV rate, embodied in Babel Turkish and Babel Georgian tasks. We propose a method of dynamic acoustic unit augmentation based on the Byte Pair Encoding with dropout (BPE-dropout) technique. The method non-deterministically tokenizes utterances to extend the token’s contexts and to regularize their distribution for the model’s recognition of unseen words. It also reduces the need for optimal subword vocabulary size search. The technique provides a steady improvement in regular and personalized (OOV-oriented) speech recognition tasks (at least 6% relative word error rate (WER) and 25% relative F-score) at no additional computational cost. Owing to the BPE-dropout use, our monolingual Turkish Conformer has achieved a competitive result with 22.2% character error rate (CER) and 38.9% WER, which is close to the best published multilingual system.


Measuring the acceptable word error rate of machine-generated webcast transcripts

10.21437/interspeech.2006-40 ◽

2006 ◽

Cited By ~ 1

Author(s):

Cosmin Munteanu ◽

Gerald Penn ◽

Ron Baecker ◽

Elaine Toms ◽

David James

Keyword(s):

Error Rate ◽

Word Error Rate


Improvements to the LIUM French ASR system based on CMU sphinx: what helps to significantly reduce the word error rate?

10.21437/interspeech.2009-607 ◽

2009 ◽

Author(s):

Paul Deléglise ◽

Yannick Estève ◽

Sylvain Meignier ◽

Teva Merlin

Keyword(s):

Error Rate ◽

Word Error Rate ◽

Asr System


Attention-Based Fully Gated CNN-BGRU for Russian Handwritten Text

Journal of Imaging ◽

10.3390/jimaging6120141 ◽

2020 ◽

Vol 6 (12) ◽

pp. 141

Author(s):

Abdelrahman Abdallah ◽

Mohamed Hamada ◽

Daniyar Nurseitov

Keyword(s):

Error Rate ◽

Handwriting Recognition ◽

Text Recognition ◽

P Value ◽

Word Error Rate ◽

Test Dataset ◽

Handwritten Text ◽

Proposed Model ◽

Handwritten Text Recognition ◽

Gated Recurrent Unit

This article considers the task of handwritten text recognition using attention-based encoder–decoder networks trained in the Kazakh and Russian languages. We have developed a novel deep neural network model based on a fully gated CNN, supported by multiple bidirectional gated recurrent unit (BGRU) and attention mechanisms to manipulate sophisticated features that achieve 0.045 Character Error Rate (CER), 0.192 Word Error Rate (WER), and 0.253 Sequence Error Rate (SER) for the first test dataset and 0.064 CER, 0.24 WER and 0.361 SER for the second test dataset. Our proposed model is the first work to handle handwriting recognition models in Kazakh and Russian languages. Our results confirm the importance of our proposed Attention-Gated-CNN-BGRU approach for training handwriting text recognition and indicate that it can lead to statistically significant improvements (p-value < 0.05) in the sensitivity (recall) over the test datasets. The proposed method’s performance was evaluated using handwritten text databases of three languages: English, Russian, and Kazakh. It demonstrates better results on the Handwritten Kazakh and Russian (HKR) dataset than the other well-known models.


Closed-Form Word Error Rate Analysis for Successive Interference Cancellation Decoders

IEEE Transactions on Wireless Communications ◽

10.1109/twc.2018.2875699 ◽

2018 ◽

Vol 17 (12) ◽

pp. 8256-8267 ◽

Cited By ~ 2

Author(s):

Jinming Wen ◽

Keyu Wu ◽

Chintha Tellambura ◽

Pingzhi Fan

Keyword(s):

Closed Form ◽

Error Rate ◽

Interference Cancellation ◽

Successive Interference Cancellation ◽

Word Error Rate ◽

Rate Analysis


Towards Automatic Error Analysis of Machine Translation Output

Computational Linguistics ◽

10.1162/coli_a_00072 ◽

2011 ◽

Vol 37 (4) ◽

pp. 657-688 ◽

Cited By ~ 26

Author(s):

Maja Popović ◽

Hermann Ney

Keyword(s):

Error Analysis ◽

Machine Translation ◽

Error Rate ◽

Human Error ◽

Translation System ◽

Specific Information ◽

Error Type ◽

Word Error Rate ◽

Advantages And Disadvantages ◽

Automatic Error

Evaluation and error analysis of machine translation output are important but difficult tasks. In this article, we propose a framework for automatic error analysis and classification based on the identification of actual erroneous words using the algorithms for computation of Word Error Rate (WER) and Position-independent word Error Rate (PER), which is just a very first step towards development of automatic evaluation measures that provide more specific information of certain translation problems. The proposed approach enables the use of various types of linguistic knowledge in order to classify translation errors in many different ways. This work focuses on one possible set-up, namely, on five error categories: inflectional errors, errors due to wrong word order, missing words, extra words, and incorrect lexical choices. For each of the categories, we analyze the contribution of various POS classes. We compared the results of automatic error analysis with the results of human error analysis in order to investigate two possible applications: estimating the contribution of each error type in a given translation output in order to identify the main sources of errors for a given translation system, and comparing different translation outputs using the introduced error categories in order to obtain more information about advantages and disadvantages of different systems and possibilities for improvements, as well as about advantages and disadvantages of applied methods for improvements. We used Arabic–English Newswire and Broadcast News and Chinese–English Newswire outputs created in the framework of the GALE project, several Spanish and English European Parliament outputs generated during the TC-Star project, and three German–English outputs generated in the framework of the fourth Machine Translation Workshop. 
We show that our results correlate very well with the results of a human error analysis, and that all our metrics except the extra words reflect well the differences between different versions of the same translation system as well as the differences between different translation systems.
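
For orientation, the two measures named in this abstract can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: WER is computed here as the word-level Levenshtein distance divided by the reference length, and PER uses one common bag-of-words formulation, which is an assumption and may differ from the variant used in the article. The function names `edit_distance`, `wer`, and `per` are hypothetical, and whitespace tokenization is assumed.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: edit distance normalized by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


def per(reference, hypothesis):
    """Position-independent error rate (assumed bag-of-words variant).

    Words are matched regardless of position; unmatched words and any
    length difference count as errors.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)
```

With the reference "the cat sat on the mat" and the hypothesis "the mat sat on the cat", `wer` counts two substitutions out of six reference words, while `per` is zero because both sentences contain the same words. The gap between the two values is what makes the pair useful for separating word-order errors from lexical errors.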


A Data-Driven Approach to High-Volume Recruitment: Application to Student Admission

Manufacturing & Service Operations Management ◽

10.1287/msom.2019.0779 ◽

2020 ◽

Vol 22 (5) ◽

pp. 942-957

Author(s):

Lilun Du ◽

Qing Li

Keyword(s):

Job Performance ◽

Error Rate ◽

High Volume ◽

Data Driven ◽

Recruitment Process ◽

Program Administrators ◽

Interview Process ◽

Interview Score ◽

Data Driven Approach ◽

The Relationship

Problem definition: Service providers often recruit a large number of people over a short period of time, a practice known as high-volume recruitment. In this study, we describe a data-driven approach that can be used to streamline the recruitment process and aid decision making. The recruitment process consists of two stages: screening and interview. All candidates are evaluated in the screening stage, but only those with sufficiently high screening scores are short-listed for an interview. After the interview stage, offers are made based on the screening and interview scores. We define the error rate as the probability that a candidate who is rejected during either stage might have had a higher job performance than the median job performance of the candidates recruited had he or she been accepted. To ensure the error rate is no higher than a certain level, how many candidates should be short-listed, and, after the interview, how should candidates be ranked based on the two scores? Academic/practical relevance: High-volume recruitment is challenging because decisions have to be made for many people, under tight time constraints, and under uncertainty. Our approach does not require knowledge about the cost of evaluating candidates and the utility of selecting candidates; hence, it is easier to implement in practice. We apply the approach to the process of recruiting students for a postgraduate business program. Methodology: We use stochastic modeling and regression. Results: We provide a procedure for estimating the error rate as a function of the percentage of candidates short-listed for interviews. We show that the estimated error rate is asymptotically unbiased and converges to the true error rate in probability. We then run a linear regression analysis to estimate the relationship between job performance and the screening and interview scores. 
In a case study involving a postgraduate business program, the job performance measure we adopt is the grade point average in the program, observable only for the students enrolled in the program. We find that the interview score is statistically significant, but the screening score is not. Managerial implications: For the postgraduate program, our study demonstrates that the time-intensive interview process has substantial value. We should increase, rather than reduce, as suggested by the program administrators before our study, the weight assigned to the interview score and the time spent on the interview process. Knowing the relationship between the error rate and the percentage of candidates short-listed for interviews, the program administrators can determine the appropriate percentage for any given error rate deemed acceptable and improve the ranking of candidates. Our approach is general and can be applied to other high-volume recruiters.


A Desire for Social Media Is Associated With a Desire for Solitary but Not Social Activities

Psychological Reports ◽

10.1177/0033294117742657 ◽

2017 ◽

Vol 121 (6) ◽

pp. 1120-1130 ◽

Cited By ~ 1

Author(s):

Lauren Hill ◽

Zane Zheng

Keyword(s):

Social Media ◽

Social Life ◽

Social Connectedness ◽

Judgment Task ◽

Perceptual Judgment ◽

Psychological Consequences ◽

Media Usage ◽

Displacement Theory ◽

The Relationship

While social media is an aspect of life for many, it brings to light the lack of interpersonal connection when browsing activity occurs. The displacement theory suggests that the quality of one’s offline interactions is affected by how much time an individual allots to those exchanges. Depending on the amount of time spent online, interpersonal connections may suffer and lead to negative psychological consequences. Our study aimed to explore the relationship between the desirability of social media and socialization preferences through a cue-based perceptual judgment task where participants (N = 136) rated 40 gray-scale images in terms of their desirability. The image categories included social media icons, singular scenes depicting an isolated activity, social scenes representing an interactive activity, and traffic signs as the control. We also included questionnaires to assess depressiveness and aspects of social media usage. Our findings suggest that the immediate desire for social media is potentially linked to one’s desire for social isolation as represented by the singular scene category, the intensity of participants’ reported daily usage, and the extent to which social media is perceived to impact real social life. To our knowledge, this is the first report on the initial desirability judgment of social media and its association with other factors. Further research is needed to distinguish the variability in users’ aim of using social media and if that is related to one’s perceived feelings of social connectedness and solitude.


Is a Wizard-of-Oz Required for Robot-Led Conversation Practice in a Second Language?

International Journal of Social Robotics ◽

10.1007/s12369-021-00849-8 ◽

2022 ◽

Author(s):

Olov Engwall ◽

José Lopes ◽

Ronald Cumbal

Keyword(s):

Second Language ◽

Speech Recognition ◽

Statistical Method ◽

Error Rate ◽

State Of The Art ◽

Autonomous Robot ◽

Language Learner ◽

Word Error Rate ◽

Wizard Of Oz ◽

Custom Made

The large majority of previous work on human-robot conversations in a second language has been performed with a human wizard-of-Oz. The reasons are that automatic speech recognition of non-native conversational speech is considered to be unreliable and that the dialogue management task of selecting robot utterances that are adequate at a given turn is complex in social conversations. This study therefore investigates if robot-led conversation practice in a second language with pairs of adult learners could potentially be managed by an autonomous robot. We first investigate how correct and understandable transcriptions of second language learner utterances are when made by a state-of-the-art speech recogniser. We find both a relatively high word error rate (41%) and that a substantial share (42%) of the utterances are judged to be incomprehensible or only partially understandable by a human reader. We then evaluate how adequate the robot utterance selection is, when performed manually based on the speech recognition transcriptions or autonomously using (a) predefined sequences of robot utterances, (b) a general state-of-the-art language model that selects utterances based on learner input or the preceding robot utterance, or (c) a custom-made statistical method that is trained on observations of the wizard’s choices in previous conversations. It is shown that adequate or at least acceptable robot utterances are selected by the human wizard in most cases (96%), even though the ASR transcriptions have a high word error rate. Further, the custom-made statistical method performs as well as manual selection of robot utterances based on ASR transcriptions. 
It was also found that the interaction strategy that the robot employed, which differed regarding how much the robot maintained the initiative in the conversation and if the focus of the conversation was on the robot or the learners, had marginal effects on the word error rate and understandability of the transcriptions but larger effects on the adequateness of the utterance selection. Autonomous robot-led conversations may hence work better with some robot interaction strategies.
