The relationship between word error rate and perceptual judgment

2020 ◽  
Vol 148 (4) ◽  
pp. 2763-2763
Author(s):  
Seongjin Park ◽  
John Culnan
2011 ◽  
Vol 19 (5) ◽  
pp. 1103-1112 ◽  
Author(s):  
Ralf Schluter ◽  
Markus Nussbaum-Thom ◽  
Hermann Ney

Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 3063
Author(s):  
Aleksandr Laptev ◽  
Andrei Andrusenko ◽  
Ivan Podluzhny ◽  
Anton Mitrofanov ◽  
Ivan Medennikov ◽  
...  

With the rapid development of speech assistants, adapting server-intended automatic speech recognition (ASR) solutions to run directly on devices has become crucial. For on-device speech recognition tasks, researchers and industry prefer end-to-end ASR systems as they can be made resource-efficient while maintaining a higher quality compared to hybrid systems. However, building end-to-end models requires a significant amount of speech data. Personalization, which mainly involves handling out-of-vocabulary (OOV) words, is another challenging task associated with speech assistants. In this work, we consider building an effective end-to-end ASR system in low-resource setups with a high OOV rate, embodied in Babel Turkish and Babel Georgian tasks. We propose a method of dynamic acoustic unit augmentation based on the Byte Pair Encoding with dropout (BPE-dropout) technique. The method non-deterministically tokenizes utterances to extend the tokens’ contexts and to regularize their distribution for the model’s recognition of unseen words. It also reduces the need for optimal subword vocabulary size search. The technique provides a steady improvement in regular and personalized (OOV-oriented) speech recognition tasks (at least 6% relative word error rate (WER) and 25% relative F-score) at no additional computational cost. Owing to the BPE-dropout use, our monolingual Turkish Conformer has achieved a competitive result with 22.2% character error rate (CER) and 38.9% WER, which is close to the best published multilingual system.
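The non-deterministic tokenization mentioned in this abstract can be illustrated with a minimal sketch of the general BPE-dropout idea. This is not the authors' implementation: the function name `bpe_dropout_tokenize`, the toy merge table, and the dropout probability are all assumptions made for illustration. At each step every applicable merge is skipped with probability `p`, and the highest-priority surviving merge is applied.

```python
import random


def bpe_dropout_tokenize(word, merges, p=0.1, rng=None):
    """Split `word` into subwords with BPE, randomly dropping merges.

    `merges` is an ordered list of (left, right) pairs, highest priority
    first (hypothetical toy table, not a trained vocabulary).
    With p=0 this reduces to plain greedy BPE; with p=1 no merge is ever
    applied and the word stays split into characters.
    """
    rng = rng or random.Random()
    rank = {pair: i for i, pair in enumerate(merges)}
    tokens = list(word)
    while len(tokens) > 1:
        # Collect applicable merges, dropping each one with probability p.
        candidates = [
            (rank[(a, b)], i)
            for i, (a, b) in enumerate(zip(tokens, tokens[1:]))
            if (a, b) in rank and rng.random() >= p
        ]
        if not candidates:
            break
        # Apply the surviving merge with the best (lowest) rank.
        _, i = min(candidates)
        tokens[i:i + 2] = [tokens[i] + tokens[i + 1]]
    return tokens


toy_merges = [("l", "o"), ("lo", "w"), ("e", "r")]
print(bpe_dropout_tokenize("lower", toy_merges, p=0.0))
print(bpe_dropout_tokenize("lower", toy_merges, p=0.5, rng=random.Random(0)))
```

Calling the function repeatedly with `p > 0` during training would expose the model to several segmentations of the same word, which is the regularizing effect the abstract describes.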


Author(s):  
Cosmin Munteanu ◽  
Gerald Penn ◽  
Ron Baecker ◽  
Elaine Toms ◽  
David James

2020 ◽  
Vol 6 (12) ◽  
pp. 141
Author(s):  
Abdelrahman Abdallah ◽  
Mohamed Hamada ◽  
Daniyar Nurseitov

This article considers the task of handwritten text recognition using attention-based encoder–decoder networks trained on the Kazakh and Russian languages. We have developed a novel deep neural network model based on a fully gated CNN, supported by multiple bidirectional gated recurrent unit (BGRU) and attention mechanisms to manipulate sophisticated features, that achieves 0.045 Character Error Rate (CER), 0.192 Word Error Rate (WER), and 0.253 Sequence Error Rate (SER) for the first test dataset and 0.064 CER, 0.24 WER and 0.361 SER for the second test dataset. Our proposed model is the first to address handwriting recognition in the Kazakh and Russian languages. Our results confirm the importance of our proposed Attention-Gated-CNN-BGRU approach for training handwriting text recognition and indicate that it can lead to statistically significant improvements (p-value < 0.05) in the sensitivity (recall) over the test datasets. The proposed method’s performance was evaluated using handwritten text databases of three languages: English, Russian, and Kazakh. It demonstrates better results on the Handwritten Kazakh and Russian (HKR) dataset than the other well-known models.
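The three metrics reported in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: CER and WER are taken as the Levenshtein distance over characters and whitespace-separated words, divided by the reference length and pooled over the corpus, and SER is the share of sequences that are not an exact match. The paper may normalize or average differently; the helper names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (or match)
            ))
        prev = curr
    return prev[-1]


def error_rates(pairs):
    """Corpus-level (CER, WER, SER) for a list of (reference, hypothesis)."""
    char_err = char_total = word_err = word_total = seq_err = 0
    for ref, hyp in pairs:
        char_err += edit_distance(ref, hyp)
        char_total += len(ref)
        word_err += edit_distance(ref.split(), hyp.split())
        word_total += len(ref.split())
        seq_err += ref != hyp  # any mismatch counts the whole sequence
    return char_err / char_total, word_err / word_total, seq_err / len(pairs)


cer, wer, ser = error_rates([("the cat sat", "the cat sat"), ("a dog", "a bog")])
print(cer, wer, ser)
```

A single wrong character moves the three rates very differently, which is why the figures in the abstract grow from CER to WER to SER.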


2011 ◽  
Vol 37 (4) ◽  
pp. 657-688 ◽  
Author(s):  
Maja Popović ◽  
Hermann Ney

Evaluation and error analysis of machine translation output are important but difficult tasks. In this article, we propose a framework for automatic error analysis and classification based on the identification of actual erroneous words using the algorithms for computation of Word Error Rate (WER) and Position-independent word Error Rate (PER), which is just a very first step towards development of automatic evaluation measures that provide more specific information about certain translation problems. The proposed approach enables the use of various types of linguistic knowledge in order to classify translation errors in many different ways. This work focuses on one possible set-up, namely, on five error categories: inflectional errors, errors due to wrong word order, missing words, extra words, and incorrect lexical choices. For each of the categories, we analyze the contribution of various POS classes. We compared the results of automatic error analysis with the results of human error analysis in order to investigate two possible applications: estimating the contribution of each error type in a given translation output in order to identify the main sources of errors for a given translation system, and comparing different translation outputs using the introduced error categories in order to obtain more information about advantages and disadvantages of different systems and possibilities for improvements, as well as about advantages and disadvantages of applied methods for improvements. We used Arabic–English Newswire and Broadcast News and Chinese–English Newswire outputs created in the framework of the GALE project, several Spanish and English European Parliament outputs generated during the TC-Star project, and three German–English outputs generated in the framework of the fourth Machine Translation Workshop. 
We show that our results correlate very well with the results of a human error analysis, and that all our metrics except the extra words reflect well the differences between different versions of the same translation system as well as the differences between different translation systems.
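As a rough illustration of the position-independent part of this approach, the sketch below computes PER from bag-of-words counts and lists the words that are candidates for the "missing" and "extra" categories. It assumes one common formulation of PER, (max(reference length, hypothesis length) − matched words) / reference length; the article's own definitions may differ, and the function name `per_with_errors` is hypothetical.

```python
from collections import Counter


def per_with_errors(reference, hypothesis):
    """Return (PER, missing_words, extra_words), ignoring word order.

    Assumes whitespace tokenization and the common formulation
    PER = (max(len(ref), len(hyp)) - matches) / len(ref).
    """
    ref, hyp = reference.split(), hypothesis.split()
    ref_counts, hyp_counts = Counter(ref), Counter(hyp)
    # Multiset intersection: words found in both, regardless of position.
    matches = sum((ref_counts & hyp_counts).values())
    missing = ref_counts - hyp_counts  # in the reference only
    extra = hyp_counts - ref_counts    # in the hypothesis only
    rate = (max(len(ref), len(hyp)) - matches) / len(ref)
    return rate, missing, extra


rate, missing, extra = per_with_errors(
    "the cat sat on the mat", "the cat sits on mat"
)
print(rate, dict(missing), dict(extra))
```

Because word order is ignored, a hypothesis that only reorders the reference scores a PER of zero, so comparing PER against WER is one way to isolate reordering errors from lexical ones.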


2020 ◽  
Vol 22 (5) ◽  
pp. 942-957
Author(s):  
Lilun Du ◽  
Qing Li

Problem definition: Service providers often recruit a large number of people over a short period of time, a practice known as high-volume recruitment. In this study, we describe a data-driven approach that can be used to streamline the recruitment process and aid decision making. The recruitment process consists of two stages: screening and interview. All candidates are evaluated in the screening stage, but only those with sufficiently high screening scores are short-listed for an interview. After the interview stage, offers are made based on the screening and interview scores. We define the error rate as the probability that a candidate who is rejected during either stage might have had a higher job performance than the median job performance of the candidates recruited had he or she been accepted. To ensure the error rate is no higher than a certain level, how many candidates should be short-listed, and, after the interview, how should candidates be ranked based on the two scores? Academic/practical relevance: High-volume recruitment is challenging because decisions have to be made for many people, under tight time constraints, and under uncertainty. Our approach does not require knowledge about the cost of evaluating candidates and the utility of selecting candidates; hence, it is easier to implement in practice. We apply the approach to the process of recruiting students for a postgraduate business program. Methodology: We use stochastic modeling and regression. Results: We provide a procedure for estimating the error rate as a function of the percentage of candidates short-listed for interviews. We show that the estimated error rate is asymptotically unbiased and converges to the true error rate in probability. We then run a linear regression analysis to estimate the relationship between job performance and the screening and interview scores. 
In a case study involving a postgraduate business program, the job performance measure we adopt is the grade point average in the program, observable only for the students enrolled in the program. We find that the interview score is statistically significant, but the screening score is not. Managerial implications: For the postgraduate program, our study demonstrates that the time-intensive interview process has substantial value. We should increase, rather than reduce, as suggested by the program administrators before our study, the weight assigned to the interview score and the time spent on the interview process. Knowing the relationship between the error rate and the percentage of candidates short-listed for interviews, the program administrators can determine the appropriate percentage for any given error rate deemed acceptable and improve the ranking of candidates. Our approach is general and can be applied to other high-volume recruiters.


2017 ◽  
Vol 121 (6) ◽  
pp. 1120-1130 ◽  
Author(s):  
Lauren Hill ◽  
Zane Zheng

While social media is an aspect of life for many, it brings to light the lack of interpersonal connection when browsing activity occurs. The displacement theory suggests that the quality of one’s offline interactions is affected by how much time an individual allots to those exchanges. Depending on the amount of time spent online, interpersonal connections may suffer and lead to negative psychological consequences. Our study aimed to explore the relationship between the desirability of social media and socialization preferences through a cue-based perceptual judgment task where participants (N = 136) rated 40 gray-scale images in terms of their desirability. The image categories included social media icons, singular scenes depicting an isolated activity, social scenes representing an interactive activity, and traffic signs as the control. We also included questionnaires to assess depressiveness and aspects of social media usage. Our findings suggest that the immediate desire for social media is potentially linked to one’s desire for social isolation as represented by the singular scene category, the intensity of participants’ reported daily usage, and the extent to which social media is perceived to impact real social life. To our knowledge, this is the first report on the initial desirability judgment of social media and its association with other factors. Further research is needed to distinguish the variability in users’ aims in using social media and whether that is related to one’s perceived feelings of social connectedness and solitude.


Author(s):  
Olov Engwall ◽  
José Lopes ◽  
Ronald Cumbal

The large majority of previous work on human-robot conversations in a second language has been performed with a human wizard-of-Oz. The reasons are that automatic speech recognition of non-native conversational speech is considered to be unreliable and that the dialogue management task of selecting robot utterances that are adequate at a given turn is complex in social conversations. This study therefore investigates if robot-led conversation practice in a second language with pairs of adult learners could potentially be managed by an autonomous robot. We first investigate how correct and understandable transcriptions of second language learner utterances are when made by a state-of-the-art speech recogniser. We find both a relatively high word error rate (41%) and that a substantial share (42%) of the utterances are judged to be incomprehensible or only partially understandable by a human reader. We then evaluate how adequate the robot utterance selection is, when performed manually based on the speech recognition transcriptions or autonomously using (a) predefined sequences of robot utterances, (b) a general state-of-the-art language model that selects utterances based on learner input or the preceding robot utterance, or (c) a custom-made statistical method that is trained on observations of the wizard’s choices in previous conversations. It is shown that adequate or at least acceptable robot utterances are selected by the human wizard in most cases (96%), even though the ASR transcriptions have a high word error rate. Further, the custom-made statistical method performs as well as manual selection of robot utterances based on ASR transcriptions. 
It was also found that the interaction strategy that the robot employed, which differed regarding how much the robot maintained the initiative in the conversation and if the focus of the conversation was on the robot or the learners, had marginal effects on the word error rate and understandability of the transcriptions but larger effects on the adequateness of the utterance selection. Autonomous robot-led conversations may hence work better with some robot interaction strategies.

