On the Effects of Automatic Transcription and Segmentation Errors in Hungarian Spoken Language Processing

2019 · Vol 63 (4) · pp. 254-262
Author(s): Máté Ákos Tündik, Valér Kaszás, György Szaszák

Emerging Artificial Intelligence (AI) technology has enabled machines to match or even surpass human capabilities in several fields; nevertheless, making a computer able to understand human language remains a challenge. When dealing with speech understanding, Automatic Speech Recognition (ASR) is used to generate transcripts, which are then processed with text-based tools targeting Spoken Language Understanding (SLU). Depending on ASR quality (which in turn depends on speech quality, topic complexity, environment, etc.), transcripts contain errors, which propagate further down the processing pipeline. Subjective tests show, on the other hand, that humans understand ASR-generated closed captions quite well despite word and punctuation errors. Using word-embedding-based semantic parsing, the present paper quantifies the semantic bias introduced by ASR error propagation. As a special use case, speech summarization is also evaluated with regard to ASR error propagation. We show that, despite the higher word error rates seen for the highly inflectional Hungarian language, the semantic space suffers less impact than the difference in Word Error Rate would suggest.
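The semantic-bias idea above can be illustrated with a minimal sketch: embed both the reference transcript and the ASR output (here as a simple average of word vectors) and compare them with cosine similarity. The words, vectors, and the one-word ASR confusion below are toy stand-ins for a trained embedding model, not the paper's actual data or method details.

```python
import numpy as np

# Toy word vectors standing in for a trained embedding model (hypothetical).
emb = {
    "the": np.array([0.1, 0.3, 0.0]),
    "cat": np.array([0.9, 0.1, 0.2]),
    "cap": np.array([0.8, 0.2, 0.3]),  # plausible ASR confusion for "cat"
    "sat": np.array([0.2, 0.7, 0.5]),
}

def doc_vector(tokens):
    """Average-of-word-vectors document embedding; OOV tokens are skipped."""
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

ref = "the cat sat".split()  # reference (human) transcript
hyp = "the cap sat".split()  # ASR output with one word error

# Semantic bias: how far the ASR transcript drifts in embedding space.
# A word error that lands on a semantically close word moves the document
# vector only slightly, even though Word Error Rate counts it fully.
similarity = cosine(doc_vector(ref), doc_vector(hyp))
```

This illustrates the paper's core observation: surface-level Word Error Rate can overstate the semantic damage, because acoustically confusable words are often also semantically near each other in the embedding space.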

1998 · Vol 21 (2) · pp. 267-268
Author(s): Steven Greenberg

Understanding spoken language involves far more than decoding a linear sequence of phonetic elements. In view of the inherent variability of the acoustic signal in spontaneous speech, it is not entirely clear that the sort of representation derived from locus equations is sufficient to account for the robustness of spoken language understanding under real-world conditions. An alternative representation, based on the low-frequency modulation spectrum, provides a more plausible neural foundation for spoken language processing.


1991
Author(s): Lynette Hirschman, Stephanie Seneff, David Goodine, Michael Phillips

2020
Author(s): Saad Ghojaria, Rahul Kotian, Yash Sawant, Suresh Mestry

Author(s): Timnit Gebru

This chapter discusses the role of race and gender in artificial intelligence (AI). The rapid permeation of AI into society has not been accompanied by a thorough investigation of the sociopolitical issues that cause certain groups of people to be harmed rather than advantaged by it. For instance, recent studies have shown that commercial automated facial analysis systems have much higher error rates for dark-skinned women, while having minimal errors on light-skinned men. Moreover, a 2016 ProPublica investigation uncovered that machine learning–based tools that assess crime recidivism rates in the United States are biased against African Americans. Other studies show that natural language–processing tools trained on news articles exhibit societal biases. While many technical solutions have been proposed to alleviate bias in machine learning systems, a holistic and multifaceted approach must be taken. This includes standardization bodies determining what types of systems can be used in which scenarios, making sure that automated decision tools are created by people from diverse backgrounds, and understanding the historical and political factors that disadvantage certain groups who are subjected to these tools.


Author(s): Prashanth Gurunath Shivakumar, Naveen Kumar, Panayiotis Georgiou, Shrikanth Narayanan

Electronics · 2021 · Vol 10 (12) · pp. 1372
Author(s): Sanjanasri JP, Vijay Krishna Menon, Soman KP, Rajendran S, Agnieszka Wolk

Linguists have long focused on qualitative comparison of semantics across languages. Evaluating semantic interpretation for a disparate language pair such as English and Tamil is an even more formidable task than for closely related languages such as the Slavic family. The concept of word embedding in Natural Language Processing (NLP) has opened a felicitous opportunity to quantify linguistic semantics. Multi-lingual tasks can be performed by projecting the word embeddings of one language onto the semantic space of the other. This research presents a suite of data-efficient deep learning approaches to deduce the transfer function from the embedding space of English to that of Tamil, deploying three popular embedding algorithms: Word2Vec, GloVe and FastText. A novel evaluation paradigm was devised for the generated embeddings to assess their effectiveness, using the original embeddings as ground truths. Transferability of the proposed model to other target languages was assessed via pre-trained Word2Vec embeddings for Hindi and Chinese. We empirically show that, with a bilingual dictionary of a thousand words and a correspondingly small monolingual target (Tamil) corpus, useful embeddings can be generated by transfer learning from a well-trained source (English) embedding. Furthermore, we demonstrate the usability of the generated target embeddings in a few NLP use-case tasks, such as text summarization, part-of-speech (POS) tagging, and bilingual dictionary induction (BDI), bearing in mind that these are not the only possible applications.
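The cross-lingual transfer described above is commonly realized as a mapping learned from a bilingual dictionary of word pairs; a minimal least-squares sketch with synthetic vectors is given below. The dimensions, pair count, and generated data are illustrative assumptions (the paper uses deep learning models on real English/Tamil embeddings, not this linear toy):

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50        # embedding dimensionality (illustrative)
n_pairs = 1000  # bilingual dictionary of ~a thousand word pairs

# Synthetic stand-ins: X holds source-language (English) vectors for the
# dictionary words; Y holds the corresponding target-language (Tamil)
# vectors, simulated here as a noisy linear image of X.
X = rng.normal(size=(n_pairs, dim))
W_true = rng.normal(size=(dim, dim))
Y = X @ W_true + 0.01 * rng.normal(size=(n_pairs, dim))

# Learn the transfer function W minimizing ||XW - Y||_F via least squares,
# the classic linear-mapping baseline for cross-lingual embedding transfer.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Project a held-out source-language vector into the target space; its
# nearest neighbors among target vectors would yield dictionary induction.
x_new = rng.normal(size=(1, dim))
y_pred = x_new @ W
```

With the mapping in hand, bilingual dictionary induction (BDI) amounts to projecting a source word vector and retrieving the nearest target-language vectors by cosine similarity.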

