Reproducibility of Voice Analysis with Machine Learning

2021 ◽  
Vol 36 (5) ◽  
pp. 1282-1283
Author(s):  
Jan Rusz ◽  
Jan Švihlík ◽  
Petr Krýže ◽  
Michal Novotný ◽  
Tereza Tykalová


Author(s):  
Maximilian Schmitt ◽  
Björn W. Schuller

Machines are able to obtain rich information from the human voice with a certain reliability. This can comprise information about the affective or mental state, but also traits of the speaker. This chapter introduces the different technical steps needed in such intelligent voice analysis. Typically, the first step involves the extraction of meaningful acoustic features, which are then transformed into a suitable representation. The acoustic information can be augmented by linguistic features originating from a speech-to-text transcription. The features are finally decoded at different levels using machine-learning methods. Recently, ‘deep learning’ has received growing interest, in which deep artificial neural networks are used to decode the information. From this, end-to-end learning has evolved, in which even the feature extraction step is learned seamlessly, through to the decoding step, mimicking the recognition process in the human brain. After describing these and other frequently encountered methods, the chapter concludes with a future perspective.
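The three-stage pipeline the chapter describes (acoustic feature extraction, transformation into a fixed-length representation, decoding with a machine-learning method) can be sketched minimally. The features and classifier below are illustrative placeholders, not the chapter's actual methods: two simple frame-level features, mean/std pooling, and a toy nearest-centroid decoder where a real system would use an SVM or a deep neural network.

```python
import numpy as np

def frame_features(signal, frame_len=400, hop=160):
    """Step 1: cut the waveform into overlapping frames and extract two
    simple acoustic features per frame: log energy and zero-crossing rate."""
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        f = signal[start:start + frame_len]
        log_energy = np.log(np.sum(f ** 2) + 1e-10)
        zcr = np.mean(np.abs(np.diff(np.sign(f)))) / 2
        rows.append([log_energy, zcr])
    return np.array(rows)

def utterance_vector(features):
    """Step 2: transform frame-level features into a fixed-length
    utterance representation (per-feature mean and standard deviation)."""
    return np.concatenate([features.mean(axis=0), features.std(axis=0)])

def nearest_centroid(x, centroids, labels):
    """Step 3: a toy 'decoding' stage; real systems would use an SVM
    or a deep neural network here."""
    dists = [np.linalg.norm(x - c) for c in centroids]
    return labels[int(np.argmin(dists))]
```

In end-to-end learning, steps 1 and 2 would themselves be learned network layers rather than hand-crafted functions.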


2017 ◽  
Vol 2 (3) ◽  
pp. 4-13 ◽  
Author(s):  
Jarrad H. Van Stan ◽  
Daryush D. Mehta ◽  
Robert E. Hillman

This article provides a summary of some recent innovations in voice assessment expected to have an impact in the next 5–10 years on how patients with voice disorders are clinically managed by speech-language pathologists. Specific innovations discussed are in the areas of laryngeal imaging, ambulatory voice monitoring, and “big data” analysis using machine learning to produce new metrics for vocal health. Also discussed is the potential for using voice analysis to detect and monitor other health conditions.


2021 ◽  
Author(s):  
Antonio Suppa ◽  
Francesco Asci ◽  
Giovanni Saggio ◽  
Pietro Di Leo ◽  
Zakarya Zarezadeh ◽  
...  

2020 ◽  
Author(s):  
Alberto Tena ◽  
Francec Claria ◽  
Francesc Solsona ◽  
Einar Meister ◽  
Monica Povedano

BACKGROUND Bulbar involvement is a term used in ALS that refers to motor neuron impairment in the corticobulbar area of the brainstem, which produces dysfunction of speech and swallowing. One of the earliest symptoms of bulbar involvement is voice deterioration, characterized by grossly defective articulation, extremely slow and laborious speech, marked hypernasality, and severe harshness. Bulbar involvement requires well-timed and carefully coordinated interventions, so early detection is crucial to improving the quality of life and lengthening the life expectancy of those ALS patients who present this dysfunction. OBJECTIVE Recent research efforts have focused on voice analysis to capture bulbar involvement. The main aim of this paper is to investigate the extraction of voice features and the application of machine learning to estimate whether or not a patient has this deficiency. METHODS We take current research a step further by proposing support vector machines, preceded by principal component analysis of the features obtained from acoustic analysis of the utterance of the Spanish vowels. RESULTS This model performed best (accuracy = 95.87%) when compared with the models analyzed in the related work. We also show how the model can even improve on human diagnosis, which often misdiagnoses bulbar involvement. CONCLUSIONS The results obtained are very encouraging and demonstrate the efficiency and applicability of the automated model presented in this paper.
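The "PCA followed by a classifier" pipeline from this abstract can be sketched with NumPy alone. PCA is implemented via SVD; to keep the sketch dependency-free, the paper's support vector machine is replaced by a nearest-centroid stand-in, and the feature matrix is synthetic rather than the paper's vowel acoustics.

```python
import numpy as np

def pca_fit(X, n_components):
    """PCA via SVD: centre the data and keep the directions
    of largest variance."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_transform(X, mean, components):
    """Project data onto the retained principal components."""
    return (X - mean) @ components.T

def centroid_classifier(X_train, y_train):
    """Stand-in for the paper's SVM: classify by distance to each
    class centroid in the reduced feature space."""
    classes = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0)
                          for c in classes])
    def predict(X):
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        return classes[np.argmin(d, axis=1)]
    return predict
```

Running PCA first reduces a high-dimensional acoustic feature vector to a few uncorrelated components, which limits overfitting when the patient sample is small.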


2017 ◽  
Vol 124 (5) ◽  
pp. 961-973 ◽  
Author(s):  
Yasushi Nakai ◽  
Tetsuya Takiguchi ◽  
Gakuyo Matsui ◽  
Noriko Yamaoka ◽  
Satoshi Takada

Abnormal prosody is often evident in the voice intonations of individuals with autism spectrum disorders. We compared a machine-learning-based voice analysis with human hearing judgments made by 10 speech therapists for classifying children with autism spectrum disorders ( n = 30) and typical development ( n = 51). Using stimuli limited to single-word utterances, machine-learning-based voice analysis was superior to speech therapist judgments. There was a significantly higher true-positive than false-negative rate for machine-learning-based voice analysis but not for speech therapists. Results are discussed in terms of some artificiality of clinician judgments based on single-word utterances, and the objectivity that machine-learning-based voice analysis adds to the judgment of abnormal prosody.
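The study's comparison hinges on true-positive versus false-negative rates over the positive (ASD) class. A minimal sketch of how these two rates are computed from labels and predictions (the example values below are hypothetical, not the study's data):

```python
import numpy as np

def tp_fn_rates(y_true, y_pred, positive=1):
    """True-positive rate (sensitivity) and false-negative rate,
    both computed over the positive class; they always sum to 1."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    pos = y_true == positive
    tp = np.sum(pos & (y_pred == positive))
    fn = np.sum(pos & (y_pred != positive))
    tpr = tp / (tp + fn)
    return tpr, 1 - tpr
```

Because TPR + FNR = 1, "significantly higher true-positive than false-negative rate" is equivalent to a sensitivity significantly above 0.5.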


2020 ◽  
Author(s):  
Mariusz Ziolko

BACKGROUND Speech is the main direct means of transmitting information between people. It also carries additional information depending on the state of the speaker: physical features, emotions, psychosocial traits and health conditions. Studying voice features is straightforward, does not require significant investment and can be carried out on a mass scale. However, speech analysis can only be used for initial diagnosis. The outcome of such a pre-diagnosis should indicate the need for standard medical tests. Although speech analysis is currently rarely used in medical diagnostics, it can enable earlier diagnosis and more effective treatment of patients. OBJECTIVE This paper is a systematic review and meta-analysis of recent advancements in using voice analysis for diagnosis and monitoring of some diseases. The goal of this article is to present and compare recent approaches to using speech and voice analysis as biomarkers of diseases. The article takes into account metabolic, respiratory, cardiovascular, endocrine and nervous system disorders. METHODS Articles published between 2010-2019 were selected from PubMed and ISCA Archive, using keywords ‘voice’ and ‘speech’ and respective disorder names. Further selection was performed to identify studies that assessed voice quality quantitatively in selected disorders by acoustic voice analysis (not only perceptual assessment). Information was extracted from each paper in order to compare various aspects of datasets, speech parameters, methods of analysis applied and results obtained. Each chapter starts with a medical description of how each disorder affects voice and contains a summary of different processing approaches, and is supplemented by tables comparing various investigations. Additional diagrams were prepared to illustrate general tendencies and to compare advancements in the state-of-the-art across the analyzed groups of diseases. 
RESULTS Over 90 research papers were reviewed and over 40 databases were summarized. Basic acoustic parameters that are significantly correlated with each given disorder were identified for cardiovascular, metabolic and endocrine diseases, as well as schizophrenia and amyotrophic lateral sclerosis. The affective and neurodegenerative disorders are well investigated, and the majority of papers apply automatic voice recognition and machine learning methods. The main sources of problems were identified and some recommendations for future research were made. CONCLUSIONS Speech analysis is a promising tool for pre-diagnosis of certain disorders. Advanced computerized voice analysis and machine learning algorithms, together with the widespread availability of smartphones, mean that a diagnosis may be presented during the patient’s appointment with their physician, or even during a telephone conversation.
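Among the "basic acoustic parameters" such reviews compare across disorders, jitter (cycle-to-cycle perturbation of the glottal period) and shimmer (the same perturbation measure applied to cycle amplitudes) are classic examples. A minimal sketch of the local variants, assuming the cycle periods and peak amplitudes have already been extracted from the waveform:

```python
import numpy as np

def jitter_percent(periods):
    """Local jitter: mean absolute difference between consecutive
    glottal cycle lengths, relative to the mean period."""
    periods = np.asarray(periods, dtype=float)
    return 100 * np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def shimmer_percent(amplitudes):
    """Local shimmer: the same measure applied to cycle peak
    amplitudes instead of cycle lengths."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return 100 * np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)
```

A perfectly periodic voice yields 0% on both measures; elevated values are among the correlates reported for several of the disorder groups the review covers.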


2020 ◽  
Vol 12 (3) ◽  
pp. 42 ◽  
Author(s):  
Ming Sung Kan ◽  
Atsushi Ito

In language learning, adults seem to be superior in their ability to memorize knowledge of new languages, and they have better learning strategies, experience, and intelligence with which to integrate new knowledge. However, unless one learns pronunciation in childhood, it is almost impossible to reach a native-level accent. In this research, we take the difficulty of learning tonal pronunciation in Mandarin as an example and analyze the difficulties of tone learning and the deficiencies of general learning methods using cognitive load theory. With tasks designed to match the learner’s perceptual ability, based on perception experiments and small-step learning, the perception-training app improves tone pronunciation more effectively than existing apps with a voice-analysis function. Furthermore, the learning effect was greatly improved by optimizing the app interface and operation procedures. However, when pronunciation practice was combined with perception training, pronunciation practice with insufficient feedback could lead to pronunciation errors. Therefore, we also studied pronunciation practice using machine learning, aiming to train a model for the pronunciation task design rather than for classification. We used voices designed as training data, trained a model for pronunciation training, and demonstrated that supporting pronunciation practice with machine learning is practicable.
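Mandarin tones are distinguished largely by the shape of the pitch (f0) contour. As a deliberately simplified illustration of the kind of signal a tone-feedback app works with, the sketch below classifies a contour as rising (as in Tone 2) or falling (as in Tone 4) from the sign of a least-squares slope fit; a real system, like the one the paper trains, would use far richer contour features and cover all four tones.

```python
import numpy as np

def f0_slope(f0):
    """Least-squares slope of a pitch contour, in Hz per frame."""
    t = np.arange(len(f0))
    slope, _intercept = np.polyfit(t, np.asarray(f0, dtype=float), 1)
    return slope

def rising_or_falling(f0, threshold=0.5):
    """Crude two-way tone decision from the contour slope;
    near-flat contours are reported as 'level'."""
    s = f0_slope(f0)
    if s > threshold:
        return "rising"
    if s < -threshold:
        return "falling"
    return "level"
```

Feedback of this kind (did the learner's contour move in the intended direction?) is exactly the sort of signal that, when too coarse or missing, the paper found could let pronunciation errors go uncorrected.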


Author(s):  
Ajay Dabas

Researchers regard machine learning as a viable solution for finding vocal patterns. We believe the voice can relay key information about a person’s health, and our research works towards medical diagnosis and disease-risk assessment through voice analysis. With the advance of technology in medical science, we believe that in the next few years health conditions will be diagnosed using smartphones and other wearable technology. This paper aims to propose viable techniques to researchers that work by recording short speech samples and analyzing them for underlying diseases. For psychiatric disorders in particular, there are no blood tests, and patients are often embarrassed to talk about mental health issues; voice analysis can therefore be used to identify distinctive traits. Different countries are collecting voice samples and analyzing their pitch, tone, rhythm, rate, and volume for signs of conditions such as PTSD, other brain injuries, and depression. Machine-learning algorithms will help identify the vocal patterns of people suffering from diseases or other conditions by comparing their data with healthy individuals’ voice samples.
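Of the voice properties listed above, pitch and volume are the easiest to extract from a short speech sample. A minimal sketch, using RMS amplitude as a volume proxy and an autocorrelation peak within a plausible voice range as a pitch estimate (illustrative only; production systems use more robust pitch trackers):

```python
import numpy as np

def rms_volume(signal):
    """Root-mean-square amplitude, a simple loudness proxy."""
    x = np.asarray(signal, dtype=float)
    return float(np.sqrt(np.mean(x ** 2)))

def pitch_autocorr(signal, sr, fmin=60, fmax=400):
    """Estimate fundamental frequency by locating the strongest
    autocorrelation peak at lags within the fmin-fmax voice range."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return sr / lag
```

Tracking such features per recording and comparing their statistics against a healthy reference cohort is the basic comparison scheme the abstract describes.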

