Improved Estimation of Parkinsonian Vowel Quality through Acoustic Feature Assimilation

2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Amr Gaballah ◽  
Vijay Parsa ◽  
Daryn Cushnie-Sparrow ◽  
Scott Adams

This paper investigated the performance of a number of acoustic measures, both individually and in combination, in predicting the perceived quality of sustained vowels produced by people with Parkinson’s disease (PD). Sustained vowel recordings were collected from 51 PD patients before and after the administration of Levodopa medication. Subjective ratings of the overall vowel quality were obtained using a visual analog scale. These ratings served to benchmark the effectiveness of the acoustic measures. Acoustic predictors of the perceived vowel quality included the harmonics-to-noise ratio (HNR), smoothed cepstral peak prominence (CPP), recurrence period density entropy (RPDE), Gammatone frequency cepstral coefficients (GFCCs), linear prediction (LP) coefficients and their variants, and modulation spectrogram features. Linear regression (LR) and support vector regression (SVR) models were employed to assimilate multiple features. Different feature dimensionality reduction methods were investigated to avoid model overfitting and enhance the prediction capabilities for the test dataset. Results showed that the RPDE measure performed the best among all individual features, while a regression model incorporating a subset of features produced the best overall correlation of 0.80 between the predicted and actual vowel quality ratings. This model may therefore serve as a surrogate for auditory-perceptual assessment of Parkinsonian vowel quality. Furthermore, the model may offer the clinician a tool to predict who may benefit from Levodopa medication in terms of enhanced voice quality.
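The evaluation described in this abstract, correlating model predictions with listener ratings, can be sketched in a few lines. This is a minimal illustration only: it fits a one-predictor least-squares line and reports the Pearson correlation between predicted and actual ratings. The feature values and ratings below are invented for the example and are not the study's data, and unlike the study the sketch fits and evaluates on the same samples rather than on a held-out test set.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit y ~ slope * x + intercept for one predictor."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x has zero variance")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data: one acoustic feature per recording and a
# visual-analog-scale quality rating per recording (not from the paper).
feature = [0.30, 0.45, 0.50, 0.62, 0.70]
ratings = [80.0, 65.0, 60.0, 42.0, 35.0]

slope, intercept = fit_line(feature, ratings)
predicted = [slope * f + intercept for f in feature]
r = pearson_r(predicted, ratings)
```

With a single predictor, the correlation between predicted and actual ratings equals the absolute value of the correlation between the feature and the ratings; combining several features, as the study does, is what can raise it beyond the best individual measure.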

1983 ◽  
Vol 91 (5) ◽  
pp. 502-508 ◽  
Author(s):  
Gerald S. Berke ◽  
Bruce R. Gerratt ◽  
David G. Hanson

An automated computer system was used to measure changes in vocal quality resulting from laryngeal surgery. Five acoustic parameters were computed and seven listeners made judgments on vowel samples recorded before and after surgery from 37 patients. Presurgery and postsurgery changes in the five acoustic measures were compared with the perception of change in vocal harshness. A multiple regression analysis showed that four of the five acoustic measures accounted for a large amount of the variance in the judgment of change of voice quality following surgery. The measures may provide a quick and relatively sensitive method that may be clinically useful in measuring change in vocal quality.


2020 ◽  
Vol 63 (12) ◽  
pp. 3991-3999
Author(s):  
Benjamin van der Woerd ◽  
Min Wu ◽  
Vijay Parsa ◽  
Philip C. Doyle ◽  
Kevin Fung

Objectives: This study aimed to evaluate the effect of a smartphone microphone and recording environment on the fidelity and accuracy of acoustic measurements of voice. Method: This was a prospective cohort proof-of-concept study. Two sets of prerecorded samples, (a) sustained vowels (/a/) and (b) a Rainbow Passage sentence, were played for recording via the internal iPhone microphone and the Blue Yeti USB microphone in two recording environments: a sound-treated booth and a quiet office setting. Recordings were presented using a calibrated mannequin speaker with a fixed signal intensity (69 dBA) at a fixed distance (15 in.). Each set of recordings (iPhone, audio booth; Blue Yeti, audio booth; iPhone, office; and Blue Yeti, office) was time-windowed to ensure the same signal was evaluated for each condition. Acoustic measures of voice, including fundamental frequency (fo), jitter, shimmer, harmonic-to-noise ratio (HNR), and cepstral peak prominence (CPP), were generated using a widely used analysis program (Praat Version 6.0.50). The data gathered were compared using a repeated measures analysis of variance. Two separate data sets were used. The set of vowel samples included both pathologic (n = 10) and normal (n = 10), male (n = 5) and female (n = 15) speakers. The set of sentence stimuli ranged in perceived voice quality from normal to severely disordered, with an equal number of male (n = 12) and female (n = 12) speakers evaluated. Results: The vowel analyses indicated that jitter, shimmer, HNR, and CPP were significantly different based on microphone choice, and that shimmer, HNR, and CPP were significantly different based on the recording environment. Analysis of sentences revealed a statistically significant impact of recording environment and microphone type on HNR and CPP. While statistically significant, the differences across the experimental conditions for a subset of the acoustic measures (viz., jitter and CPP) fell within their respective normative ranges. Conclusions: Both microphone and recording setting resulted in significant differences across several acoustic measurements. However, a subset of the acoustic measures that were statistically significant across the recording conditions showed small overall differences that are unlikely to have clinical significance in interpretation. For these acoustic measures, the present data suggest that, although a sound-treated setting is ideal for voice sample collection, a smartphone microphone can capture acceptable recordings for acoustic signal analysis.
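For readers unfamiliar with the perturbation measures named in this abstract, the following is a minimal sketch of the common "local" definition: the mean absolute difference between consecutive cycles divided by the mean value. Applied to glottal period durations it gives local jitter; applied to per-cycle peak amplitudes it gives local shimmer. The function name and the sample values are illustrative assumptions, and the sketch omits the period-detection and validity checks that an analysis program such as Praat performs before computing these measures.

```python
def local_perturbation(values):
    """Mean absolute difference of consecutive values, relative to the mean value.

    Pass period durations (seconds) for local jitter, or per-cycle peak
    amplitudes for local shimmer. Returns a ratio (multiply by 100 for %).
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    mean_value = sum(values) / len(values)
    if mean_value == 0:
        raise ValueError("mean value must be non-zero")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / mean_value


# Hypothetical cycle measurements, invented for illustration only.
periods_s = [0.0100, 0.0101, 0.0099, 0.0100, 0.0102]
amplitudes = [0.82, 0.80, 0.83, 0.81, 0.82]

jitter_local = local_perturbation(periods_s)
shimmer_local = local_perturbation(amplitudes)
```

Because both measures are ratios of small cycle-to-cycle differences, added noise from a microphone or room can shift them noticeably, which is consistent with the sensitivity to recording conditions reported above.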


Author(s):  
Won Gue Han ◽  
Min-Su Kim ◽  
Kyung Ho Oh ◽  
Jeung Soo Woo ◽  
Kwang Yoon Jung ◽  
...  

2017 ◽  
Vol 23 (1) ◽  
pp. 1-20
Author(s):  
Kathy Connaughton ◽  
Irena Yanushevskaya

Objective: This study explores the immediate impact of prolonged voice use by professional sports coaches. Method: Speech samples, including sustained phonation of the vowel /a/ and a short read passage, were collected from two professional sports coaches. The audio recordings were made within an hour before and after a coaching session, over three sessions. Perceptual evaluation of voice quality was done using the GRBAS scale. The speech samples were subsequently analyzed using Praat. The acoustic measures included fundamental frequency (f0), jitter, shimmer, harmonics-to-noise ratio, and cepstral peak prominence. Main results: The results of the perceptual and acoustic analyses suggest a slight shift towards a tenser phonation after the coaching session, which is a likely consequence of laryngeal muscle adaptation to prolonged voice use. This tendency was similar in sustained vowels and connected speech. Conclusion: The acoustic measures used in this study can be useful for capturing voice change after a coaching session. It is desirable, however, that voice assessment and monitoring tools that are more sophisticated and robust, yet intuitive and easy to use, be made available to clinicians and professional voice users.


2021 ◽  
Vol 11 (5) ◽  
pp. 1990
Author(s):  
Vinod Devaraj ◽  
Philipp Aichinger

The characterization of voice quality is important for the diagnosis of a voice disorder. Vocal fry is a voice quality which is traditionally characterized by a low frequency and a long closed phase of the glottis. However, we also observed amplitude-modulated vocal fry glottal area waveforms (GAWs) without long closed phases (positive group), which we modelled using an analysis-by-synthesis approach. Natural and synthetic GAWs are modelled. The negative group consists of euphonic, i.e., normophonic GAWs. The analysis-by-synthesis approach fits two modelled GAWs for each input GAW. One modelled GAW is modulated to replicate the amplitude and frequency modulations of the input GAW, and the other modelled GAW is unmodulated. The modelling errors of the two modelled GAWs are determined to classify the GAWs into the positive and the negative groups using a simple support vector machine (SVM) classifier with a linear kernel. The modelling errors of all vocal fry GAWs obtained using the modulated model are smaller than the modelling errors obtained using the unmodulated model. Using the two modelling errors as predictors for classification, no false positives or false negatives are obtained. To further distinguish the subtypes of amplitude-modulated vocal fry GAWs, the entropy of the modulator’s power spectral density and the modulator-to-carrier frequency ratio are obtained.
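The two descriptors named at the end of this abstract can be illustrated with a short sketch. It assumes one common formulation of spectral entropy, namely the Shannon entropy of the power spectral density after normalizing it to sum to one; the abstract does not state the authors' exact definition, so the function names, the choice of base-2 logarithm, and the sample values are all assumptions made for illustration.

```python
import math


def spectral_entropy(psd):
    """Shannon entropy (bits) of a PSD normalized to a probability distribution."""
    if any(p < 0 for p in psd):
        raise ValueError("PSD values must be non-negative")
    total = sum(psd)
    if total <= 0:
        raise ValueError("PSD must contain some power")
    # Zero-power bins contribute nothing to the entropy and are skipped.
    probs = [p / total for p in psd if p > 0]
    return -sum(p * math.log2(p) for p in probs)


def modulator_to_carrier_ratio(modulator_hz, carrier_hz):
    """Ratio of the modulator frequency to the carrier (glottal cycle) frequency."""
    if carrier_hz <= 0:
        raise ValueError("carrier frequency must be positive")
    return modulator_hz / carrier_hz


# Hypothetical values: a modulator PSD dominated by one bin, and a
# modulator at half the carrier frequency (not taken from the paper).
entropy_bits = spectral_entropy([0.1, 8.0, 0.2, 0.1])
ratio = modulator_to_carrier_ratio(35.0, 70.0)
```

Under this formulation a modulator whose power sits in a single spectral bin has entropy near zero, while a flat spectrum gives the maximum, so the entropy indicates how regular the modulation is.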


Languages ◽  
2021 ◽  
Vol 6 (3) ◽  
pp. 114
Author(s):  
Ulrich Reubold ◽  
Sanne Ditewig ◽  
Robert Mayr ◽  
Ineke Mennen

The present study sought to examine the effect of dual language activation on L1 speech in late English–Austrian German sequential bilinguals, and to identify relevant predictor variables. To this end, we compared the English speech patterns of adult migrants to Austria in a code-switched and monolingual condition alongside those of monolingual native speakers in England in a monolingual condition. In the code-switched materials, German words containing target segments known to trigger cross-linguistic interaction in the two languages (i.e., [v–w], [ʃt(ʁ)-st(ɹ)] and [l-ɫ]) were inserted into an English frame; monolingual materials comprised English words with the same segments. To examine whether the position of the German item affects L1 speech, the segments occurred either before the switch (“He wants a Wienerschnitzel”) or after (“I like Würstel with mustard”). Critical acoustic measures of these segments revealed no differences between the groups in the monolingual condition, but significant L2-induced shifts in the bilinguals’ L1 speech production in the code-switched condition for some sounds. These were found to occur both before and after a code-switch, and exhibited a fair amount of individual variation. Only the amount of L2 use was found to be a significant predictor variable for shift size in code-switched compared with monolingual utterances, and only for [w]. These results have important implications for the role of dual activation in the speech of late sequential bilinguals.


2001 ◽  
Vol 110 (7) ◽  
pp. 627-634 ◽  
Author(s):  
Ton P. M. Langeveld ◽  
Edgar H. Houtman ◽  
Jeroen J. Briaire ◽  
Maya van Rossum ◽  
Aeilko H. Zwinderman ◽  
...  

2021 ◽  
Author(s):  
Yeremi Pérez ◽  
Roberto Borboa-Gastelum ◽  
Luz Maria Alonso-Valerdi ◽  
David I. Ibarra-Zarate ◽  
Eduardo A. Flores-Villalba ◽  
...  

Fatigue decreases performance in several professional activities. It can lead to technical mistakes whose consequences might be lethal, as in health care, where a surgical error due to lack of rest can cause a patient's death. Therefore, this study aims to detect vigil and fatigue (due to lack of sleep) states in medical students through the classification of electroencephalographic (EEG) patterns. The EEG signals of 18 medical students were analyzed in the theta band (4-8 Hz) over fronto-central recording sites and in the alpha band (8-13 Hz) over temporal and parieto-occipital recording sites during the execution of laparoscopic tasks before and after their medical duties. The EEG signal processing pipeline consisted of pre-processing based on independent component analysis, absolute band power estimates, and Support Vector Machine classification. The F-score for distinguishing between vigil and fatigue states was 90.89%, and the first class was slightly more identifiable, reaching a sensitivity of 90.18%. Based on this outcome, detecting fatigue in medical students during laparoscopic training seems feasible and could help reduce technical mistakes that could be lethal in health care. For this purpose, the EEG recordings are provided.
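The two figures reported in this abstract, F-score and sensitivity, are both derived from a confusion matrix. The sketch below shows the standard definitions for one class treated as positive. The counts are invented for illustration, and the abstract does not say how the F-score was averaged across the two classes, so this should be read as a generic example rather than a reproduction of the reported numbers.

```python
def sensitivity(tp, fn):
    """Recall for the positive class: share of true positives that were detected."""
    if tp + fn == 0:
        raise ValueError("no positive samples")
    return tp / (tp + fn)


def precision(tp, fp):
    """Share of positive predictions that were correct."""
    if tp + fp == 0:
        raise ValueError("no positive predictions")
    return tp / (tp + fp)


def f_score(tp, fp, fn):
    """F1 score: harmonic mean of precision and sensitivity."""
    p = precision(tp, fp)
    r = sensitivity(tp, fn)
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)


# Hypothetical counts with the "vigil" class treated as positive.
tp, fp, fn = 90, 8, 10
vigil_sensitivity = sensitivity(tp, fn)
vigil_f1 = f_score(tp, fp, fn)
```

Reporting sensitivity per class alongside the F-score, as the abstract does, shows whether one state is detected more reliably than the other.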


2002 ◽  
Vol 45 (4) ◽  
pp. 689-699 ◽  
Author(s):  
Donald G. Jamieson ◽  
Vijay Parsa ◽  
Moneca C. Price ◽  
James Till

We investigated how standard speech coders, currently used in modern communication systems, affect the quality of the speech of persons who have common speech and voice disorders. Three standardized speech coders (GSM 6.10 RPELTP, FS1016 CELP, and FS1015 LPC) and two speech coders based on subband processing were evaluated for their performance. Coder effects were assessed by measuring the quality of speech samples both before and after processing by the speech coders. Speech quality was rated by 10 listeners with normal hearing on 28 different scales representing pitch and loudness changes, speech rate, laryngeal and resonatory dysfunction, and coder-induced distortions. Results showed that (a) nine scale items were consistently and reliably rated by the listeners; (b) all coders degraded speech quality on these nine scales, with the GSM and CELP coders providing the better quality speech; and (c) interactions between coders and individual voices did occur on several voice quality scales.


2004 ◽  
Vol 29 (5) ◽  
pp. 538-544 ◽  
Author(s):  
P.N. Carding ◽  
I.N. Steen ◽  
A. Webb ◽  
K. MacKenzie ◽  
I.J. Deary ◽  
...  
