Relationship of Cepstral Peak Prominence-Smoothed and Long-Term Average Spectrum with Auditory–Perceptual Analysis

2020 ◽  
Vol 10 (23) ◽  
pp. 8598 ◽  
Author(s):  
Angélica Emygdio da Silva Antonetti ◽  
Larissa Thais Donalonso Siqueira ◽  
Maria Paula de Almeida Gobbo ◽  
Alcione Ghedini Brasolotto ◽  
Kelly Cristina Alves Silverio

Cepstral peak prominence-smoothed (CPPs) and long-term average spectrum (LTAS) are robust measures that represent the glottal source and source-filter interactions, respectively. Until now, little has been known about how physiological events impact auditory–perceptual characteristics in the objective measures of CPPs and LTAS (alpha ratio; L1–L0). Thus, this paper aims to analyze the relationship between such acoustic measures and auditory–perceptual analysis and then determine which acoustic measure best represents voice quality. We analyzed 53 voice samples of vocally healthy participants (vocally healthy group, VHG) and 49 voice samples of participants with behavioral dysphonia (dysphonic group, DG). Each voice sample was composed of sustained vowel /a/ and connected speech. CPPs seem to be the best predictor of voice deviation in both studied populations because there were moderate to strong negative correlations with general degree, breathiness, roughness, and strain (auditory–perceptual parameters). Regarding L1–L0, this measure is related to breathiness (moderate negative correlations). Hence, L1–L0 provides information about air leak through the closed glottis, assisting the phonatory efficiency analysis.

2020 ◽  
Vol 63 (12) ◽  
pp. 3991-3999
Author(s):  
Benjamin van der Woerd ◽  
Min Wu ◽  
Vijay Parsa ◽  
Philip C. Doyle ◽  
Kevin Fung

Objectives This study aimed to evaluate the fidelity and accuracy of a smartphone microphone and recording environment on acoustic measurements of voice. Method A prospective cohort proof-of-concept study. Two sets of prerecorded samples, (a) sustained vowels (/a/) and (b) Rainbow Passage sentence, were played for recording via the internal iPhone microphone and the Blue Yeti USB microphone in two recording environments: a sound-treated booth and a quiet office setting. Recordings were presented using a calibrated mannequin speaker with a fixed signal intensity (69 dBA) at a fixed distance (15 in.). Each set of recordings (iPhone/audio booth, Blue Yeti/audio booth, iPhone/office, and Blue Yeti/office) was time-windowed to ensure the same signal was evaluated for each condition. Acoustic measures of voice, including fundamental frequency (fo), jitter, shimmer, harmonic-to-noise ratio (HNR), and cepstral peak prominence (CPP), were generated using a widely used analysis program (Praat Version 6.0.50). The data gathered were compared using a repeated measures analysis of variance. Two separate data sets were used. The set of vowel samples included both pathologic (n = 10) and normal (n = 10), male (n = 5) and female (n = 15) speakers. The set of sentence stimuli ranged in perceived voice quality from normal to severely disordered, with an equal number of male (n = 12) and female (n = 12) speakers evaluated. Results The vowel analyses indicated that jitter, shimmer, HNR, and CPP were significantly different based on microphone choice, and shimmer, HNR, and CPP were significantly different based on the recording environment. Analysis of sentences revealed a statistically significant impact of recording environment and microphone type on HNR and CPP. While statistically significant, the differences across the experimental conditions for a subset of the acoustic measures (viz., jitter and CPP) fell within their respective normative ranges. Conclusions Both microphone and recording setting resulted in significant differences across several acoustic measurements. However, a subset of the acoustic measures that were statistically significant across the recording conditions showed small overall differences that are unlikely to have clinical significance in interpretation. For these acoustic measures, the present data suggest that, although a sound-treated setting is ideal for voice sample collection, a smartphone microphone can capture acceptable recordings for acoustic signal analysis.


1996 ◽  
Vol 10 (1) ◽  
pp. 59-66 ◽  
Author(s):  
Elvira Mendoza ◽  
Nieves Valencia ◽  
Juana Muñoz ◽  
Humberto Trujillo

2000 ◽  
Vol 4 (1) ◽  
pp. 75-93 ◽  
Author(s):  
Allan Vurma ◽  
Jaan Ross

The voices of 42 students studying classical opera singing at the Estonian Academy of Music were investigated to find any objectively definable qualities possibly correlating with the length of training. Each student's singing of a four-bar seven-word initial phrase from a well-known Estonian classical solo was recorded. The recordings were digitized and subjected to acoustic analysis yielding the long-term average spectrum (LTAS) for each voice studied. It turned out that the longer a singing student had been trained professionally, the higher was the level of the so-called singer's formant in her/his LTAS. Subsequently the voice quality in each recording was evaluated by four experts using a five-point scale, five points marking the best quality and one point the poorest. It turned out that the average ratings did not show any positive correlation with the length of training; rather, a slightly negative trend (not statistically significant) could be observed. The results seem to support the critical remarks made by some Estonian specialists about domestic teaching of vocal music being perhaps inadequate in some respects (Pappel, 1990). The teaching process seems to be focused on the development of those qualities that enable the singer to be audible in large halls and with a symphony orchestra, while the timbral qualities recede into the background.


1997 ◽  
Vol 106 (4) ◽  
pp. 279-285 ◽  
Author(s):  
David G. Hanson ◽  
Judy Chen ◽  
Jack J. Jiang ◽  
Barbara Roa Pauloski

Sixteen patients who had symptoms and signs of chronic posterior laryngitis were evaluated before, during, and after treatment with omeprazole and nocturnal antireflux precautions. Data were analyzed for patients who complained of some hoarseness, who had no smoking history, and who completed all of the voice recording protocol. The patients' voices were recorded before, during, and following treatment with omeprazole and nocturnal antireflux precautions. Voice quality was analyzed by perceptual analysis, and acoustic signal data were measured for jitter, shimmer, and signal-to-noise ratio. Measures of jitter, shimmer, and signal-to-noise ratio changed significantly with treatment of posterior laryngitis (p < .01 for change in each of the measures). Acoustic measures showed some trend of deterioration with cessation of treatment, although the overall improvement in acoustic measures of voice quality was still statistically significant after treatment with omeprazole was discontinued. Although perceived abnormality of voice increased and decreased with the magnitude of measured perturbation of the acoustic signal for some patients, the perceptual assessments were not highly correlated with acoustic measures for individual patients, and the perceptual analysis group data did not show a significant change with time during treatment, in contrast to the significance of change in acoustic measures. The data demonstrate that acoustic measures of jitter, shimmer, and signal-to-noise ratio improve significantly with antisecretory and antireflux treatment of chronic posterior laryngitis, and that for individual patients, these are changes that are detected by trained listeners, but not at statistically high levels of confidence.


2018 ◽  
Vol 61 (4) ◽  
pp. 801-810 ◽  
Author(s):  
Carles Escera ◽  
Fran López-Caballero ◽  
Natàlia Gorina-Careta

Purpose The purpose of this study was to run a proof of concept on a new commercially available device, Forbrain® (Sound For Life Ltd/Soundev, Luxemburg, model UN38.3), to test whether it can modulate the speech of its users. Method Participants were instructed to read aloud a text of their choice during 3 experimental phases: baseline, test, and posttest, while wearing a Forbrain® headset. Critically, for half of the participants (Forbrain group), the device was turned on during the test phase, whereas for the other half (control group), the device was kept off. Voice recordings were analyzed to derive 6 quantitative measures of voice quality over each of the phases of the experiment. Results A significant Group × Phase interaction was obtained for the smoothed cepstral peak prominence, a measure of voice harmony, and for the trendline of the long-term average spectrum, a measure of voice robustness, this latter surviving Bonferroni correction for multiple comparisons. Conclusions The results of this study indicate the effectiveness of Forbrain® in modifying the speech of its users. It is suggested that Forbrain® works as an altered auditory feedback device. It may hence be used as a clinical device in speech therapy clinics, yet further studies are warranted to test its usefulness in clinical groups.


1994 ◽  
Vol 9 ◽  
pp. 129-146
Author(s):  
Marielle Bruyninckx ◽  
Bernard Harmegnies

Abstract. In this paper, we study the productions of twelve French-speaking Belgian subjects who attended a first-year Russian course with the aim of becoming translators or interpreters. A perceptual analysis, conducted by experts in Russian pronunciation, enabled us to define each learner's acquisition profile, and led to the choice of three subjects who had the most contrasted acquisition profiles in the sample. Segmental analysis was used in order to capture some specific phonetic processes involved in the acquisition of Russian. A comparison was made with speech samples drawn from the productions of a native speaker of Russian. Using Long Term Average Spectra as acoustic cues to voice quality, we were able to qualify the subjects' productions in a global and measurable way. This quantitative approach confirmed the results of perceptual and segmental analysis.


Author(s):  
Yi-Fang Chiu ◽  
Amy Neel ◽  
Travis Loux

Purpose Auditory perceptual judgments are commonly used to diagnose dysarthria and assess treatment progress. The purpose of the study was to examine the acoustic underpinnings of perceptual speech abnormalities in individuals with Parkinson's disease (PD). Method Auditory perceptual judgments were obtained from sentences produced by 13 speakers with PD and five healthy older adults. Twenty young listeners rated overall ease of understanding, articulatory precision, voice quality, and prosodic adequacy on a visual analog scale. Acoustic measures associated with the speech subsystems of articulation, phonation, and prosody were obtained, including second formant transitions, articulation rate, cepstral and spectral measures of voice, and pitch variations. Regression analyses were performed to assess the relationships between perceptual judgments and acoustic variables. Results Perceptual impressions of Parkinsonian speech were related to combinations of several acoustic variables. Approximately 36%–49% of the variance in the perceptual ratings was explained by the acoustic measures, indicating a modest acoustic–perceptual relationship. Conclusions The relationships between perceptual ratings and acoustic signals in Parkinsonian speech are multifactorial and involve a variety of acoustic features simultaneously. The modest acoustic–perceptual relationships, however, suggest that future work is needed to further examine the acoustic bases of perceptual judgments in dysarthria.


2021 ◽  
pp. 1-8
Author(s):  
Anne-Maria Laukkanen ◽  
Leena Rantala

Background: The Acoustic Voice Quality Index (AVQI) is a correlate of dysphonia. It has been found to differentiate between dysphonic and normophonic speakers and to indicate the effects of voice therapy. This study investigates how the AVQI reacts towards creak and strain, which are common in normophonic speakers. Methods: The material was obtained from an earlier study on 104 Finnish female university students (mean age 24.3 years, SD 6.3 years) with no known pathology of voice or hearing and a perceptually normal voice (G = 0 in GRBAS), who were recorded while reading aloud a standard text and sustaining the vowel [a:]. Perceptual analysis for the amount of creak and strain was carried out by 2 expert listeners. In this study, the AVQI v03.01 was analyzed and correlated with perceptual evaluations. Samples with low and high amounts of creak and strain were compared with t tests. Results: On average, the AVQI was below the threshold value of dysphonia in the Finnish population. The AVQI (ρ = 0.35, p = 0.000) and its subparameters, smoothed cepstral peak prominence (CPPS; ρ = –0.35, p = 0.000) and harmonics-to-noise ratio (HNR; ρ = –0.30, p = 0.002) showed low but significant correlations with creak. Strain had low but significant correlations with spectral Slope (ρ = 0.38, p = 0.000) and Tilt (ρ = –0.40, p = 0.009). The AVQI was lower (better) in samples that were evaluated as having a high amount of strain, but the difference was not significant. Only CPPS differentiated significantly between low and high amounts of creak. Conclusion: The AVQI does not seem to differentiate between high and low amounts of creak and strain in normophonic speakers.


2016 ◽  
Author(s):  
Irena Yanushevskaya ◽  
Ailbhe Ní Chasaide ◽  
Christer Gobl