Using voice-quality measurements with prosodic and spectral features for speaker diarization

Author(s):  
Abraham Woubie ◽  
Jordi Luque ◽  
Javier Hernando
2021 ◽  
Vol 150 (4) ◽  
pp. A356-A356
Author(s):  
Jailyn M. Pena ◽  
Alicia Mason ◽  
Lisa Davidson

2005 ◽  
Vol 19 (2) ◽  
pp. 176-186 ◽  
Author(s):  
Dimitar D. Deliyski ◽  
Maegan K. Evans ◽  
Heather S. Shaw

Age-related changes to the vocal structure affect a singer's singing ability. We present a longitudinal study of vocal ageing in a female professional playback singer whose singing span covers more than six decades (from 19 to 80 years of age). The ageing analysis is performed on six vocal parameters, including fundamental frequency (F0), vibrato, formants, and spectral features such as spectral roll-off and centroid. Statistical variations in these vocal parameters over the singer's entire singing span are discussed in the paper. The significant effects noted with the ageing voice were a decrease in F0, a decreased vocal range, a reduction in vibrato rate, an increase in vibrato extent, a decrease in the F2 and F4 formants, and rapid change in the spectral features. This investigation also studied the effect of ageing on singing voice quality through the measurement of singing power ratio (SPR). An increase in SPR measures was observed with the ageing voice. Studies of the impact of vocal ageing on singer identification (SID) using longitudinal data are scarce. The SID experiments, performed with 350 a cappella songs covering the singer's entire singing span, showed that ageing-related changes in acoustical parameters clearly affected the performance of singer identification systems.
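
As an illustration of two of the spectral features named in the abstract above, the following minimal sketch computes a spectral centroid and a spectral roll-off from a single magnitude spectrum. It is not the authors' implementation: the function name, the 85% roll-off fraction, and the toy spectrum are assumptions made for this example, and a real analysis would first derive the spectrum from windowed audio frames.

```python
def spectral_centroid_and_rolloff(freqs_hz, magnitudes, rolloff_fraction=0.85):
    """Return (centroid_hz, rolloff_hz) for one magnitude spectrum.

    freqs_hz and magnitudes are equal-length sequences; rolloff_fraction
    (an assumed default of 0.85) is the share of total spectral magnitude
    that must lie at or below the roll-off frequency.
    """
    if not freqs_hz or len(freqs_hz) != len(magnitudes):
        raise ValueError("freqs_hz and magnitudes must be non-empty and equal length")
    total = sum(magnitudes)
    if total <= 0:
        raise ValueError("spectrum has no energy")

    # Centroid: magnitude-weighted mean frequency.
    centroid = sum(f * m for f, m in zip(freqs_hz, magnitudes)) / total

    # Roll-off: lowest frequency at which the cumulative magnitude
    # reaches the requested fraction of the total.
    threshold = rolloff_fraction * total
    cumulative = 0.0
    rolloff = freqs_hz[-1]
    for f, m in zip(freqs_hz, magnitudes):
        cumulative += m
        if cumulative >= threshold:
            rolloff = f
            break
    return centroid, rolloff


# Hypothetical toy spectrum, symmetric around 1000 Hz.
freqs = [0.0, 500.0, 1000.0, 1500.0, 2000.0]
mags = [0.0, 1.0, 2.0, 1.0, 0.0]
centroid, rolloff = spectral_centroid_and_rolloff(freqs, mags)
```

Tracking these two values across recordings from different years is one simple way to see whether spectral energy shifts with age, which is the kind of trend the abstract reports.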


Author(s):  
Dorel Picovici ◽  
John Nelson

Perceptual voice quality measurement can be defined as an objective quantification of an overall impression of the perceived stimulus. An alternative to laborious subjective testing is objective predictive modelling, which employs a perceptual model of the human auditory and cognitive system to predict the human response to a voice signal in terms of its quality. This chapter describes subjective and automated objective testing methods, and provides a test case scenario for measuring voice quality.


2005 ◽  
Vol 19 (1) ◽  
pp. 15-28 ◽  
Author(s):  
Dimitar D. Deliyski ◽  
Heather S. Shaw ◽  
Maegan K. Evans

2020 ◽  
Vol 63 (4) ◽  
pp. 1071-1082
Author(s):  
Theresa Schölderle ◽  
Elisabet Haas ◽  
Wolfram Ziegler

Purpose The aim of this study was to collect auditory-perceptual data on established symptom categories of dysarthria from typically developing children between 3 and 9 years of age, for the purpose of creating age norms for dysarthria assessment. Method One hundred forty-four typically developing children (3;0–9;11 [years;months], 72 girls and 72 boys) participated. We used a computer-based game specifically designed for this study to elicit sentence repetitions and spontaneous speech samples. Speech recordings were analyzed using the auditory-perceptual criteria of the Bogenhausen Dysarthria Scales, a standardized German assessment tool for dysarthria in adults. The Bogenhausen Dysarthria Scales (scales and features) cover clinically relevant dimensions of speech and allow for an evaluation of well-established symptom categories of dysarthria. Results The typically developing children exhibited a number of speech characteristics overlapping with established symptom categories of dysarthria (e.g., breathy voice, frequent inspirations, reduced articulatory precision, decreased articulation rate). Substantial progress was observed between 3 and 9 years of age, but with different developmental trajectories across different dimensions. In several areas (e.g., respiration, voice quality), 9-year-olds still presented with salient developmental speech characteristics, while in other dimensions (e.g., prosodic modulation), features typically associated with dysarthria occurred only exceptionally, even in the 3-year-olds. Conclusions The acquisition of speech motor functions is a prolonged process that is not yet complete at 9 years of age. Various developmental influences (e.g., anatomic–physiological changes) shape children's speech specifically. Our findings are a first step toward establishing auditory-perceptual norms for dysarthria in children of kindergarten and elementary school age. Supplemental Material https://doi.org/10.23641/asha.12133380


2020 ◽  
Vol 63 (12) ◽  
pp. 3991-3999
Author(s):  
Benjamin van der Woerd ◽  
Min Wu ◽  
Vijay Parsa ◽  
Philip C. Doyle ◽  
Kevin Fung

Objectives This study aimed to evaluate the fidelity and accuracy of a smartphone microphone and recording environment on acoustic measurements of voice. Method This was a prospective cohort proof-of-concept study. Two sets of prerecorded samples, (a) sustained vowels (/a/) and (b) Rainbow Passage sentences, were played for recording via the internal iPhone microphone and the Blue Yeti USB microphone in two recording environments: a sound-treated booth and a quiet office setting. Recordings were presented using a calibrated mannequin speaker with a fixed signal intensity (69 dBA) at a fixed distance (15 in.). Each set of recordings (iPhone—audio booth, Blue Yeti—audio booth, iPhone—office, and Blue Yeti—office) was time-windowed to ensure the same signal was evaluated for each condition. Acoustic measures of voice, including fundamental frequency (fo), jitter, shimmer, harmonic-to-noise ratio (HNR), and cepstral peak prominence (CPP), were generated using a widely used analysis program (Praat Version 6.0.50). The data gathered were compared using a repeated measures analysis of variance. Two separate data sets were used. The set of vowel samples included both pathologic (n = 10) and normal (n = 10), male (n = 5) and female (n = 15) speakers. The set of sentence stimuli ranged in perceived voice quality from normal to severely disordered, with an equal number of male (n = 12) and female (n = 12) speakers evaluated. Results The vowel analyses indicated that jitter, shimmer, HNR, and CPP were significantly different based on microphone choice, and that shimmer, HNR, and CPP were significantly different based on the recording environment. Analysis of sentences revealed a statistically significant impact of recording environment and microphone type on HNR and CPP. While statistically significant, the differences across the experimental conditions for a subset of the acoustic measures (viz., jitter and CPP) fell within their respective normative ranges. Conclusions Both microphone and recording setting resulted in significant differences across several acoustic measurements. However, a subset of the acoustic measures that were statistically significant across the recording conditions showed small overall differences that are unlikely to have clinical significance in interpretation. For these acoustic measures, the present data suggest that, although a sound-treated setting is ideal for voice sample collection, a smartphone microphone can capture acceptable recordings for acoustic signal analysis.
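
To make two of the perturbation measures named in the abstract above more concrete, here is a minimal sketch of local jitter and local shimmer, each taken as the mean absolute difference between consecutive cycles divided by the mean value. This is not the study's pipeline, which used Praat: the function name and the sample period and amplitude values are hypothetical, and the sketch assumes glottal cycle periods and peak amplitudes have already been extracted.

```python
def local_perturbation(values):
    """Mean absolute difference between consecutive values, divided by the mean.

    Applied to cycle periods this gives local jitter; applied to cycle
    peak amplitudes it gives local shimmer. Both are returned as ratios
    (multiply by 100 for a percentage).
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    mean_value = sum(values) / len(values)
    if mean_value == 0:
        raise ValueError("mean of values is zero")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / mean_value


# Hypothetical cycle-to-cycle measurements for one sustained vowel.
periods_s = [0.0100, 0.0110, 0.0100, 0.0110]   # seconds per glottal cycle
amplitudes = [0.80, 0.80, 0.80, 0.80]          # arbitrary linear units

jitter_local = local_perturbation(periods_s)    # period perturbation
shimmer_local = local_perturbation(amplitudes)  # amplitude perturbation
```

Computing the same measure on the same signal captured by two microphones, as the study did with Praat, shows how much of a difference is due to the recording chain and not the voice.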

