Automatic Evaluation of Voice Quality Using Text-Based Laryngograph Measurements and Prosodic Analysis

Computational and Mathematical Methods in Medicine ◽

10.1155/2015/316325 ◽

2015 ◽

Vol 2015 ◽

pp. 1-11 ◽

Cited By ~ 4

Author(s):

Tino Haderlein ◽

Cornelia Schwemmle ◽

Michael Döllinger ◽

Václav Matoušek ◽

Martin Ptok ◽

...

Keyword(s):

Vocal Fold ◽

Voice Quality ◽

Support Vector ◽

Analysis Method ◽

Prosodic Features ◽

Connected Speech ◽

Perceptual Evaluation ◽

Vocal Fold Vibration ◽

The North ◽

Automatic Methods

Due to low intra- and interrater reliability, perceptual voice evaluation should be supported by objective, automatic methods. In this study, text-based, computer-aided prosodic analysis and measurements of connected speech were combined in order to model perceptual evaluation of the German Roughness-Breathiness-Hoarseness (RBH) scheme. Fifty-eight connected speech samples (43 women and 15 men; 48.7 ± 17.8 years) containing the German version of the text “The North Wind and the Sun” were evaluated perceptually by 19 speech and voice therapy students according to the RBH scale. For the human-machine correlation, Support Vector Regression with measurements of the vocal fold cycle irregularities (CFx) and the closed phases of vocal fold vibration (CQx) of the Laryngograph and 33 features from a prosodic analysis module were used to model the listeners’ ratings. The best human-machine results for roughness were obtained from a combination of six prosodic features and CFx (r = 0.71, ρ = 0.57). These correlations were approximately the same as the interrater agreement among human raters (r = 0.65, ρ = 0.61). CQx was one of the substantial features of the hoarseness model. For hoarseness and breathiness, the human-machine agreement was substantially lower. Nevertheless, the automatic analysis method can serve as the basis for a meaningful objective support for perceptual analysis.
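
The abstract above reports human-machine agreement as Pearson's r and Spearman's ρ. As a minimal sketch of how those two coefficients can be computed from paired ratings, the following Python uses only the standard library. The function names and the sample values in `human` and `machine` are illustrative assumptions, not data or code from the study, and the Support Vector Regression step itself is not reproduced here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical example: mean listener ratings vs. model predictions.
human = [0.2, 1.1, 1.9, 2.6, 0.8]
machine = [0.5, 0.9, 2.2, 2.0, 1.0]
print(pearson(human, machine), spearman(human, machine))
```

Computing ρ as the Pearson correlation of average ranks handles tied ratings, which are likely on a coarse ordinal scale such as RBH.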

Vocal fold injury following endotracheal intubation

The Journal of Laryngology & Otology ◽

10.1258/002221505774481192 ◽

2005 ◽

Vol 119 (10) ◽

pp. 825-827 ◽

Cited By ~ 15

Author(s):

Satoshi Kitahara ◽

Yukihiro Masuda ◽

Yoko Kitagawa

Keyword(s):

Endotracheal Intubation ◽

Fibrin Glue ◽

Vocal Fold ◽

Case Reports ◽

Voice Quality ◽

Fibrous Tissue ◽

Normal Result ◽

Perceptual Evaluation ◽

Vibratory Pattern

Vocal fold scarring results in the formation of fibrous tissue which disturbs the vibratory pattern of the fold during phonation. However, vocal fold scarring in humans is poorly understood because of the lack of clear case reports focusing on voice quality. The authors present a case of vocal fold scarring with changes in voice quality. At the time of injury the pedicle mucosa was cemented with fibrin glue. Phonation was inhibited for two weeks and tranilast (300 mg/day) was given for 3 months. Sixty-nine days later, perceptual evaluation showed a normal result and the phonation time became better, but the mucosal vibration was still lacking. Ninety-seven days later, mucosal vibration was finally restored. We suggest that characterization of vocal fold scarring in humans may be different from that in animals, and recommend that surgical management should be avoided for at least three months after injury.

Vocal Fold Vibration and Voice Quality

Folia Phoniatrica et Logopaedica ◽

10.1159/000021510 ◽

1999 ◽

Vol 52 (1-3) ◽

pp. 32-38 ◽

Cited By ~ 28

Author(s):

Seiji Niimi ◽

Mamiko Miyaji

Keyword(s):

Vocal Fold ◽

Voice Quality ◽

Vocal Fold Vibration

Vocal Fold Vibration Characteristics during SOVTE using a Vibration Simulator and Digital Kymography

Communication Sciences & Disorders ◽

10.12963/csd.21849 ◽

2021 ◽

Vol 26 (4) ◽

pp. 921-932

Author(s):

Ji Sung Kim ◽

Seong Hee Choi ◽

Kyoungjae Lee ◽

Chul-Hee Choi ◽

Soo-Geun Wang ◽

...

Keyword(s):

Clinical Practice ◽

Vocal Fold ◽

Clinical Evidence ◽

Vocal Tract ◽

Voice Quality ◽

Vibration Characteristics ◽

Single Source ◽

Vocal Fold Vibration ◽

Vibration Parameters

Objectives: The purpose of this study is to investigate the characteristics of vocal fold vibration during sustained vowel /a/ phonation and various semi-occluded vocal tract exercises (SOVTEs) using a vibration simulator and digital kymography (DKG). Methods: A total of 12 normal young speakers (6 males, 6 females) aged 20-30 years participated in the study. They phonated a sustained /a/ vowel and performed SOVTEs. The vocal fold vibration characteristics were measured according to the number of vibration sources (single vs. double) and the degree of vocal tract occlusion using a vibration simulator and DKG. Glottal gap quotient (GQ, %), speed quotient (SQ, %), and amplitude (pixels) were estimated quantitatively from the DKG images. Results: Significantly higher GQ (p = .000) and SQ (p = .000) were observed in humming and the bilabial fricative /β/ than in open vowels. The amplitude was significantly higher in the open vowel /a/ than in humming (p = .018) and the bilabial fricative /β/ (p = .003). In addition, when the vocal fold vibration parameters were compared by vibration type (single source: straw phonation vs. double source: straw phonation with water), the double source presented a significantly higher GQ (p = .000) and SQ (p = .008) than the single source. Conclusion: SOVTEs showed a glottal gap different from that of the open vowel /a/. They also showed a longer opening of the vocal folds and a smaller amplitude than the vowel. This suggests that SOVTEs may be helpful for facilitating vocal fold vibration and good voice quality in clinical practice. The current study provides theoretical and clinical evidence for SOVTEs.

Results of Fascial Augmentation in Unilateral Vocal Fold Paralysis

Annals of Otology Rhinology & Laryngology ◽

10.1177/000348940211100608 ◽

2002 ◽

Vol 111 (6) ◽

pp. 523-529 ◽

Cited By ~ 25

Author(s):

Petri Reijonen ◽

Sari Lehikoinen-Söderlund ◽

Heikki Rihkanen

Keyword(s):

Vocal Fold ◽

Acoustic Analysis ◽

Voice Quality ◽

Vocal Fold Paralysis ◽

Unilateral Vocal Fold Paralysis ◽

Inexpensive Method ◽

Maximum Phonation Time ◽

Perceptual Evaluation

The objective of this study was to evaluate the effects on voice quality of augmentation by injection of minced fascia in patients with unilateral vocal fold paralysis. Preoperative and postoperative voice samples from 14 patients (6 men and 8 women; mean age, 59 years) were analyzed by computerized acoustic analysis and blinded perceptual evaluation. Statistically significant improvements were seen in perturbation measurements (jitter and shimmer), noise-to-harmonics ratio, and maximum phonation time. A panel of evaluators rated 10 of the 14 postoperative voices as normal or near-normal. Injection laryngoplasty with minced fascia offers a new, effective, well-tolerated, and inexpensive method to medialize a paralyzed vocal fold. The graft seems to survive well, as indicated by good vocal results with a follow-up ranging from 5 to 32 months.
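
The perturbation measurements named in the abstract above (jitter and shimmer) are commonly defined as the mean absolute difference between consecutive glottal cycles, normalized by the mean value: jitter uses cycle periods, shimmer uses cycle peak amplitudes. The sketch below assumes this "local" variant; the abstract does not say which variant or software the authors used, and the function names are hypothetical.

```python
def local_perturbation(values):
    """Mean absolute difference of consecutive values, as a percentage of the mean.

    Assumes `values` holds one measurement per glottal cycle, in order.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_abs_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_abs_diff / mean_value


def jitter_percent(periods):
    """Local jitter: cycle-to-cycle variation of the period (e.g. in ms)."""
    return local_perturbation(periods)


def shimmer_percent(amplitudes):
    """Local shimmer: cycle-to-cycle variation of the peak amplitude."""
    return local_perturbation(amplitudes)


# Hypothetical cycle measurements, not taken from the study.
periods_ms = [5.0, 5.1, 4.9, 5.0, 5.2]
amplitudes = [0.80, 0.78, 0.82, 0.79, 0.81]
print(jitter_percent(periods_ms), shimmer_percent(amplitudes))
```

Lower values indicate more regular vibration, which is why a postoperative decrease in both measures is read as an improvement.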

Medialization thyroplasty in glottis insufficiency due to unilateral vocal fold paralysis and after laser cordectomies – preliminary report

Otolaryngologia Polska ◽

10.5604/01.3001.0009.5548 ◽

2017 ◽

Vol 71 (1) ◽

pp. 22-29

Author(s):

Anna Rzepakowska ◽

Ewa Osuch-Wójcikiewicz ◽

Ewelina Sielska-Badurek ◽

Kazimierz Niemczyk

Keyword(s):

Vocal Fold ◽

Acoustic Analysis ◽

Standard Procedure ◽

Thyroid Cartilage ◽

Voice Quality ◽

Type I ◽

Vocal Fold Paresis ◽

Maximum Phonation Time ◽

Perceptual Evaluation ◽

Medialization Thyroplasty

Medialization thyroplasty (type I) is a surgical procedure performed on the thyroid cartilage. The major indication for this surgery is significant glottic insufficiency due to unilateral vocal fold paresis. However, the procedure is also performed after vocal fold resection during cordectomy. Aim: To evaluate voice results in patients after medialization thyroplasty. Material and methods: To date, 8 thyroplasty procedures have been performed in the Otolaryngology Department of the Medical University of Warsaw under local anaesthesia with implantation of a medical silicone prosthesis. Six patients had unilateral vocal fold paresis and the other two had previously undergone laser cordectomy for T1a vocal carcinoma. Results: There were no complications during or after surgery. Follow-up examinations at 1, 3, 6, and 12 months postoperatively revealed significant improvement of glottal closure on laryngeal videostroboscopy in all patients. Voice quality improved both in perceptual evaluation (GRBAS scale) and in acoustic analysis (F0, jitter, shimmer, NHR) in both patient groups. However, the improvement was considerably greater in the group with unilateral vocal fold paresis. The maximum phonation time (MPT) increased in all patients. Self-evaluation of voice quality with the Voice Handicap Index questionnaire also confirmed individual improvement. Conclusions: Speech rehabilitation is not successful in every patient with glottic insufficiency. Medialization thyroplasty remains the standard procedure for permanent improvement of voice quality in those cases.

Comparing Measures of Voice Quality From Sustained Phonation and Continuous Speech

Journal of Speech Language and Hearing Research ◽

10.1044/2016_jslhr-s-15-0307 ◽

2016 ◽

Vol 59 (5) ◽

pp. 994-1001 ◽

Cited By ~ 20

Author(s):

Bruce R. Gerratt ◽

Jody Kreiman ◽

Marc Garellek

Keyword(s):

Vocal Fold ◽

Voice Quality ◽

Quality Analysis ◽

Perceptual Information ◽

Continuous Speech ◽

Voice Source ◽

Acoustic Measures ◽

Vocal Fold Vibration ◽

Vocal Quality ◽

Quality Deviation

Purpose The question of what type of utterance—a sustained vowel or continuous speech—is best for voice quality analysis has been extensively studied but with equivocal results. This study examines whether previously reported differences derive from the articulatory and prosodic factors occurring in continuous speech versus sustained phonation. Method Speakers with voice disorders sustained vowels and read sentences. Vowel samples were excerpted from the steadiest portion of each vowel in the sentences. In addition to sustained and excerpted vowels, a 3rd set of stimuli was created by shortening sustained vowel productions to match the duration of vowels excerpted from continuous speech. Acoustic measures were made on the stimuli, and listeners judged the severity of vocal quality deviation. Results Sustained vowels and those extracted from continuous speech contain essentially the same acoustic and perceptual information about vocal quality deviation. Conclusions Perceived and/or measured differences between continuous speech and sustained vowels derive largely from voice source variability across segmental and prosodic contexts and not from variations in vocal fold vibration in the quasisteady portion of the vowels. Approaches to voice quality assessment by using continuous speech samples average across utterances and may not adequately quantify the variability they are intended to assess.

Structural Fat Grafting to Improve Outcomes of Vocal Folds’ Fat Augmentation: Long-term Results

Otolaryngology ◽

10.1177/0194599817739256 ◽

2017 ◽

Vol 158 (1) ◽

pp. 135-143 ◽

Cited By ~ 14

Author(s):

Giovanna Cantarella ◽

Riccardo F. Mazzola ◽

Michele Gaffuri ◽

Elisabetta Iofrida ◽

Pietro Biondetti ◽

...

Keyword(s):

Vocal Fold ◽

Voice Quality ◽

Case Series ◽

Vocal Folds ◽

Fat Grafting ◽

University Hospital ◽

Long Term Results ◽

Perceptual Evaluation ◽

Mri Scans

Objective Evaluating the long-term outcomes of vocal fold structural fat grafting. Study Design Case series with chart review. Setting University hospital. Subjects and Methods Seventy-nine dysphonic patients (16-82 years; 55 with unilateral laryngeal paralysis and 24 with vocal fold scarring) underwent vocal fold fat injection. Fat was harvested by low-pressure liposuction and then processed by centrifugation. Refined fat aliquots were placed in the vocal fold and paraglottic space in multiple tunnels to enhance graft neovascularization. All patients were followed for 12 months, 15 for 3 years, and 5 for 10 years with videolaryngostroboscopy, maximal phonation time (MPT) measurement, Voice Handicap Index (VHI) questionnaire, and GRBAS (grade, roughness, breathiness, asthenia, strain) perceptual evaluation. Laryngeal computed tomography (CT) and/or magnetic resonance imaging (MRI) studies were performed in 16 patients 3 to 28 months postoperatively; MRI was repeated in 5 cases 12 to 18 months after the first radiological study. Results The voice quality of all patients improved after surgery, and long-term stability was confirmed by MPT, GRBAS, and VHI (P ranging between .004 and <.001). The results achieved 1 year postoperatively remained stable at 3 and 10 years. Videolaryngostroboscopy showed improved glottic closure in all patients despite a limited amount of fat resorption. CT and MRI demonstrated survival of the fat grafts in all of the 16 examined cases. Serial MRI scans showed no change in graft size over time. Conclusions The reported clinical and radiological data demonstrate that fat is an effective filler for permanent vocal fold augmentation if the refined micro-aliquots are placed in multiple tunnels.

Consistency and Reliability of Voice Quality Ratings for Different Types of Speech Fragments

Journal of Speech Language and Hearing Research ◽

10.1044/jshr.3705.985 ◽

1994 ◽

Vol 37 (5) ◽

pp. 985-1000 ◽

Cited By ~ 76

Author(s):

Guus de Krom

Keyword(s):

Stimulus Type ◽

Voice Quality ◽

The Other ◽

Connected Speech ◽

Speech Stimuli ◽

Perceptual Evaluation ◽

Quality Ratings ◽

Different Types ◽

Stable Part ◽

Post Onset

This study describes a perception experiment in which listeners were asked to rate voice fragments obtained from a variety of speakers on grade, breathiness, and roughness. Four different types of stimuli were presented to each listener. One type of stimulus was based on connected speech fragments; the other three were based on different segments of a sustained vowel, yielding a 200 msec vowel onset stimulus, a 200 msec post-onset stimulus, and a 1000 msec whole vowel stimulus. Analyses focused on the consistency and reliability of grade, roughness, and breathiness ratings. Results indicated that stimulus type had virtually no effect on either within- or between-listener consistency of the grade, breathiness, or roughness ratings. Rating reliability too was hardly influenced by stimulus type. When determined as a function of the overall degree of deviance of a voice, the reliability of breathiness and roughness ratings was slightly higher for whole vowel and vowel onset stimuli than for connected speech and post-onset stimuli. It is concluded that connected speech stimuli are not necessarily to be preferred over vowel-type stimuli for a perceptual evaluation of grade, roughness, or breathiness. The somewhat higher reliability of ratings on vowel onset and whole vowel stimuli as compared to the post-onset stimuli is taken as an indication that the onset part of a vowel may contain voice quality cues that are less salient in the most stable part of a vowel.
