Voice source and vocal tract variations as cues to emotional states perceived from expressive conversational speech

Singing performance is highly competitive; thus, finding strategies to accelerate the acquisition of knowledge that results in an efficient and effective vocal technique is of the utmost importance. There are many ways in which a singer may acquire an efficient and effective vocal technique, which can be based on the physiological processes of voice production. This chapter explores these processes within the context of singing performance. The authors examine three major aspects of singing: 1) efficient control of breathing, such that optimal airflow and subglottal pressure are available as needed, for a given frequency and intensity; 2) maximized laryngeal coordination, so that the voice source signal contains all the necessary frequency components for the desired tone; and 3) the modulation of the source signal by subtle shaping of the vocal tract. The advantages and disadvantages of various pedagogical methods are discussed, including breath management, known as appoggio, and different resonant strategies. The authors advocate for a scientifically-grounded teaching method, which allows for physiological differences between individuals, genders, and voice classifications.

Acoustic Properties of the Voice Source and the Vocal Tract: Are They Perceptually Independent?

Journal of Voice ◽

10.1016/j.jvoice.2015.11.010 ◽

2016 ◽

Vol 30 (6) ◽

pp. 772.e9-772.e22 ◽

Cited By ~ 2

Author(s):

Molly L. Erickson

Keyword(s):

Vocal Tract ◽

Acoustic Properties ◽

Voice Source ◽

The Voice

A novel approach to the estimation of voice source and vocal tract parameters from speech signals

Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96 ◽

10.1109/icslp.1996.607837 ◽

2002 ◽

Cited By ~ 3

Author(s):

W. Ding ◽

H. Kasuya

Keyword(s):

Vocal Tract ◽

Speech Signals ◽

Voice Source ◽

Novel Approach

Feature compensation based on the normalization of vocal tract length for the improvement of emotion-affected speech recognition

EURASIP Journal on Audio Speech and Music Processing ◽

10.1186/s13636-021-00216-5 ◽

2021 ◽

Vol 2021 (1) ◽

Author(s):

Masoud Geravanchizadeh ◽

Elnaz Forouhandeh ◽

Meysam Bashirpour

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Vocal Tract ◽

Gaussian Mixture ◽

Recognition System ◽

Speech Recognition System ◽

Emotional States ◽

Emotional Speech ◽

Automatic Speech Recognition System ◽

Frequency Warping

Abstract: The performance of speech recognition systems trained with neutral utterances degrades significantly when these systems are tested with emotional speech. Since everybody can speak emotionally in the real-world environment, it is necessary to take account of the emotional states of speech in the performance of the automatic speech recognition system. Limited works have been performed in the field of emotion-affected speech recognition and so far, most of the researches have focused on the classification of speech emotions. In this paper, the vocal tract length normalization method is employed to enhance the robustness of the emotion-affected speech recognition system. For this purpose, two structures of the speech recognition system based on hybrids of hidden Markov model with Gaussian mixture model and deep neural network are used. To achieve this goal, frequency warping is applied to the filterbank and/or discrete-cosine transform domain(s) in the feature extraction process of the automatic speech recognition system. The warping process is conducted in a way to normalize the emotional feature components and make them close to their corresponding neutral feature components. The performance of the proposed system is evaluated in neutrally trained/emotionally tested conditions for different speech features and emotional states (i.e., Anger, Disgust, Fear, Happy, and Sad). In this system, frequency warping is employed for different acoustical features. The constructed emotion-affected speech recognition system is based on the Kaldi automatic speech recognition with the Persian emotional speech database and the crowd-sourced emotional multi-modal actors dataset as the input corpora. The experimental simulations reveal that, in general, the warped emotional features result in better performance of the emotion-affected speech recognition system as compared with their unwarped counterparts. Also, it can be seen that the performance of the speech recognition using the deep neural network-hidden Markov model outperforms the system employing the hybrid with the Gaussian mixture model.
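
The abstract above does not give the warping function itself, so the following is only a minimal sketch of how frequency warping of a filterbank could look, assuming the piecewise-linear warp commonly used for vocal tract length normalization. The names vtln_warp, warped_mel_centers, alpha, and break_ratio are hypothetical and are not taken from the paper or from Kaldi.

```python
import math


def vtln_warp(freq_hz, alpha, nyquist_hz, break_ratio=0.85):
    """Piecewise-linear frequency warp (an assumed, commonly used form).

    Below the breakpoint the frequency is scaled by alpha; above it a
    second linear segment maps the Nyquist frequency onto itself, so the
    warped axis still spans 0..nyquist_hz.
    """
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    if not 0.0 <= freq_hz <= nyquist_hz:
        raise ValueError("freq_hz must lie in [0, nyquist_hz]")
    # Breakpoint chosen so that alpha * f_break never exceeds
    # break_ratio * nyquist_hz, which keeps the warp monotonic.
    f_break = break_ratio * nyquist_hz * min(1.0, 1.0 / alpha)
    if freq_hz <= f_break:
        return alpha * freq_hz
    slope = (nyquist_hz - alpha * f_break) / (nyquist_hz - f_break)
    return alpha * f_break + slope * (freq_hz - f_break)


def hz_to_mel(freq_hz):
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def warped_mel_centers(n_filters, alpha, nyquist_hz):
    """Mel-spaced filter center frequencies, each passed through the warp."""
    mel_max = hz_to_mel(nyquist_hz)
    centers = [mel_to_hz(mel_max * (i + 1) / (n_filters + 1))
               for i in range(n_filters)]
    return [vtln_warp(c, alpha, nyquist_hz) for c in centers]


if __name__ == "__main__":
    # alpha = 1.0 leaves the filterbank unchanged; other values shift
    # the centers up or down.
    for alpha in (0.9, 1.0, 1.1):
        centers = warped_mel_centers(5, alpha, 8000.0)
        print(alpha, [round(c, 1) for c in centers])
```

In a normalization setting, alpha would typically be chosen per speaker or per utterance so that the warped emotional features move closer to their neutral counterparts; how that factor is selected is not specified here and is left out of the sketch.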

The Sound Source in Singing

The Oxford Handbook of Singing ◽

10.1093/oxfordhb/9780199660773.013.011 ◽

2015 ◽

pp. 108-144 ◽

Cited By ~ 1

Author(s):

Christian T. Herbst ◽

David M. Howard ◽

Jan G. Švec

Keyword(s):

Vocal Tract ◽

Spectral Composition ◽

Physiological Parameters ◽

Special Focus ◽

Singing Voice ◽

Voice Pedagogy ◽

Typical Application ◽

Voice Source ◽

Vocal Fold Vibration

The voice instrument is composed of three basic sub-systems: the pulmonary apparatus, the laryngeal voice source, and the vocal tract for sound modification. In this chapter, the laryngeal sound generation is examined in closer detail, with a special focus on singing voice production. In particular, the relation between the quality of vocal fold vibration, the consistence of the glottal airflow, and the spectral composition of the resulting laryngeal sound output (before being filtered by the vocal tract) is discussed. Two basic physiological parameters for controlling these features are described: cartilaginous adduction (controlled along the dimension of “breathy” vs. “pressed” voice); and membranous medialization (influenced by the choice of singing voice register). It is shown that these two physiological parameters can be varied independently, and how they can be incorporated into a pedagogical model. Based on this model, a typical application from the singing studio is described. Finally, the range of sound qualities resulting from independent variation of cartilaginous adduction and membranous medialization is being commented on by five known voice pedagogues, in an attempt to unify the respective terminology in voice pedagogy.

A Study of Voice Source and Vocal Tract Filter Based Features in Cognitive Load Classification

2010 20th International Conference on Pattern Recognition ◽

10.1109/icpr.2010.1097 ◽

2010 ◽

Cited By ~ 6

Author(s):

Phu Ngoc Le ◽

Julien Epps ◽

Eric H.C. Choi ◽

Eliathamby Ambikairajah

Keyword(s):

Cognitive Load ◽

Vocal Tract ◽

Voice Source

Measuring Variations of Voice Source and Vocal Tract Characteristics from Korean Emotional Voice

Sixth International Conference on Intelligent Systems Design and Applications ◽

10.1109/isda.2006.253715 ◽

2006 ◽

Author(s):

Cheolwoo Jo ◽

Jianglin Wang

Keyword(s):

Vocal Tract ◽

Voice Source

Contributions of voice‐source and vocal‐tract characteristics to speaker identity

The Journal of the Acoustical Society of America ◽

10.1121/1.405119 ◽

1992 ◽

Vol 92 (4) ◽

pp. 2301-2301

Author(s):

J. H. Eggen ◽

S. G. Nooteboom ◽

A. J. M. Houtsma

Keyword(s):

Vocal Tract ◽

Voice Source

The Singing Voice

The Oxford Handbook of Voice Perception ◽

10.1093/oxfordhb/9780198743187.013.6 ◽

2018 ◽

pp. 116-142

Author(s):

Johan Sundberg

Keyword(s):

Mechanical Properties ◽

Vocal Tract ◽

Voice Quality ◽

Vocal Folds ◽

Sound Level ◽

Vowel Sound ◽

Formant Frequency ◽

Voice Source ◽

Subglottal Pressure ◽

The Voice

The sound quality of singing is determined by three basic factors: the air pressure under the vocal folds (or the subglottal pressure), the mechanical properties of the vocal folds, and the resonance properties of the vocal tract. Subglottal pressure is controlled by the respiratory apparatus. It regulates vocal loudness and is varied with pitch in singing. Together with the mechanical properties of the folds, which are controlled by laryngeal muscles, it has a decisive influence on vocal fold vibrations, which convert the tracheal airstream to a pulsating airflow, the voice source. The voice source determines pitch, vibrato, and register, and also the overall slope of the spectrum. The sound of the voice source is filtered by the resonances of the vocal tract, or the formants, of which the two lowest determine the vowel quality and the higher ones the personal voice quality. Timing is crucial for creating emotional expressivity; it uses an acoustic code that shows striking similarities to that used in speech. The perceived loudness of a vowel sound seems more closely related to the subglottal pressure with which it was produced than with the acoustical sound level. Some investigations of acoustical correlates of tone placement and variation of larynx height are described, as are properties that affect the perceived naturalness of synthesized singing. Finally, subglottal pressure, voice source, and formant-frequency characteristics of some non-classical styles of singing are discussed.
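
The source-filter relationship described in this abstract can be illustrated with a toy synthesis. This is a rough sketch under strong simplifying assumptions, not a method from the chapter: the voice source is reduced to a bare impulse train (which ignores spectral slope, vibrato, and register), and each formant is modeled as a second-order digital resonator. All function names and the formant and bandwidth values are hypothetical, chosen only for illustration.

```python
import math


def resonator_coeffs(formant_hz, bandwidth_hz, sample_rate_hz):
    """Coefficients of a two-pole resonator with unity gain at 0 Hz."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate_hz)
    theta = 2.0 * math.pi * formant_hz / sample_rate_hz
    a1 = -2.0 * r * math.cos(theta)
    a2 = r * r
    gain = 1.0 + a1 + a2
    return gain, a1, a2


def apply_resonator(signal, coeffs):
    """Filter a sample sequence: y[n] = g*x[n] - a1*y[n-1] - a2*y[n-2]."""
    gain, a1, a2 = coeffs
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = gain * x - a1 * y1 - a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize_vowel(pitch_hz, formants, duration_s=0.05,
                     sample_rate_hz=16000):
    """Impulse-train source passed through a cascade of formant resonators.

    formants is a list of (center_hz, bandwidth_hz) pairs.
    """
    n_samples = int(duration_s * sample_rate_hz)
    period = int(round(sample_rate_hz / pitch_hz))
    # Simplified voice source: one unit pulse per pitch period.
    source = [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]
    out = source
    for center_hz, bandwidth_hz in formants:
        out = apply_resonator(
            out, resonator_coeffs(center_hz, bandwidth_hz, sample_rate_hz))
    return out


if __name__ == "__main__":
    # Illustrative values only: two low formants roughly in the range
    # of an open vowel.
    samples = synthesize_vowel(220.0, [(700.0, 80.0), (1200.0, 90.0)])
    print(len(samples), max(abs(s) for s in samples))
```

Changing pitch_hz alters only the source, while changing the formant list alters only the filter, which mirrors the separation between voice source and vocal tract that the abstract describes.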

Estimation of voice source and vocal tract parameters based on ARMA analysis and a model for the Glottal source waveform

10.1109/icassp.1987.1169874 ◽

2005 ◽

Cited By ~ 17

Author(s):

H. Fujisaki ◽

M. Ljungqvist

Keyword(s):

Vocal Tract ◽

Voice Source ◽

Glottal Source
