Monitoring Cognitive Workload Using Vocal Tract and Voice Source Features

Author(s):  
Eydis Huld Magnusdottir ◽  
Michal Borsky ◽  
Manuela Meier ◽  
Kamilla Johannsdottir ◽  
Jon Gudnason

Monitoring cognitive workload from speech signals has received much attention from researchers in the past few years, as it has the potential to improve performance and fidelity in human decision making. The bulk of the research has focused on classifying speech from talkers participating in cognitive workload experiments using simple reading tasks, memory span tests and the Stroop test, typically into three levels of low, medium and high cognitive workload. This study focuses on using parameters extracted from the vocal tract and the voice source components of the speech signal for cognitive workload monitoring. The experiment used in this study involved 98 participants. The workload levels were obtained using a reading task and three Stroop tasks, which were randomly ordered for each participant, and adequate rest time was given between tasks to keep the cognitive workload of one task from affecting the subsequent one. Vocal tract features were obtained from the first three formants, and voice source features were extracted using signal analysis on the inverse-filtered speech signal. The results show that, on their own, the vocal tract features outperform the voice source features. A misclassification rate (MCR) of 33.92% ± 1.05 was achieved with an SVM classifier. A weighted combination of vocal tract and voice source features, classified with SVM classifiers and fused at the output level, achieved the lowest MCR of 32.5%.
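The output-level fusion and the misclassification rate mentioned in this abstract can be illustrated with a minimal sketch. It assumes each classifier emits one score per workload level and that fusion is a convex weighted sum of those scores; the abstract does not state the actual fusion rule, the weight, or the classifier outputs, so the function names, the weight of 0.6, and the toy scores below are all hypothetical.

```python
from typing import List, Sequence


def fuse_scores(tract: Sequence[float], source: Sequence[float], weight: float) -> List[float]:
    """Weighted sum of per-class scores; `weight` applies to the vocal tract scores."""
    if len(tract) != len(source):
        raise ValueError("score vectors must cover the same classes")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * t + (1.0 - weight) * s for t, s in zip(tract, source)]


def predict(scores: Sequence[float]) -> int:
    """Index of the highest-scoring class (first index wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def misclassification_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Percentage of samples whose predicted class differs from the true class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    errors = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return 100.0 * errors / len(y_true)


# Hypothetical toy data: 4 utterances, 3 workload levels (0 = low, 1 = medium, 2 = high).
LABELS = [0, 1, 2, 1]
TRACT_SCORES = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.3, 0.3, 0.4], [0.5, 0.4, 0.1]]
SOURCE_SCORES = [[0.5, 0.3, 0.2], [0.1, 0.4, 0.5], [0.2, 0.2, 0.6], [0.2, 0.6, 0.2]]
WEIGHT = 0.6  # assumed weight on the vocal tract classifier, not taken from the paper

tract_pred = [predict(s) for s in TRACT_SCORES]
source_pred = [predict(s) for s in SOURCE_SCORES]
fused_pred = [predict(fuse_scores(t, s, WEIGHT)) for t, s in zip(TRACT_SCORES, SOURCE_SCORES)]

print("tract MCR :", misclassification_rate(LABELS, tract_pred))
print("source MCR:", misclassification_rate(LABELS, source_pred))
print("fused MCR :", misclassification_rate(LABELS, fused_pred))
```

In this toy case each classifier errs on a different utterance, so the fused decision corrects both. In practice the weight would be tuned on held-out data rather than fixed by hand.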

Author(s):  
Johan Sundberg

The sound quality of singing is determined by three basic factors: the air pressure under the vocal folds (or the subglottal pressure), the mechanical properties of the vocal folds, and the resonance properties of the vocal tract. Subglottal pressure is controlled by the respiratory apparatus. It regulates vocal loudness and is varied with pitch in singing. Together with the mechanical properties of the folds, which are controlled by laryngeal muscles, it has a decisive influence on vocal fold vibrations, which convert the tracheal airstream to a pulsating airflow, the voice source. The voice source determines pitch, vibrato, and register, and also the overall slope of the spectrum. The sound of the voice source is filtered by the resonances of the vocal tract, or the formants, of which the two lowest determine the vowel quality and the higher ones the personal voice quality. Timing is crucial for creating emotional expressivity; it uses an acoustic code that shows striking similarities to that used in speech. The perceived loudness of a vowel sound seems more closely related to the subglottal pressure with which it was produced than to the acoustical sound level. Some investigations of acoustical correlates of tone placement and variation of larynx height are described, as are properties that affect the perceived naturalness of synthesized singing. Finally, subglottal pressure, voice source, and formant-frequency characteristics of some non-classical styles of singing are discussed.


Author(s):  
Manuela Meier ◽  
Michal Borsky ◽  
Eydis H. Magnusdottir ◽  
Kamilla R. Johannsdottir ◽  
Jon Gudnason

1997 ◽  
Vol 101 (4) ◽  
pp. 2234-2243 ◽  
Author(s):  
Ingo R. Titze ◽  
Brad H. Story

2013 ◽  
Vol 25 (12) ◽  
pp. 3294-3317 ◽  
Author(s):  
Lijiang Chen ◽  
Xia Mao ◽  
Pengfei Wei ◽  
Angelo Compare

This study proposes two classes of speech emotional features extracted from electroglottography (EGG) and speech signals. The power-law distribution coefficients (PLDC) of voiced segment duration, pitch rise duration, and pitch down duration are obtained to reflect the information of vocal fold excitation. The real discrete cosine transform coefficients of the normalized spectrum of the EGG and speech signals are calculated to reflect the information of vocal tract modulation. Two experiments are carried out. The first compares the proposed features with traditional features using sequential forward floating search and sequential backward floating search. The second is a comparative emotion recognition experiment based on a support vector machine. The results show that the proposed features outperform commonly used features in speaker-independent and content-independent speech emotion recognition.
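As a rough illustration of the second feature class, the sketch below computes discrete cosine transform coefficients of a normalized magnitude spectrum. The abstract does not specify the normalization or the DCT variant, so unit-sum normalization and an unscaled DCT-II are assumptions here, and the function names and toy spectra are hypothetical.

```python
import math
from typing import List, Sequence


def normalize_spectrum(magnitudes: Sequence[float]) -> List[float]:
    """Scale a non-negative magnitude spectrum so that its bins sum to 1 (assumed normalization)."""
    total = sum(magnitudes)
    if total <= 0:
        raise ValueError("spectrum must contain positive energy")
    return [m / total for m in magnitudes]


def dct_ii(values: Sequence[float], num_coeffs: int) -> List[float]:
    """First `num_coeffs` unscaled DCT-II coefficients of `values`."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(values))
        for k in range(num_coeffs)
    ]


# Hypothetical toy spectra: a flat one and one that falls off with frequency.
flat_coeffs = dct_ii(normalize_spectrum([2.0] * 8), 4)
tilted_coeffs = dct_ii(normalize_spectrum([8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]), 4)

print("flat  :", [round(c, 6) for c in flat_coeffs])
print("tilted:", [round(c, 6) for c in tilted_coeffs])
```

With unit-sum normalization the zeroth coefficient is always 1, so the shape information sits in the higher coefficients: a flat spectrum yields zeros there, while a falling spectrum yields a positive first coefficient.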


2021 ◽  
Author(s):  
Olga A. Loskutova ◽  
Anastasia V. Nenko ◽  
Yana A. Berg ◽  
Daria V. Borovikova ◽  
Anton V. Yupashevsky

Author(s):  
Filipa M. B. Lã ◽  
Brian P. Gill

Singing performance is highly competitive; thus, finding strategies to accelerate the acquisition of knowledge that results in an efficient and effective vocal technique is of the utmost importance. There are many ways in which a singer may acquire an efficient and effective vocal technique, which can be based on the physiological processes of voice production. This chapter explores these processes within the context of singing performance. The authors examine three major aspects of singing: 1) efficient control of breathing, such that optimal airflow and subglottal pressure are available as needed, for a given frequency and intensity; 2) maximized laryngeal coordination, so that the voice source signal contains all the necessary frequency components for the desired tone; and 3) the modulation of the source signal by subtle shaping of the vocal tract. The advantages and disadvantages of various pedagogical methods are discussed, including breath management, known as appoggio, and different resonant strategies. The authors advocate for a scientifically-grounded teaching method, which allows for physiological differences between individuals, genders, and voice classifications.


2003 ◽  
Vol 60 (2) ◽  
pp. 155-159 ◽  
Author(s):  
Jovisa Obrenovic ◽  
Milkica Nesic ◽  
Vladimir Nesic ◽  
Snezana Cekic

The influence of intensive acute hypoxia on the frequency and amplitude characteristics of the formants of the vowel O was investigated in this study. Examinees were exposed to simulated altitudes of 5 500 m and 6 700 m in a climabaro chamber and performed Lotig's test under the conditions of normoxia, i.e. they pronounced three-digit numbers in reverse order, beginning from 900. Frequency and intensity values of the formants of the vowel O (F1, F2, F3 and F4), extracted from the pronunciation of the word eight (osam in Serbian), were measured by spectral analysis of the speech signal. Changes in the frequency values and the intensity of the formants were examined. The results showed that there were no significant changes in the formant frequencies under hypoxia compared to normoxia. However, significant changes in formant intensities were found at the cited altitudes compared to normoxia. A rise in formant intensities was found at the altitude of 5 500 m. Hypoxia at the altitude of 6 700 m caused a significant fall in intensities in the initial period, compared to normoxia. Prolonged hypoxia exposure caused a rise in formant intensities compared to the altitude of 5 500 m. It may be concluded that, depending on the altitude, hypoxia has different effects on changes in the formant structure compared to normoxia.


Author(s):  
Shibanee Dash ◽  
Mihir Narayan Mohanty

Modern wireless communication has advanced considerably compared with earlier systems, and speech communication is a major research focus in its applications. Many developments have been made in this field. In this work, we have chosen an OFDM-based communication system, as OFDM is important in both licensed and unlicensed wireless communication platforms. The voice signal is passed through the proposed model and recovered at the receiver end. Under various conditions, the signal may be partially corrupted at the user end. The authors aim to obtain a better received signal using a radial basis function network (RBFN). The parameters chosen for the RBFN model are the energy, zero-crossing rate (ZCR), autocorrelation function (ACF), and fundamental frequency of the speech signal. On the one hand, these parameters can themselves partially eliminate noise; on the other, the RBFN model with these parameters proves effective for noisy speech signals over a noisy channel, here a Gaussian channel. The efficiency of the OFDM model is verified in terms of symbol error rate, and the transmitted speech signal is evaluated in terms of SNR, which shows the reduction of noise. For visual inspection, samples of the original signal, the noisy signal and the received signal are also shown. The experiment is performed at noise levels of 5 dB, 10 dB and 15 dB. The results demonstrate the performance of the RBFN model as a filter. The performance is measured as the listener's voice in each condition. The results show that, for speech in a noisy environment, the proposed technique improves intelligibility and speech quality.
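The four speech parameters named in this abstract (energy, ZCR, ACF, and fundamental frequency) can be computed per frame as in the minimal sketch below. The abstract gives no frame length, sample rate, or pitch search range, so the 8 kHz rate, the 50 ms frame, the 50 to 400 Hz search band, and all function names are assumptions chosen for illustration; the RBFN and OFDM stages are not reproduced.

```python
import math
from typing import List, Sequence


def short_time_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def autocorrelation(frame: Sequence[float], lag: int) -> float:
    """Unnormalized autocorrelation of the frame at the given lag."""
    return sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))


def estimate_f0(frame: Sequence[float], sample_rate: float,
                f_min: float = 50.0, f_max: float = 400.0) -> float:
    """Fundamental frequency from the lag with the highest autocorrelation in the search band."""
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag = max(range(lag_min, lag_max + 1), key=lambda k: autocorrelation(frame, k))
    return sample_rate / best_lag


# Hypothetical test frame: 50 ms of a 200 Hz sine sampled at 8 kHz.
SAMPLE_RATE = 8000.0
frame: List[float] = [math.sin(2.0 * math.pi * 200.0 * n / SAMPLE_RATE) for n in range(400)]

energy = short_time_energy(frame)
zcr = zero_crossing_rate(frame)
f0 = estimate_f0(frame, SAMPLE_RATE)

print("energy:", round(energy, 4))
print("ZCR   :", round(zcr, 4))
print("F0    :", f0)
```

A plain autocorrelation peak is only one way to estimate the fundamental frequency; it is used here because the abstract lists the ACF among its parameters, not because the paper is known to estimate pitch this way.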

