Estimation of the vocal tract spectrum from articulatory movements using phoneme-dependent neural networks

To produce vocalizations including speech and song, the control of all muscles along the vocal tract (e.g. for respiration, vocal fold motion, resonance changes, and articulation) requires the concerted effort of a vast network of brain regions. However, singers are usually unaware of the neural networks that govern and coordinate all of these muscle groups, or what happens in these networks when auditory or somatosensory feedback notifies the singer of vocal errors, or if feedback is compromised even temporarily. In this chapter, the authors attempt to define the basic neural networks involved in singing, discuss how these networks may change due to extensive vocal training and practice, and present recent findings that illustrate how the networks respond to alterations to auditory and kinesthetic feedback.

Towards a high quality Arabic speech synthesis system based on neural networks and residual excited vocal tract model

Signal Image and Video Processing; 10.1007/s11760-007-0038-z; 2007; Vol 2 (1); pp. 73-87; Cited By ~ 6

Author(s): Fatima Chouireb, Mhania Guerti

Keyword(s): Neural Networks, Speech Synthesis, Vocal Tract, Synthesis System, High Quality

Cineradiographic Examination of Articulatory Movement of Pseudo-Tongue, Hyoid, and Mandible in Congenital Aglossia

Communication Disorders Quarterly; 10.1177/1525740114523310; 2014; Vol 36 (1); pp. 3-11; Cited By ~ 3

Author(s): Betty McMicken, Margaret Vento-Wilson, Shelley Von Berg, Kelly Rogers

Keyword(s): Vocal Tract, Multiple Regression Model, Tongue Base, Mandibular Movement, Articulatory Movements, Articulatory Movement, Hyoid Movement, Highly Correlated, Variable Movement, Pearson Correlations

This research examined cineradiographic films (CRF) of articulatory movements in a person with congenital aglossia (PWCA) during speech production of four phrases. Pearson correlations and a multiple regression model investigated co-variation of independent variables, positions of mandible and hyoid; and pseudo-tongue-dependent variables, positions of mylohyoid and tongue base. Results suggest that backing/fronting of the mandible assisted the mylohyoid/tongue base in making mid-antero-posterior constrictions. Co-linearity findings suggest the best predictor of tongue base movement was mandible for back sounds. Hyoid movement was highly correlated with mandibular movement horizontally, but hyoid acted independently vertically and possibly with greater phonemic specialty in the PWCA. Findings suggest hyoid was a strong determinant of vertically dependent variable movement in all phrases. The extent of hyoid activity was a unique finding and one that may begin to explain relative intelligibility in this PWCA. Observed changes in vocal tract length may have influenced F2 transitional/vowel midpoint values.

Estimation of Vocal Tract Resonances Using Spectral Prominent Regions and Artificial Neural Networks

Circuits Systems and Signal Processing; 10.1007/s00034-018-0808-6; 2018; Vol 37 (11); pp. 5087-5100

Author(s): Vasantha Sama Sai, Suryakanth V. Gangashetty, Ashraf Alkhairy, Afshan Jafri

Keyword(s): Neural Networks, Artificial Neural Networks, Vocal Tract, Artificial Neural

Articulatory Abnormalities in Athetoid Cerebral Palsy

Journal of Speech and Hearing Disorders; 10.1044/jshd.4303.353; 1978; Vol 43 (3); pp. 353-373; Cited By ~ 38

Author(s): Raymond Kent, Ronald Netsell

Keyword(s): Cerebral Palsy, Motor Learning, Vocal Tract, Tongue Movement, Lower Lip, Jaw Movement, Athetoid Cerebral Palsy, Articulatory Movements, Velopharyngeal Closure, Transition Times

This report presents cinefluorographic data on the articulation of isolated vowels, VCV nonsense utterances, and short sentences by five subjects with athetoid cerebral palsy. Articulatory abnormalities were identified from tracings of vocal tract shapes and from displacement-by-time plots of articulatory events. The most frequent abnormalities were large ranges of jaw movement, inappropriate positioning of the tongue for various phonetic segments (especially because of a reduced range of tongue movement in the anteroposterior dimension), intermittency of velopharyngeal closure caused by an instability of velar elevation, prolonged transition times for articulatory movements, and retrusion of the lower lip. The speech disorder associated with athetosis is considered with respect to a model of motor learning.

Vocal tract modelling with recurrent neural networks

1995 International Conference on Acoustics, Speech, and Signal Processing; 10.1109/icassp.1995.479694; 2002; Cited By ~ 1

Author(s): T.L. Burrows, M. Niranjan

Keyword(s): Neural Networks, Recurrent Neural Networks, Vocal Tract

Vocal size exaggeration may have contributed to the origins of vocalic complexity

Philosophical Transactions of the Royal Society B Biological Sciences; 10.1098/rstb.2020.0401; 2021; Vol 377 (1841)

Author(s): Katarzyna Pisanski, Andrey Anikin, David Reby

Keyword(s): Body Size, Social Impact, Vocal Communication, Vocal Tract, Similar Extent, Theme Issue, Articulatory Movements, Animal Vocalizations, Second Formant, Voice Modulation

Vocal tract elongation, which uniformly lowers vocal tract resonances (formant frequencies) in animal vocalizations, has evolved independently in several vertebrate groups as a means for vocalizers to exaggerate their apparent body size. Here, we propose that smaller speech-like articulatory movements that alter only individual formants can serve a similar yet less energetically costly size-exaggerating function. To test this, we examine whether uneven formant spacing alters the perceived body size of vocalizers in synthesized human vowels and animal calls. Among six synthetic vowel patterns, those characterized by the lowest first and second formant (the vowel /u/ as in ‘boot’) are consistently perceived as produced by the largest vocalizer. Crucially, lowering only one or two formants in animal-like calls also conveys the impression of a larger body size, and lowering the second and third formants simultaneously exaggerates perceived size to a similar extent as rescaling all formants. As the articulatory movements required for individual formant shifts are minor compared to full vocal tract extension, they represent a rapid and energetically efficient mechanism for acoustic size exaggeration. We suggest that, by favouring the evolution of uneven formant patterns in vocal communication, this deceptive strategy may have contributed to the origins of the phonemic diversification required for articulated speech. This article is part of the theme issue ‘Voice modulation: from origin and mechanism to social impact (Part II)’.
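The claim that elongation lowers all formants uniformly can be illustrated with a minimal sketch. It assumes the textbook idealization of the vocal tract as a uniform tube closed at one end, whose resonances are F_n = (2n - 1)c / (4L). This model, the speed of sound of 350 m/s, the tube lengths, and the function name `tube_formants` are assumptions made for illustration and are not taken from the abstract.

```python
SPEED_OF_SOUND = 350.0  # m/s, assumed value for warm, humid air


def tube_formants(length_m, count=3):
    """Resonances (Hz) of a uniform tube closed at one end (quarter-wave model)."""
    return [(2 * n - 1) * SPEED_OF_SOUND / (4 * length_m) for n in range(1, count + 1)]


# Hypothetical tract lengths: 17 cm at rest, 20 cm when elongated.
rest = tube_formants(0.17)
elongated = tube_formants(0.20)

# Every formant is scaled by the same factor, the ratio of the two lengths.
ratios = [e / r for e, r in zip(elongated, rest)]
print(ratios)
```

Under this idealization each ratio equals 0.17 / 0.20, so lengthening the tube rescales the whole formant pattern, whereas the articulatory movements discussed in the abstract shift individual formants only.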

Vocal‐tract length normalization for acoustic‐to‐articulatory mapping using neural networks

The Journal of the Acoustical Society of America; 10.1121/1.427279; 1999; Vol 106 (4); pp. 2181-2181; Cited By ~ 4

Author(s): Sorin Dusan, Li Deng

Keyword(s): Neural Networks, Vocal Tract, Tract Length, Vocal Tract Length Normalization

Speed–Curvature Relations in Speech Production Challenge the 1/3 Power Law

Journal of Neurophysiology; 10.1152/jn.01116.2007; 2008; Vol 100 (3); pp. 1171-1183; Cited By ~ 19

Author(s): Pascal Perrier, Susanne Fuchs

Keyword(s): Speech Production, Power Law, Vocal Tract, Tangential Velocity, Speaking Rate, Articulatory Movements, Good Account, Tongue Movements, Complete Set, Speech Kinematics

Relations between tangential velocity and trajectory curvature are analyzed for tongue movements during speech production in the framework of the 1/3 power law, discovered by Viviani and colleagues for arm movements. In 2004, Tasko and Westbury found for American English that the power function provides a good account of speech kinematics, but with an exponent that varies across articulators. The present work aims at broadening Tasko and Westbury's study 1) by analyzing speed–curvature relations for various languages (French, German, Mandarin) and for a biomechanical tongue model simulating speech gestures at various speaking rates and 2) by providing for each speaker or each simulated speaking rate a comparison of results found for the complete set of movements with those found for each movement separately. It is found that the 1/3 power law offers a fair description of the global speed–curvature relations for all speakers and all languages, when articulatory speech data are considered in their whole. This is also observed in the simulations, where the motor control model does not specify any kinematic property of the articulatory paths. However, the refined analysis for individual movements reveals numerous exceptions to this law: the velocity always decreases when curvature increases, but the slope in the log–log representation is variable. It is concluded that the speed–curvature relation is not controlled in speech movements and that it accounts only for general properties of the articulatory movements, which could arise from vocal tract dynamics or/and from stochastic characteristics of the measured signals.
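The slope in the log–log representation mentioned above can be sketched as follows. The 1/3 power law states that tangential velocity v and curvature κ satisfy v = K κ^(-1/3), so regressing log v on log κ should give a slope near -1/3. The example is a minimal illustration on a synthetic trajectory (an ellipse traced with harmonic motion, for which the law holds exactly), not the authors' analysis; the function names and ellipse axes are hypothetical.

```python
import math


def loglog_slope(speeds, curvatures):
    """Least-squares slope of log(speed) regressed on log(curvature)."""
    xs = [math.log(c) for c in curvatures]
    ys = [math.log(v) for v in speeds]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def ellipse_speed_curvature(a, b, n=200):
    """Speed and curvature along x = a*cos(t), y = b*sin(t), sampled at n points."""
    speeds, curvatures = [], []
    for i in range(n):
        t = 2 * math.pi * i / n
        dx, dy = -a * math.sin(t), b * math.cos(t)      # first derivatives
        ddx, ddy = -a * math.cos(t), -b * math.sin(t)   # second derivatives
        v = math.hypot(dx, dy)
        kappa = abs(dx * ddy - dy * ddx) / v ** 3
        speeds.append(v)
        curvatures.append(kappa)
    return speeds, curvatures


speeds, curvatures = ellipse_speed_curvature(a=2.0, b=1.0)
slope = loglog_slope(speeds, curvatures)
print(slope)
```

For measured articulatory data the same regression could be run per movement; the abstract reports that the slope then varies, even though the pooled data stay close to -1/3.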

Gestural representation and Lexical Phonology

Phonology; 10.1017/s0952675700001974; 1994; Vol 11 (2); pp. 277-316; Cited By ~ 21

Author(s): April McMahon, Paul Foulkes, Laura Tollfree

Keyword(s): Recent Work, Vocal Tract, Articulatory Phonology, Units Of Analysis, Articulatory Movements, Lexical Phonology

Recent work on Articulatory Phonology (Browman & Goldstein 1986, 1989, 1991, 1992a, b) raises a number of questions, specifically involving the phonetics–phonology ‘interface’. One advantage of using Articulatory Phonology (henceforth ArtP), with its basic units of abstract gestures based on articulatory movements, is its ability to link phenomena previously seen as phonological to those which are conventionally described as allophonic, or even lower-level phonetic effects, since ‘gestures are... useful primitives for characterising phonological patterns as well as for analysing the activity of the vocal tract articulators’ (Browman & Goldstein 1991: 313). If both phonetics and phonology could ultimately be cast entirely in gestural terms, the phonetics–phonology interface might effectively cease to exist, at least in terms of units of analysis.
