Spontaneous speech synthesis by pronunciation variant selection - a comparison to natural speech

Pronunciation Variant Selection for Spontaneous Speech Synthesis - Listening Effort as a Quality Parameter

2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings ◽

10.1109/icassp.2006.1660156 ◽

2006 ◽

Author(s):

S. Werner ◽

M. Wolff ◽

R. Hoffmann

Keyword(s):

Speech Synthesis ◽

Spontaneous Speech ◽

Quality Parameter ◽

Variant Selection ◽

Listening Effort ◽

Selection For

Toward Spontaneous Speech Synthesis—Utilizing Language Model Information in TTS

IEEE Transactions on Speech and Audio Processing ◽

10.1109/tsa.2004.828635 ◽

2004 ◽

Vol 12 (4) ◽

pp. 436-445 ◽

Cited By ~ 13

Author(s):

S. Werner ◽

M. Eichner ◽

M. Wolff ◽

R. Hoffmann

Keyword(s):

Speech Synthesis ◽

Language Model ◽

Spontaneous Speech

Toward hidden Markov model‐based spontaneous speech synthesis

The Journal of the Acoustical Society of America ◽

10.1121/1.4787189 ◽

2006 ◽

Vol 120 (5) ◽

pp. 3037-3038

Author(s):

Tatsuya Akagawa ◽

Koji Iwano ◽

Sadaoki Furui

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Speech Synthesis ◽

Hidden Markov ◽

Spontaneous Speech ◽

Model Based

Searching for standard French: The construction and mining of the Recueil historique des grammaires du français

Journal of Historical Sociolinguistics ◽

10.1515/jhsl-2015-0002 ◽

2015 ◽

Vol 1 (1) ◽

pp. 13-55 ◽

Cited By ~ 6

Author(s):

Shana Poplack ◽

Lidia-Gabriela Jarmasz ◽

Nathalie Dion ◽

Nicole Rosen

Keyword(s):

Nineteenth Century ◽

Spontaneous Speech ◽

Spoken Language ◽

Variant Selection ◽

Systematic Analysis ◽

Form Function ◽

Time Period ◽

Quebec French ◽

Contextual Elements ◽

Meta Analyses

This paper describes a massive project to characterize “Standard French” by constructing and mining the Recueil historique des grammaires du français (RHGF), a corpus of grammars whose prescriptive dictates we interpret as representing the evolution of the standard over five centuries. Its originality lies in the possibility it affords to ascertain the existence of prior variability, date it, and determine the conditions under which grammarians accept or condemn variant uses. Systematic meta-analyses of the RHGF reveal that grammarians rarely acknowledge the existence of alternate ways of expressing the same thing. Instead, they adopt three major strategies to establish form-function symmetry. All involve partitioning competing variants across distinct social, semantic or linguistic contexts, despite pervasive disagreement over which variant to associate with which. This effectively factors out variability. In contrast, systematic analysis of actual language use, as instantiated in the spontaneous speech of 323 speakers of Quebec French over an apparent-time period of a century and a half, reveals robust variability, regularly conditioned by contextual elements which have never been acknowledged by grammarians. This conditioning has remained largely stable since at least the mid-nineteenth century. Taken together, these results indicate that the “rules” for variant selection promulgated by grammarians do not inform the spoken language, nor do grammars take account of the variable rules structuring spontaneous speech. As a result, grammar and usage are evolving independently.

Personalized natural speech synthesis based on retrieval of pitch patterns using hierarchical Fujisaki model

2013 IEEE International Conference on Acoustics, Speech and Signal Processing ◽

10.1109/icassp.2013.6639191 ◽

2013 ◽

Author(s):

Yi-Chin Huang ◽

Chung-Hsien Wu ◽

Shih-Lun Lin

Keyword(s):

Speech Synthesis ◽

Natural Speech ◽

Fujisaki Model ◽

Pitch Patterns

LSTM Deep Neural Networks Postfiltering for Enhancing Synthetic Voices

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s021800141860008x ◽

2017 ◽

Vol 32 (01) ◽

pp. 1860008 ◽

Cited By ~ 8

Author(s):

Marvin Coto-Jiménez ◽

John Goddard-Close

Keyword(s):

Neural Networks ◽

Speech Synthesis ◽

Deep Neural Networks ◽

Short Term Memory ◽

Markov Models ◽

Natural Speech ◽

Objective Measures ◽

Recent Developments ◽

Small Footprint ◽

Synthetic Voices

Recent developments in speech synthesis have produced systems capable of producing speech which closely resembles natural speech, and researchers now strive to create models that more accurately mimic human voices. One such development is the incorporation of multiple linguistic styles in various languages and accents. Speech synthesis based on Hidden Markov Models (HMM) is of great interest to researchers, due to its ability to produce sophisticated features with a small footprint. Despite some progress, its quality has not yet reached the level of the current predominant unit-selection approaches, which select and concatenate recordings of real speech, and work has been conducted to try to improve HMM-based systems. In this paper, we present an application of long short-term memory (LSTM) deep neural networks as a postfiltering step in HMM-based speech synthesis. Our motivation stems from a similar desire to obtain characteristics which are closer to those of natural speech. The paper analyzes four types of postfilters obtained using five voices, which range from a single postfilter to enhance all the parameters, to a multi-stream proposal which separately enhances groups of parameters. The different proposals are evaluated using three objective measures and are statistically compared to determine any significance between them. The results described in the paper indicate that HMM-based voices can be enhanced using this approach, especially for the multi-stream postfilters on the considered objective measures.

Testing the consistency assumption: Pronunciation variant forced alignment in read and spontaneous speech synthesis

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp.2016.7472660 ◽

2016 ◽

Cited By ~ 3

Author(s):

Rasmus Dall ◽

Sandrine Brognaux ◽

Korin Richmond ◽

Cassia Valentini-Botinhao ◽

Gustav Eje Henter ◽

...

Keyword(s):

Speech Synthesis ◽

Spontaneous Speech

Visually Impaired Persons’ Comprehension of Text Presented with Speech Synthesis

Journal of Visual Impairment & Blindness ◽

10.1177/0145482x9208601005 ◽

1992 ◽

Vol 86 (10) ◽

pp. 426-428 ◽

Cited By ~ 2

Author(s):

E. Hjelmquist ◽

U. Dahlstrand ◽

L. Hedelin

Keyword(s):

Visually Impaired ◽

Speech Synthesis ◽

Synthesis Condition ◽

Natural Speech ◽

Middle Aged ◽

Marginal Effects ◽

Visually Impaired Persons

Three groups of visually impaired persons (two middle aged and one old) were investigated with respect to memory and understanding of texts presented with speech synthesis and natural speech, respectively. The results showed that speech synthesis generally yielded lower results than did natural speech. Experience had no effect on performance, and there were only marginal effects related to age. However, there were big differences among the groups with respect to the presentation speed chosen in the speech-synthesis condition.

The Intelligibility of Synthesized Speech

Journal of Speech Language and Hearing Research ◽

10.1044/jshr.3003.425 ◽

1987 ◽

Vol 30 (3) ◽

pp. 425-431 ◽

Cited By ~ 56

Author(s):

Julia Hoover ◽

Joe Reichle ◽

Dianne Van Tasell ◽

David Cole

Keyword(s):

Speech Synthesis ◽

Linear Trend ◽

Language Impairments ◽

Practice Effect ◽

Natural Speech ◽

Communication Aids ◽

Sentence Condition ◽

Synthesized Speech ◽

Preceding Context ◽

Low Probability

The intelligibility of two speech synthesizers [ECHO II (Street Electronics, 1982) and VOTRAX (VOTRAX Division, 1981)] was compared to the intelligibility of natural speech in each of three different contextual conditions: (a) single words, (b) "low-probability sentences" in which the last word could not be predicted from preceding context, and (c) "high-probability sentences" in which the last word could be predicted from preceding context. Additionally, the effect of practice on performance in each condition was examined. Natural speech was more intelligible than either type of synthesized speech regardless of word/sentence condition. In both sentence conditions, VOTRAX speech was significantly more intelligible than ECHO II speech. No practice effect was observed for VOTRAX, while an ascending linear trend occurred for ECHO II. Implications for the use of inexpensive speech synthesis units as components of augmentative communication aids for persons with severe speech and/or language impairments are discussed.

Perception of smiling voice in spontaneous speech synthesis

10.21437/ssw.2021-19 ◽

2021 ◽

Author(s):

Ambika Kirkland ◽

Marcin Włodarczak ◽

Joakim Gustafson ◽

Eva Szekely

Keyword(s):

Speech Synthesis ◽

Spontaneous Speech
