Video-realistic synthetic speech with a parametric visual speech synthesizer

Author(s):  
Sascha Fagel
2001 ◽  

Author(s):  
Jay J. Williams ◽  
Aggelos K. Katsaggelos ◽  
Dean C. Garstecki

1990 ◽  
Vol 55 (1) ◽  
pp. 21-27 ◽  
Author(s):  
Linda M. Huntress ◽  
Linda Lee ◽  
Nancy A. Creaghead ◽  
Daniel D. Wheeler ◽  
Kathleen M. Braverman

This study investigated the ability of aphasic patients with mild auditory comprehension problems to respond to synthetic speech produced by an inexpensive speech synthesizer attached to a personal computer. Subjects were given four practice sessions with synthetic speech; testing of synthetic speech comprehension was performed during Sessions 1 and 4. During testing, aphasic subjects' comprehension of synthetic speech was compared with their comprehension of natural speech on four tasks: (a) picture identification, (b) following commands, (c) yes/no questions, and (d) paragraph comprehension with yes/no questions. Aphasic subjects comprehended natural speech better than synthetic speech in Session 1 but not in Session 4. Their synthetic speech scores improved between Sessions 1 and 4. There was also a significant difference among scores on the four tasks for both sessions. The means for picture identification were highest, followed by yes/no questions, commands, and finally paragraph comprehension for both sessions. Although performance by some subjects on some tasks was accurate enough to indicate that an inexpensive speech synthesizer could be a useful tool for working with mild aphasic patients, considerable caution in selecting both tasks and patients is warranted.


1990 ◽  
Vol 55 (4) ◽  
pp. 751-755 ◽  
Author(s):  
Kathleen A. Kangas ◽  
George D. Allen

This study examined the perceived intelligibility of synthetic speech. Participants were adults aged 49–69, one group with normal hearing and one group with acquired sensorineural hearing impairment. Word lists were presented in two speech types: DECtalk (a high-quality speech synthesizer) and a natural male speaker. Results revealed differences between groups, with normal-hearing listeners scoring higher than hearing-impaired listeners, and between speech types, with higher scores for natural speech than for synthesized speech. There was no significant interaction of hearing level and speech type.


1987 ◽  
Vol 9 (1) ◽  
pp. 38-44 ◽  
Author(s):  
Marjorie Helsel-Dewert ◽  
Maud van den Meiracker

This study investigated the effect of listening experience on the ability of learning handicapped children to understand synthetic speech. Two groups of 12 learning handicapped children each participated in the study. The first group was exposed to synthetic speech every day for 10 weeks, whereas the second had limited or no exposure to synthetic speech. When children were asked to identify stimulus words presented by a taped, female voice, the group with extended exposure to synthetic speech correctly identified approximately the same percentage of words as the group with limited exposure. In contrast, when the stimulus words were presented by a speech synthesizer, children in the extended exposure group correctly identified a significantly higher percentage of the words than did those in the limited exposure group. Both groups correctly identified significantly fewer words when words were presented by the speech synthesizer than when words were presented by the tape recorder. Educational implications of the low intelligibility scores of both groups when listening to synthesized speech are discussed, and future research needs are identified.


2001 ◽  
Vol 01 (01) ◽  
pp. 19-26 ◽  
Author(s):  
PENGYU HONG ◽  
ZHEN WEN ◽  
THOMAS S. HUANG

We present the iFACE system, a visual speech synthesizer that provides a form of virtual face-to-face communication. The system provides an interactive tool with which the user can customize a graphic head model for a person's virtual agent based on his/her range data. The texture is mapped onto the customized model to achieve a realistic appearance. Face animations are produced by driving the model with a text stream or a speech stream. A set of basic facial shapes and head actions is manually built and used to synthesize expressive visual speech based on rules.
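The abstract's "set of basic facial shapes" combined "based on rules" is, in outline, a blendshape model: each frame of visual speech is a weighted mix of basis shapes applied to a neutral face mesh. The sketch below illustrates that general technique only; the function, the shape names, and the weights are illustrative assumptions, not the iFACE authors' actual API or data.

```python
# Hypothetical blendshape sketch: a rule maps a speech unit (viseme) to
# activation weights over manually built basis shapes, and the final mesh is
# neutral + sum of weighted per-vertex displacements. All names/values here
# are invented for illustration.

def blend_shapes(neutral, basis, weights):
    """Linearly combine displacement basis shapes onto a neutral face mesh.

    neutral: list of (x, y, z) rest-pose vertex positions.
    basis:   dict mapping shape name -> list of per-vertex (dx, dy, dz).
    weights: dict mapping shape name -> activation, typically in [0, 1].
    """
    result = [list(v) for v in neutral]          # copy so neutral is untouched
    for name, w in weights.items():
        for i, (dx, dy, dz) in enumerate(basis[name]):
            result[i][0] += w * dx
            result[i][1] += w * dy
            result[i][2] += w * dz
    return [tuple(v) for v in result]

# Toy example: a 2-vertex "mesh" with a jaw-open and a lip-rounding shape.
neutral = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
basis = {
    "jaw_open":  [(0.0, -0.5, 0.0), (0.0, 0.0, 0.0)],
    "lip_round": [(0.1, 0.0, 0.0), (-0.1, 0.0, 0.0)],
}
# A rule for a rounded viseme such as /o/ might mix the two shapes:
frame = blend_shapes(neutral, basis, {"jaw_open": 0.6, "lip_round": 1.0})
print(frame)
```

Driving the model from a text or speech stream then amounts to producing a weight dictionary per time step (one viseme rule per phoneme, interpolated over time) and re-evaluating the blend each frame.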

