Differences of speech rate, interphoneme distance and likelihood caused by speaking style, their relationship, and recognition performance

2002 ◽  
Vol 33 (7) ◽  
pp. 50-60 ◽  
Author(s):  
Kazumasa Yamamoto ◽  
Seiichi Nakagawa
1989 ◽  
Vol 33 (5) ◽  
pp. 301-304 ◽  
Author(s):  
Catalina M. Danis

This paper reports on a study of recognition performance for a group of new users during their first month of experience with the Tangora system. Tangora is a 20,000-word, speaker-dependent, isolated-word system that transcribes speech input into text in real time. Twelve users, six males and six females, participated in 21 sessions each, during which they read aloud unrelated sentences selected from a corpus of office correspondence. Their goal was to develop a speaking style that minimized Tangora's recognition error. To this end, starting with the third session, the experimenter generated hypotheses about each user's speech habits that may have resulted in high recognition error and made suggestions to the user on how to modify his/her speaking style. In addition, in each of the four weeks of the experiment, each user produced a new speech sample, which was used to “train” the system to recognize the speaker. On average, recognition error decreased by 33% from the first to the fourth week. This improvement was attributable to “retraining” the system with, apparently, more representative speech samples. A number of speech habits brought by users to the recognition task were identified as contributing to poor recognition performance by Tangora. These included: (a) an overly fast speech rate, (b) failure to pause between words, (c) hyper-correct articulation of the final phoneme in words, and (d) incomplete articulation of the first phoneme in words. Feedback relating to these speech habits was used successfully by a majority of the users to modify their speaking style into one more successfully recognized by the Tangora system.


1997 ◽  
Vol 40 (2) ◽  
pp. 423-431 ◽  
Author(s):  
Sandra Gordon-Salant ◽  
Peter J. Fitzgibbons

The influence of selected cognitive factors on age-related changes in speech recognition was examined by measuring the effects of recall task, speech rate, and availability of contextual cues on recognition performance by young and elderly listeners. Stimuli were low and high context sentences from the R-SPIN test presented at normal and slowed speech rates in noise. Response modes were final word recall and sentence recall. The effects of hearing loss and age were examined by comparing performances of young and elderly listeners with normal hearing and young and elderly listeners with hearing loss. Listeners with hearing loss performed more poorly than listeners with normal hearing in nearly every condition. In addition, elderly listeners exhibited poorer performance than younger listeners on the sentence recall task, but not on the word recall task, indicating that added memory demands have a detrimental effect on elderly listeners' performance. Slowing of speech rate did not have a differential effect on performance of young and elderly listeners. All listeners performed well when stimulus contextual cues were available. Taken together, these results support the notion that the performance of elderly listeners with hearing loss is influenced by a combination of auditory processing factors, memory demands, and speech contextual information.


2021 ◽  
pp. 009365022110593
Author(s):  
Emma Rodero ◽  
Lucía Cores-Sarría

Studies in different languages have identified a broadcast speaking style, a particular manner that broadcasters have of reading news. This speaking style is characterized by an emphatic intonation with a fast speech rate easily recognizable by listeners. Some authors have stated that messages in this style are not positively perceived by listeners, as it is repetitive and regular, but there is no empirical data to support this conclusion, nor has the style been analyzed with physiological measures. The physiological approach has some advantages, such as a more objective assessment and real-time evaluation. Therefore, this study aims to analyze the effectiveness, adequacy, and physiological response of this broadcast style compared to a narrative pattern. We combined self-report with physiological measures. Fifty-six participants listened to six news pieces in both styles and with two voices, male and female. They had to rate the effectiveness and adequacy of the news while we measured their physiological responses (heart rate and electrodermal activity). The results showed that news conveyed through the broadcast style elicited less cognitive resource allocation and emotional arousal than the narrative pattern, but there were no significant differences in self-report evaluations.


1994 ◽  
Vol 9 ◽  
pp. 21-44 ◽  
Author(s):  
Maria-Josep Solé

Abstract. Synchronic and diachronic sound change may involve (1) the phonologization of an effect of phonetic implementation, or (2) the lexicalization of phonetic or phonological processes. This paper seeks to determine the phonologization and lexicalization of phonetic and phonological effects on the basis of their behaviour across different speaking rates. To illustrate the phonologization of phonetic effects, cross-linguistic data on aspiration and vowel nasalization across different speech rates are presented. The data show that phonological effects adjust to variations in speech rate, so as to keep a constant perceptual distance across rates, whereas phonetic effects, which originate at a lower level, remain constant across rates or present rate-correlated changes which can be accounted for by the general principles of speech motor control. Speech rate might also allow us to distinguish between phonetic effects which do not involve a change in the underlying representation, and effects which have been lexicalized. Connected speech processes, such as assimilation, are known to depend on factors such as speaking rate and speaking style. Consequently, low level assimilatory processes are expected to show continuous variation with changes in rate, as a result of increased gestural overlap. In contrast, if assimilatory processes have been lexicalized as a distinct lexical representation or as an alternative style-dependent form, then the lexicalized form will exhibit a rate-invariant pattern. A variety of experimental data which provide support for this new way of analyzing sound change is presented. It is argued that part of the synchronic variation in present-day speakers is due to sound change, i.e. a discrete, categorical change in the speaker's grammar.


2010 ◽  
Vol 20 (1) ◽  
pp. 20-25 ◽  
Author(s):  
Jim Tsiamtsiouris ◽  
Kim Krieger

Abstract. The purpose of this study was to test the hypothesis that adults who stutter will exhibit significant improvements after attending a residential, 3-week intensive program that focuses on avoidance reduction and stuttering modification therapy. Preliminary analyses focused on four measures: (a) SSI-3, (b) speech rate, (c) S-24 Scale, and (d) OASES. Results indicated significant improvements on all of the measures.


1991 ◽  
Vol 34 (2) ◽  
pp. 415-426 ◽  
Author(s):  
Richard L. Freyman ◽  
G. Patrick Nerbonne ◽  
Heather A. Cote

This investigation examined the degree to which modification of the consonant-vowel (C-V) intensity ratio affected consonant recognition under conditions in which listeners were forced to rely more heavily on waveform envelope cues than on spectral cues. The stimuli were 22 vowel-consonant-vowel utterances, which had been mixed at six different signal-to-noise ratios with white noise that had been modulated by the speech waveform envelope. The resulting waveforms preserved the gross speech envelope shape, but spectral cues were limited by the white-noise masking. In a second stimulus set, the consonant portion of each utterance was amplified by 10 dB. Sixteen subjects with normal hearing listened to the unmodified stimuli, and 16 listened to the amplified-consonant stimuli. Recognition performance was reduced in the amplified-consonant condition for some consonants, presumably because waveform envelope cues had been distorted. However, for other consonants, especially the voiced stops, consonant amplification improved recognition. Patterns of errors were altered for several consonant groups, including some that showed only small changes in recognition scores. The results indicate that when spectral cues are compromised, nonlinear amplification can alter waveform envelope cues for consonant recognition.

