Stream segregation of concurrent speech and the verbal transformation effect: Influence of fundamental frequency and lateralization cues

Marcin Stachurski; Robert J. Summers; Brian Roberts

doi:10.1121/1.4877268

Stream segregation of concurrent speech and the verbal transformation effect: Influence of fundamental frequency and lateralization cues

Hearing Research ◽

10.1016/j.heares.2017.07.016 ◽

2017 ◽

Vol 354 ◽

pp. 16-27 ◽

Cited By ~ 2

Author(s):

Marcin Stachurski ◽

Robert J. Summers ◽

Brian Roberts

Keyword(s):

Fundamental Frequency ◽

Stream Segregation ◽

Verbal Transformation

Download Full-text

Sequential stream segregation of voiced and unvoiced speech sounds based on fundamental frequency

Hearing Research ◽

10.1016/j.heares.2016.11.016 ◽

2017 ◽

Vol 344 ◽

pp. 235-243 ◽

Cited By ~ 8

Author(s):

Marion David ◽

Mathieu Lavandier ◽

Nicolas Grimault ◽

Andrew J. Oxenham

Keyword(s):

Fundamental Frequency ◽

Stream Segregation ◽

Speech Sounds

Download Full-text

Decoding of Envelope vs. Fundamental Frequency During Complex Auditory Stream Segregation

Neurobiology of Language ◽

10.1162/nol_a_00013 ◽

2020 ◽

Vol 1 (3) ◽

pp. 268-287

Author(s):

Keelin M. Greenlaw ◽

Sebastian Puschmann ◽

Emily B. J. Coffey

Keyword(s):

Fundamental Frequency ◽

Single Channel ◽

Stream Segregation ◽

Neural Representation ◽

Auditory Information ◽

Spatial Cues ◽

Auditory Stream ◽

Auditory Stream Segregation ◽

Hearing In Noise ◽

Human Function

Hearing-in-noise perception is a challenging task that is critical to human function, but how the brain accomplishes it is not well understood. A candidate mechanism proposes that the neural representation of an attended auditory stream is enhanced relative to background sound via a combination of bottom-up and top-down mechanisms. To date, few studies have compared neural representation and its task-related enhancement across frequency bands that carry different auditory information, such as a sound’s amplitude envelope (i.e., syllabic rate or rhythm; 1–9 Hz), and the fundamental frequency of periodic stimuli (i.e., pitch; >40 Hz). Furthermore, hearing-in-noise in the real world is frequently both messier and richer than the majority of tasks used in its study. In the present study, we use continuous sound excerpts that simultaneously offer predictive, visual, and spatial cues to help listeners separate the target from four acoustically similar simultaneously presented sound streams. We show that while both lower and higher frequency information about the entire sound stream is represented in the brain’s response, the to-be-attended sound stream is strongly enhanced only in the slower, lower frequency sound representations. These results are consistent with the hypothesis that attended sound representations are strengthened progressively at higher level, later processing stages, and that the interaction of multiple brain systems can aid in this process. Our findings contribute to our understanding of auditory stream separation in difficult, naturalistic listening conditions and demonstrate that pitch and envelope information can be decoded from single-channel EEG data.

Download Full-text

No interaction between fundamental-frequency differences and spectral region when perceiving speech in a speech background

PLoS ONE ◽

10.1371/journal.pone.0249654 ◽

2021 ◽

Vol 16 (4) ◽

pp. e0249654

Author(s):

Sara M. K. Madsen ◽

Torsten Dau ◽

Andrew J. Oxenham

Keyword(s):

Fundamental Frequency ◽

Spectral Region ◽

Speech Intelligibility ◽

Stream Segregation ◽

Pitch Discrimination ◽

Pitch Accuracy ◽

Harmonic Complex ◽

Sentence Recognition ◽

Spectrally Resolved ◽

Phase Relationships

Differences in fundamental frequency (F0) or pitch between competing voices facilitate our ability to segregate a target voice from interferers, thereby enhancing speech intelligibility. Although lower-numbered harmonics elicit a stronger and more accurate pitch sensation than higher-numbered harmonics, it is unclear whether the stronger pitch leads to an increased benefit of pitch differences when segregating competing talkers. To answer this question, sentence recognition was tested in young normal-hearing listeners in the presence of a single competing talker. The stimuli were presented in a broadband condition or were highpass or lowpass filtered to manipulate the pitch accuracy of the voicing, while maintaining roughly equal speech intelligibility in the highpass and lowpass regions. Performance was measured with average F0 differences (ΔF0) between the target and single-talker masker of 0, 2, and 4 semitones. Pitch discrimination abilities were also measured to confirm that the lowpass-filtered stimuli elicited greater pitch accuracy than the highpass-filtered stimuli. No interaction was found between filter type and ΔF0 in the sentence recognition task, suggesting little or no effect of harmonic rank or pitch accuracy on the ability to use F0 to segregate natural voices, even when the average ΔF0 is relatively small. The results suggest that listeners are able to obtain some benefit of pitch differences between competing voices, even when pitch salience and accuracy is low. The accuracy with which we are able to discriminate the pitch of a harmonic complex tone depends on the F0 and the harmonic numbers present. For F0s in the average range of speech (100–200 Hz), pitch discrimination is best (implying accurate F0 coding) when harmonics below about the 10th are present [6–10]. When these lower-numbered harmonics are present, pitch discrimination is also independent of the phase relationships between the harmonics, suggesting that these harmonics are spectrally resolved to some extent. In contrast, when only harmonics above the 10th are present in this range of F0s, pitch discrimination is poorer and is affected by the phase relationships between harmonics, suggesting that interactions occur between these spectrally unresolved harmonics [6–10]. Psychoacoustic studies of sound segregation have often been carried out with interleaved sequences of tones. Some of these studies have investigated segregation based on differences in pitch accuracy and have varied the accuracy by systematically varying whether resolved or only unresolved harmonics are present. Previous studies have found that stream segregation can occur with alternating sequences of tones, even if the tones consist only of unresolved harmonics [11–14]. However, the question of whether streaming is greater with resolved than unresolved harmonics has received mixed answers. In cases where the listeners’ task was to segregate the streams, some studies have shown little difference in streaming between conditions containing resolved or only unresolved harmonics [11, 15], whereas another study using a similar approach found significantly greater stream segregation when resolved harmonics were present than when only unresolved harmonics were present [12]. However, in situations where the task was either neutral or encouraged listeners to integrate the sequences into a single stream, the results have been consistent across studies in showing greater segregation for complex tones containing resolved harmonics than for tones containing only unresolved harmonics [13, 14]. These findings support the idea that pitch accuracy can affect our ability to segregate sounds. Less is known about the role of low-numbered harmonics in the context of segregating competing speech. Bird and Darwin [2] showed that lower harmonics dominate performance in a speech-segregation task based on F0 differences, but they did not test any conditions containing only high-numbered harmonics. Oxenham and Simonson [16] explored the effect of harmonic rank on speech intelligibility by comparing conditions where the target and single-talker masker had been lowpass (LP) or highpass (HP) filtered to either retain (LP-filtered) or remove (HP-filtered) the spectrally resolved components from the target and masker [16]. The LP and HP cutoff frequencies were selected to produce roughly equal performance in noise for both conditions. Surprisingly, performance in the LP and HP conditions improved by similar amounts when the noise masker was replaced by a single-talker masker with a different average F0, suggesting no clear benefit of having resolved harmonic components in the speech. However, that study only used relatively large values of average ΔF0 that according to recent F0 estimates were approximately 4 and 8 semitones (ST). Moreover, this study did not parametrically vary the ΔF0 between the target and masker. It may be that pitch accuracy is only relevant for more challenging conditions, i.e. for conditions with smaller average values of ΔF0. Thus, it remains unclear whether the effect of ΔF0 on performance is affected by the presence or absence of low-numbered, spectrally resolved harmonics. The aim of the present study was to determine whether there is an effect of spectral region, and hence pitch coding accuracy, on the ability of listeners to use average F0 differences between a target and an interfering talker to understand natural speech.

Download Full-text

Primitive stream segregation of tone sequences without differences in fundamental frequency or passband

The Journal of the Acoustical Society of America ◽

10.1121/1.1508784 ◽

2002 ◽

Vol 112 (5) ◽

pp. 2074-2085 ◽

Cited By ~ 96

Author(s):

Brian Roberts ◽

Brian R. Glasberg ◽

Brian C. J. Moore

Keyword(s):

Fundamental Frequency ◽

Stream Segregation

Download Full-text

Cortical tracking of voice pitch in the presence of multiple speakers depends on selective attention

10.1101/2021.12.03.471122 ◽

2021 ◽

Author(s):

Christian Brodbeck ◽

Jonathan Z. Simon

Keyword(s):

Selective Attention ◽

Fundamental Frequency ◽

Right Hemisphere ◽

Stream Segregation ◽

Voice Pitch ◽

Auditory Stream Segregation ◽

Pitch Tracking ◽

Relative Value ◽

The Right ◽

Task Irrelevant

AbstractVoice pitch carries linguistic as well as non-linguistic information. Previous studies have described cortical tracking of voice pitch in clean speech, with responses reflecting both pitch strength and pitch value. However, pitch is also a powerful cue for auditory stream segregation, especially when competing streams have pitch differing in fundamental frequency, as is the case when multiple speakers talk simultaneously. We therefore investigated how cortical speech pitch tracking is affected in the presence of a second, task-irrelevant speaker. We analyzed human magnetoencephalography (MEG) responses to continuous narrative speech, presented either as a single talker in a quiet background, or as a two-talker mixture of a male and a female speaker. In clean speech, voice pitch was associated with a right-dominant response, peaking at a latency of around 100 ms, consistent with previous EEG and ECoG results. The response tracked both the presence of pitch as well as the relative value of the speaker’s fundamental frequency. In the two-talker mixture, pitch of the attended speaker was tracked bilaterally, regardless of whether or not there was simultaneously present pitch in the speech of the irrelevant speaker. Pitch tracking for the irrelevant speaker was reduced: only the right hemisphere still significantly tracked pitch of the unattended speaker, and only during intervals in which no pitch was present in the attended talker’s speech. Taken together, these results suggest that pitch-based segregation of multiple speakers, at least as measured by macroscopic cortical tracking, is not entirely automatic but strongly dependent on selective attention.

Download Full-text

A Tape Striation Counting Method for Determining Fundamental Frequency

Language Speech and Hearing Services in Schools ◽

10.1044/0161-1461.1004.246 ◽

1979 ◽

Vol 10 (4) ◽

pp. 246-248 ◽

Cited By ~ 2

Author(s):

Peter B. Mueller ◽

Marla Adams ◽

Jean Baehr-Rouse ◽

Debbie Boos

Keyword(s):

Fundamental Frequency ◽

Counting Method ◽

Male And Female ◽

Fundamental Frequencies ◽

Counting Procedure ◽

Female Subjects

Mean fundamental frequencies of male and female subjects obtained with FLORIDA I and a tape striation counting procedure were compared. The fundamental frequencies obtained with these two methods were similar and it appears that the tape striation counting procedure is a viable, simple, and inexpensive alternative to more costly and complicated procedures and instrumentation.

Download Full-text

The Application of Laboratory Formulas to Clinical Voice Management

American Journal of Speech-Language Pathology ◽

10.1044/1058-0360.0402.62 ◽

1995 ◽

Vol 4 (2) ◽

pp. 62-69 ◽

Cited By ~ 12

Author(s):

Katherine Verdolini ◽

Ingo R. Titze

Keyword(s):

Fundamental Frequency ◽

Voice Disorders ◽

Clinical Interventions ◽

Case Examples ◽

Interactive Nature ◽

Mathematical Formulas

In this paper, we discuss the application of mathematical formulas to guide the development of clinical interventions in voice disorders. Discussion of case examples includes fundamental frequency and intensity deviations, pitch and loudness abnormalities, laryngeal hyperand hypoadduction, and phonatory effort. The paper illustrates the interactive nature of theoretical and applied work in vocology

Download Full-text

Contributions of Voice and Nonverbal Communication to Perceived Masculinity–Femininity for Cisgender and Transgender Communicators

Journal of Speech Language and Hearing Research ◽

10.1044/2019_jslhr-19-00387 ◽

2020 ◽

Vol 63 (4) ◽

pp. 931-947

Author(s):

Teresa L. D. Hardy ◽

Carol A. Boliek ◽

Daniel Aalto ◽

Justin Lewicke ◽

Kristopher Wells ◽

...

Keyword(s):

Fundamental Frequency ◽

Sound Pressure Level ◽

Sound Pressure ◽

Vocal Tract ◽

Presentation Mode ◽

Pressure Level ◽

Presentation Modes ◽

Audiovisual Stimuli ◽

Vocal Tract Resonance ◽

Point Light

Purpose The purpose of this study was twofold: (a) to identify a set of communication-based predictors (including both acoustic and gestural variables) of masculinity–femininity ratings and (b) to explore differences in ratings between audio and audiovisual presentation modes for transgender and cisgender communicators. Method The voices and gestures of a group of cisgender men and women ( n = 10 of each) and transgender women ( n = 20) communicators were recorded while they recounted the story of a cartoon using acoustic and motion capture recording systems. A total of 17 acoustic and gestural variables were measured from these recordings. A group of observers ( n = 20) rated each communicator's masculinity–femininity based on 30- to 45-s samples of the cartoon description presented in three modes: audio, visual, and audio visual. Visual and audiovisual stimuli contained point light displays standardized for size. Ratings were made using a direct magnitude estimation scale without modulus. Communication-based predictors of masculinity–femininity ratings were identified using multiple regression, and analysis of variance was used to determine the effect of presentation mode on perceptual ratings. Results Fundamental frequency, average vowel formant, and sound pressure level were identified as significant predictors of masculinity–femininity ratings for these communicators. Communicators were rated significantly more feminine in the audio than the audiovisual mode and unreliably in the visual-only mode. Conclusions Both study purposes were met. Results support continued emphasis on fundamental frequency and vocal tract resonance in voice and communication modification training with transgender individuals and provide evidence for the potential benefit of modifying sound pressure level, especially when a masculine presentation is desired.

Download Full-text

Effects of Fundamental Frequency Contours on Sentence Recognition in Mandarin-Speaking Children With Cochlear Implants

Journal of Speech Language and Hearing Research ◽

10.1044/2020_jslhr-20-00033 ◽

2020 ◽

Vol 63 (11) ◽

pp. 3855-3864

Author(s):

Wanting Huang ◽

Lena L. N. Wong ◽

Fei Chen ◽

Haihong Liu ◽

Wei Liang

Keyword(s):

Cochlear Implants ◽

Fundamental Frequency ◽

Signal To Noise Ratio ◽

Lexical Tone ◽

Signal To Noise ◽

Sentence Recognition ◽

Test Conditions ◽

Age Appropriate ◽

F0 Contour ◽

Appropriate Sentences

Purpose Fundamental frequency (F0) is the primary acoustic cue for lexical tone perception in tonal languages but is processed in a limited way in cochlear implant (CI) systems. The aim of this study was to evaluate the importance of F0 contours in sentence recognition in Mandarin-speaking children with CIs and find out whether it is similar to/different from that in age-matched normal-hearing (NH) peers. Method Age-appropriate sentences, with F0 contours manipulated to be either natural or flattened, were randomly presented to preschool children with CIs and their age-matched peers with NH under three test conditions: in quiet, in white noise, and with competing sentences at 0 dB signal-to-noise ratio. Results The neutralization of F0 contours resulted in a significant reduction in sentence recognition. While this was seen only in noise conditions among NH children, it was observed throughout all test conditions among children with CIs. Moreover, the F0 contour-induced accuracy reduction ratios (i.e., the reduction in sentence recognition resulting from the neutralization of F0 contours compared to the normal F0 condition) were significantly greater in children with CIs than in NH children in all test conditions. Conclusions F0 contours play a major role in sentence recognition in both quiet and noise among pediatric implantees, and the contribution of the F0 contour is even more salient than that in age-matched NH children. These results also suggest that there may be differences between children with CIs and NH children in how F0 contours are processed.

Download Full-text