Reading Fluent Speech from Talking Faces: Typical Brain Networks and Individual Differences

2005 ◽  
Vol 17 (6) ◽  
pp. 939-953 ◽  
Author(s):  
Deborah A. Hall ◽  
Clayton Fussell ◽  
A. Quentin Summerfield

Listeners are able to extract important linguistic information by viewing the talker's face—a process known as "speechreading." Previous studies of speechreading have presented small closed sets of simple words, and their results indicate that visual speech processing engages a wide network of brain regions in the temporal, frontal, and parietal lobes that are likely to underlie multiple stages of the receptive language system. The present study further explored this network in a large group of subjects by presenting naturally spoken sentences that tap the richer complexities of visual speech processing. Four different baselines (blank screen, static face, nonlinguistic facial gurning, and auditory speech) enabled us to determine the hierarchy of neural processing involved in speechreading and to test the claim that visual input reliably accesses sound-based representations in the auditory cortex. In contrast to passively viewing a blank screen, the static-face condition evoked activation bilaterally across the border of the fusiform gyrus and cerebellum, and in the medial superior frontal gyrus and left precentral gyrus (p < .05, whole brain corrected). With the static face as baseline, the gurning face evoked bilateral activation in the motion-sensitive region of the occipital cortex, whereas visual speech additionally engaged the middle temporal gyrus, inferior and middle frontal gyri, and the inferior parietal lobe, particularly in the left hemisphere. These latter regions are implicated in lexical stages of spoken language processing. Although auditory speech generated extensive bilateral activation across both superior and middle temporal gyri, the group-averaged pattern of speechreading activation failed to include any auditory regions along the superior temporal gyrus, suggesting that fluent visual speech does not always involve sound-based coding of the visual input. An important finding from the individual subject analyses was that activation in the superior temporal gyrus did reach significance (p < .001, small-volume corrected) for a subset of the group. Moreover, the extent of the left-sided superior temporal gyrus activity was strongly correlated with speechreading performance. Skilled speechreading was also associated with activations and deactivations in other brain regions, suggesting that individual differences reflect the efficiency of a circuit linking sensory, perceptual, memory, cognitive, and linguistic processes rather than the operation of a single component process.
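
The individual-differences finding reduces to a simple test: correlate each subject's left superior temporal gyrus activation extent with their speechreading score. A minimal sketch of that analysis is below; the voxel counts and scores are invented placeholders, not data from the study.

```python
# Sketch of the individual-differences analysis described above: correlate each
# subject's left STG activation extent with speechreading performance.
# All values here are illustrative placeholders, not the study's data.
import numpy as np
from scipy.stats import pearsonr

stg_extent = np.array([12, 0, 85, 40, 7, 110, 23, 64, 0, 31])    # suprathreshold voxels, left STG
speechread = np.array([35, 22, 78, 55, 30, 90, 41, 70, 18, 50])  # % words correctly speechread

r, p = pearsonr(stg_extent, speechread)
print(f"r = {r:.2f}, p = {p:.4f}")
```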

2020 ◽  
Author(s):  
Brian A. Metzger ◽  
John F. Magnotti ◽  
Zhengjia Wang ◽  
Elizabeth Nesbitt ◽  
Patrick J. Karas ◽  
...  

Abstract: Experimentalists studying multisensory integration compare neural responses to multisensory stimuli with responses to the component modalities presented in isolation. This procedure is problematic for multisensory speech perception since audiovisual speech and auditory-only speech are easily intelligible but visual-only speech is not. To overcome this confound, we developed intracranial electroencephalography (iEEG) deconvolution. Individual stimuli always contained both auditory and visual speech, but jittering the onset asynchrony between modalities allowed the time courses of the unisensory responses and the interaction between them to be estimated independently. We applied this procedure to electrodes implanted in human epilepsy patients (both male and female) over the posterior superior temporal gyrus (pSTG), a brain area known to be important for speech perception. iEEG deconvolution revealed sustained, positive responses to visual-only speech and larger, phasic responses to auditory-only speech. Confirming results from scalp EEG, responses to audiovisual speech were weaker than responses to auditory-only speech, demonstrating a subadditive multisensory neural computation. Leveraging the spatial resolution of iEEG, we extended these results to show that subadditivity is most pronounced in more posterior aspects of the pSTG. Across electrodes, subadditivity correlated with visual responsiveness, supporting a model in which visual speech enhances the efficiency of auditory speech processing in pSTG. The ability to separate neural processes may make iEEG deconvolution useful for studying a variety of complex cognitive and perceptual tasks.

Significance statement: Understanding speech is one of the most important human abilities. Speech perception uses information from both the auditory and visual modalities. It has been difficult to study neural responses to visual speech because visual-only speech is difficult or impossible to comprehend, unlike auditory-only and audiovisual speech. We used intracranial electroencephalography (iEEG) deconvolution to overcome this obstacle. We found that visual speech evokes a positive response in the human posterior superior temporal gyrus, enhancing the efficiency of auditory speech processing.
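
The deconvolution logic lends itself to a short sketch: because the jittered asynchrony decorrelates auditory and visual onsets across trials, a finite-impulse-response (FIR) kernel for each modality can be estimated jointly by ordinary least squares (an interaction term could be added as a third set of columns). This is a minimal illustration under assumed sampling rates and onset times, not the authors' pipeline.

```python
# Minimal sketch of deconvolution by jittered onsets: shifted impulse
# regressors for each modality, solved jointly with least squares.
# Sampling rate, kernel length, and onset times are illustrative assumptions.
import numpy as np

fs = 100                       # samples per second
kernel_len = 100               # estimate 1 s of response per modality
n_samples = 60 * fs            # one minute of recording

rng = np.random.default_rng(0)
aud_onsets = np.arange(2 * fs, n_samples - 2 * fs, 3 * fs)
vis_onsets = aud_onsets - rng.integers(10, 50, size=aud_onsets.size)  # jittered visual lead

def fir_columns(onsets, n_samples, kernel_len):
    """Stack shifted impulse trains: column k marks (onset + k) samples."""
    X = np.zeros((n_samples, kernel_len))
    for k in range(kernel_len):
        idx = onsets + k
        X[idx[idx < n_samples], k] = 1.0
    return X

X = np.hstack([fir_columns(aud_onsets, n_samples, kernel_len),
               fir_columns(vis_onsets, n_samples, kernel_len)])

y = rng.standard_normal(n_samples)        # stand-in for a high-gamma iEEG trace
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
aud_kernel, vis_kernel = beta[:kernel_len], beta[kernel_len:]
```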


2018 ◽  
Author(s):  
Muge Ozker ◽  
Michael S. Beauchamp

Abstract: Although humans can understand speech using the auditory modality alone, in noisy environments visual speech information from the talker's mouth can rescue otherwise unintelligible auditory speech. To investigate the neural substrates of multisensory speech perception, we recorded neural activity from the human superior temporal gyrus (STG) using two very different techniques: either directly, using surface electrodes implanted in five participants with epilepsy (electrocorticography, ECoG), or indirectly, using blood oxygen level dependent functional magnetic resonance imaging (BOLD fMRI) in six healthy control participants. Both ECoG and fMRI participants viewed the same clear and noisy audiovisual speech stimuli and performed the same speech recognition task. Both techniques demonstrated a sharp functional boundary in the STG, which corresponded to an anatomical boundary defined by the posterior edge of Heschl's gyrus. On the anterior side of the boundary, cortex responded more strongly to clear audiovisual speech than to noisy audiovisual speech, suggesting that anterior STG is primarily involved in processing unisensory auditory speech. On the posterior side of the boundary, cortex preferred noisy audiovisual speech or showed no preference, and showed robust responses to auditory-only and visual-only speech, suggesting that posterior STG is specialized for processing multisensory audiovisual speech. For both ECoG and fMRI, the transition between the functionally distinct regions happened within 10 mm of anterior-to-posterior distance along the STG. We relate this boundary to the multisensory neural code underlying speech perception and propose that it represents an important functional division within the human speech perception network.
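
One way to picture the boundary analysis is to compute a clear-versus-noisy preference index at each recording site and find where it changes sign along the anterior-to-posterior axis. The sketch below uses invented electrode positions and response amplitudes.

```python
# Sketch of locating the clear-vs-noisy functional boundary along the STG.
# Positions and responses are illustrative, not the study's data.
import numpy as np

pos_mm = np.array([-15, -10, -5, 0, 5, 10, 15])         # distance from Heschl's gyrus (posterior > 0)
clear = np.array([8.0, 7.5, 6.0, 4.0, 3.0, 2.8, 2.5])   # response to clear AV speech
noisy = np.array([3.0, 3.2, 3.5, 3.8, 3.9, 4.2, 4.5])   # response to noisy AV speech

pref = (clear - noisy) / (clear + noisy)    # > 0: prefers clear speech (anterior-like)
order = np.argsort(pos_mm)
crossing = np.where(np.diff(np.sign(pref[order])) != 0)[0]
print("boundary between", pos_mm[order][crossing[0]], "and",
      pos_mm[order][crossing[0] + 1], "mm")
```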


eLife ◽  
2016 ◽  
Vol 5 ◽  
Author(s):  
Hyojin Park ◽  
Christoph Kayser ◽  
Gregor Thut ◽  
Joachim Gross

During continuous speech, lip movements provide visual temporal signals that facilitate speech processing. Here, using MEG, we directly investigated how these visual signals interact with rhythmic brain activity in participants listening to and seeing the speaker. First, we investigated coherence between oscillatory brain activity and the speaker's lip movements and demonstrated significant entrainment in visual cortex. We then used partial coherence to remove contributions of the coherent auditory speech signal from the lip-brain coherence. Comparing this synchronization between different attention conditions revealed that attending to visual speech enhances the coherence between activity in visual cortex and the speaker's lips. Further, we identified significant partial coherence between left motor cortex and lip movements, and this partial coherence directly predicted comprehension accuracy. Our results emphasize the importance of visually entrained and attention-modulated rhythmic brain activity for the enhancement of audiovisual speech processing.
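
Coherence and partial coherence can both be computed from Welch cross-spectral estimates; partialing out the auditory envelope follows the standard formula C_xy·z = |S_xy - S_xz S_zy / S_zz|^2 / (S_xx·z S_yy·z). The sketch below uses synthetic stand-in signals rather than MEG data.

```python
# Sketch of lip-brain coherence, then partial coherence with the auditory
# speech envelope removed. All three signals are synthetic stand-ins.
import numpy as np
from scipy.signal import csd, welch

fs, n = 250, 250 * 120                       # 250 Hz, 2 minutes
rng = np.random.default_rng(1)
lip = rng.standard_normal(n)                 # lip-aperture time course (stand-in)
env = 0.6 * lip + rng.standard_normal(n)     # auditory envelope, correlated with lips
brain = 0.5 * lip + 0.3 * env + rng.standard_normal(n)  # visual-cortex signal

def cross(a, b):
    return csd(a, b, fs=fs, nperseg=fs * 2)[1]

Sxy, Sxz, Szy = cross(brain, lip), cross(brain, env), cross(env, lip)
f, Sxx = welch(brain, fs=fs, nperseg=fs * 2)
Syy = welch(lip, fs=fs, nperseg=fs * 2)[1]
Szz = welch(env, fs=fs, nperseg=fs * 2)[1]

coh = np.abs(Sxy) ** 2 / (Sxx * Syy)          # ordinary lip-brain coherence
Sxy_z = Sxy - Sxz * Szy / Szz                 # remove the envelope's contribution
Sxx_z = Sxx - np.abs(Sxz) ** 2 / Szz
Syy_z = Syy - np.abs(Szy) ** 2 / Szz
pcoh = np.abs(Sxy_z) ** 2 / (Sxx_z * Syy_z)   # partial lip-brain coherence
```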


2017 ◽  
Vol 114 (38) ◽  
pp. 10256-10261 ◽  
Author(s):  
Carly A. Anderson ◽  
Ian M. Wiggins ◽  
Pádraig T. Kitterick ◽  
Douglas E. H. Hartley

It has been suggested that visual language is maladaptive for hearing restoration with a cochlear implant (CI) due to cross-modal recruitment of auditory brain regions. Rehabilitative guidelines therefore discourage the use of visual language. However, neuroscientific understanding of cross-modal plasticity following cochlear implantation has been restricted due to incompatibility between established neuroimaging techniques and the surgically implanted electronic and magnetic components of the CI. As a solution to this problem, here we used functional near-infrared spectroscopy (fNIRS), a noninvasive optical neuroimaging method that is fully compatible with a CI and safe for repeated testing. The aim of this study was to examine cross-modal activation of auditory brain regions by visual speech from before to after implantation and its relation to CI success. Using fNIRS, we examined activation of superior temporal cortex to visual speech in the same profoundly deaf adults both before and 6 mo after implantation. Patients’ ability to understand auditory speech with their CI was also measured following 6 mo of CI use. Contrary to existing theory, the results demonstrate that increased cross-modal activation of auditory brain regions by visual speech from before to after implantation is associated with better speech understanding with a CI. Furthermore, activation of auditory cortex by visual and auditory speech developed in synchrony after implantation. Together these findings suggest that cross-modal plasticity by visual speech does not exert previously assumed maladaptive effects on CI success, but instead provides adaptive benefits to the restoration of hearing after implantation through an audiovisual mechanism.


2001 ◽  
Vol 13 (7) ◽  
pp. 994-1005 ◽  
Author(s):  
Athena Vouloumanos ◽  
Kent A. Kiehl ◽  
Janet F. Werker ◽  
Peter F. Liddle

The detection of speech in an auditory stream is a requisite first step in processing spoken language. In this study, we used event-related fMRI to investigate the neural substrates mediating detection of speech compared with that of nonspeech auditory stimuli. Unlike previous studies addressing this issue, we contrasted speech with nonspeech analogues that were matched along key temporal and spectral dimensions. In an oddball detection task, listeners heard nonsense speech sounds, matched sine-wave analogues (complex nonspeech), or single tones (simple nonspeech). Speech stimuli elicited significantly greater activation than both complex and simple nonspeech stimuli in classic receptive language areas, namely in the middle temporal gyri bilaterally and in a locus lateralized to the left posterior superior temporal gyrus. In addition, speech activated a small cluster in the right inferior frontal gyrus. The activation of these areas in a simple detection task, which requires neither identification nor linguistic analysis, suggests that they play a fundamental role in speech processing.
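
The abstract does not spell out how the analogues were constructed, but sine-wave replicas of speech are conventionally built by replacing each formant with a single time-varying sinusoid (in the style of sine-wave speech; Remez et al., 1981). A minimal sketch with invented formant trajectories:

```python
# Sketch of a sine-wave analogue: one sinusoid per formant, with instantaneous
# phase obtained by integrating the formant frequency track. The trajectories
# below are invented; the study's analogues were matched to its own tokens.
import numpy as np

fs = 16000
t = np.arange(0, 0.5, 1 / fs)                # a 500 ms token

f1 = 300 + 400 * t / t[-1]                   # F1 sweeps 300 -> 700 Hz
f2 = 2200 - 1000 * t / t[-1]                 # F2 sweeps 2200 -> 1200 Hz
f3 = np.full_like(t, 2900)                   # F3 held constant

def tone(track):
    phase = 2 * np.pi * np.cumsum(track) / fs   # integrate frequency to phase
    return np.sin(phase)

analogue = (tone(f1) + 0.5 * tone(f2) + 0.25 * tone(f3)) / 1.75
```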


2013 ◽  
Vol 24 (07) ◽  
pp. 535-543 ◽  
Author(s):  
Stephanie Nagle ◽  
Frank E. Musiek ◽  
Eric H. Kossoff ◽  
George Jallo ◽  
Dana Boatman-Reich

Background: The role of the right temporal lobe in processing speech is not well understood. Although the left temporal lobe has long been recognized as critical for speech perception, there is growing evidence for right hemisphere involvement. To investigate whether the right temporal lobe is critical for auditory speech processing, we prospectively studied a normal-hearing patient who underwent consecutive right temporal lobe resections for treatment of medically intractable seizures. Purpose: To test the hypothesis that the right temporal lobe is critical for auditory speech processing. Research Design: We used a prospective, repeated-measures, single-case design. Auditory processing was evaluated using behavioral tests of speech recognition (words, sentences) under multiple listening conditions (e.g., quiet, background noise). Auditory processing of nonspeech sounds was measured by pitch pattern sequencing and environmental sound recognition tasks. Data Collection: Repeated behavioral testing was performed at four time points over a 2 yr period: before and after consecutive right temporal lobe resection surgeries. Results: Before surgery, the patient demonstrated normal speech recognition in quiet and under real-world listening conditions (background noise, filtered speech). After the initial right anterior temporal resection, speech recognition scores declined under adverse listening conditions, especially for the left ear, but remained largely within normal limits. Following resection of the right superior temporal gyrus 1 yr later, speech recognition in quiet and nonspeech sound processing (pitch patterns, environmental sounds) remained intact. However, speech recognition under adverse listening conditions was severely impaired. Conclusions: The right superior temporal gyrus appears to be critical for auditory processing of speech under real-world listening conditions.


2013 ◽  
Vol 126 (3) ◽  
pp. 350-356 ◽  
Author(s):  
Tim Paris ◽  
Jeesun Kim ◽  
Chris Davis

2015 ◽  
Vol 58 (5) ◽  
pp. 1452-1463 ◽  
Author(s):  
Kelene Fercho ◽  
Lee A. Baugh ◽  
Elizabeth K. Hanson

Purpose: The purpose of this article was to examine the neural mechanisms associated with increases in speech intelligibility brought about through alphabet supplementation. Method: Neurotypical participants listened to dysarthric speech while watching an accompanying video of a hand pointing to the first letter of each spoken word on an alphabet display (treatment condition) or a scrambled display (control condition). Their hemodynamic response was measured with functional magnetic resonance imaging, using a sparse-sampling event-related paradigm. Speech intelligibility was assessed via a forced-choice auditory identification task throughout the scanning session. Results: Alphabet supplementation was associated with significant increases in speech intelligibility. Further, alphabet supplementation increased activation in brain regions known to be involved in both auditory speech and visual letter perception above that seen with the scrambled display. Significant increases in functional activity were observed within the posterior to mid superior temporal sulcus/superior temporal gyrus during alphabet supplementation, regions known to be involved in speech processing and audiovisual integration. Conclusion: Alphabet supplementation is an effective tool for increasing the intelligibility of degraded speech and is associated with changes in activity within audiovisual integration sites. Changes in activity within the superior temporal sulcus/superior temporal gyrus may be related to the behavioral increases in intelligibility brought about by this augmented communication method.
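
The behavioral effect amounts to a paired comparison of per-participant intelligibility (proportion correct on the forced-choice task) between the two display conditions. A minimal sketch with placeholder scores, not the study's data:

```python
# Sketch of the intelligibility comparison: alphabet supplementation versus
# the scrambled-display control, tested with a paired t-test.
# Scores are illustrative placeholders.
import numpy as np
from scipy.stats import ttest_rel

supplemented = np.array([0.62, 0.55, 0.71, 0.58, 0.66, 0.60])  # proportion correct, treatment
scrambled    = np.array([0.41, 0.38, 0.52, 0.44, 0.47, 0.43])  # proportion correct, control

t, p = ttest_rel(supplemented, scrambled)
print(f"mean gain = {np.mean(supplemented - scrambled):.2f}, t = {t:.2f}, p = {p:.4f}")
```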


eLife ◽  
2019 ◽  
Vol 8 ◽  
Author(s):  
Patrick J Karas ◽  
John F Magnotti ◽  
Brian A Metzger ◽  
Lin L Zhu ◽  
Kristen B Smith ◽  
...  

Visual information about speech content from the talker's mouth is often available before auditory information from the talker's voice. Here we examined perceptual and neural responses to words with and without this visual head start. For both types of words, perception was enhanced by viewing the talker's face, but the enhancement was significantly greater for words with a head start. Neural responses were measured from electrodes implanted over auditory association cortex in the posterior superior temporal gyrus (pSTG) of epileptic patients. The presence of visual speech suppressed responses to auditory speech, more so for words with a visual head start. We suggest that the head start inhibits representations of incompatible auditory phonemes, increasing perceptual accuracy and decreasing total neural responses. Together with previous work showing visual cortex modulation (Ozker et al., 2018b), these results from pSTG demonstrate that multisensory interactions are a powerful modulator of activity throughout the speech perception network.
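
Two quantities in this account are easy to make concrete: each word's visual head start (voice onset minus mouth-movement onset) and a per-electrode suppression index comparing audiovisual to auditory-only responses. The sketch below uses invented onset times, amplitudes, and a hypothetical head-start criterion.

```python
# Sketch of the head-start and suppression measures discussed above.
# Onset times, responses, and the 100 ms criterion are invented for illustration.
import numpy as np

voice_onset = np.array([0.30, 0.25, 0.40, 0.10, 0.12])   # s, from the audio track
mouth_onset = np.array([0.05, 0.08, 0.10, 0.09, 0.11])   # s, from the video
head_start = voice_onset - mouth_onset                    # > 0: vision leads audition
has_head_start = head_start > 0.1                         # hypothetical criterion

resp_A  = np.array([5.2, 4.8, 6.1, 5.5, 5.0])             # auditory-only response per word
resp_AV = np.array([3.9, 3.7, 4.4, 5.1, 4.8])             # audiovisual response per word
suppression = (resp_A - resp_AV) / resp_A                  # > 0: vision suppresses

print("mean suppression, head-start words:", suppression[has_head_start].mean())
print("mean suppression, other words:     ", suppression[~has_head_start].mean())
```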

