Phase resetting in human auditory cortex to visual speech

2018
Author(s): Pierre Mégevand, Manuel R. Mercier, David M. Groppe, Elana Zion Golumbic, Nima Mesgarani, ...

ABSTRACT: Natural conversation is multisensory: when we can see the speaker’s face, visual speech cues influence our perception of what is being said. The neuronal basis of this phenomenon remains unclear, though there is indication that phase modulation of neuronal oscillations—ongoing excitability fluctuations of neuronal populations in the brain—provides a mechanistic contribution. Investigating this question using naturalistic audiovisual speech with intracranial recordings in humans, we show that neuronal populations in auditory cortex track the temporal dynamics of unisensory visual speech using the phase of their slow oscillations and phase-related modulations in high-frequency activity. Auditory cortex thus builds a representation of the speech stream’s envelope based on visual speech alone, at least in part by resetting the phase of its ongoing oscillations. Phase reset could amplify the representation of the speech stream and organize the information contained in neuronal activity patterns.

SIGNIFICANCE STATEMENT: Watching the speaker can facilitate our understanding of what is being said. The mechanisms responsible for this influence of visual cues on the processing of speech remain incompletely understood. We studied those mechanisms by recording the human brain’s electrical activity through electrodes implanted surgically inside the skull. We found that some regions of cerebral cortex that process auditory speech also respond to visual speech even when it is shown as a silent movie without a soundtrack. This response can occur through a reset of the phase of ongoing oscillations, which helps augment the response of auditory cortex to audiovisual speech. Our results contribute to discovering the mechanisms by which the brain merges auditory and visual speech into a unitary perception.
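
As a concrete illustration of the mechanism being tested, the sketch below computes inter-trial phase coherence (ITC), a standard index of phase resetting: if a stimulus resets the phase of ongoing oscillations, phase angles align across trials and ITC rises toward 1. This is a minimal sketch on simulated data, assuming NumPy/SciPy; it is not the authors' analysis pipeline.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def intertrial_phase_coherence(trials, fs, band=(1.0, 8.0)):
    """Phase consistency across trials in a slow-oscillation band.

    trials : array of shape (n_trials, n_samples), one channel, epoched.
    Returns ITC in [0, 1] per sample; values near 1 indicate that the
    stimulus reset oscillatory phase to a consistent angle across trials.
    """
    # Band-pass filter in the delta/theta range linked to speech tracking
    b, a = butter(3, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, trials, axis=-1)
    # Instantaneous phase from the analytic signal
    phase = np.angle(hilbert(filtered, axis=-1))
    # Length of the mean resultant vector across trials
    return np.abs(np.mean(np.exp(1j * phase), axis=0))

# Example: 50 simulated trials of 4 Hz activity whose phase is reset at t = 0
fs = 500
t = np.arange(-0.5, 1.0, 1 / fs)
rng = np.random.default_rng(0)
trials = np.array([
    np.where(t < 0,
             np.sin(2 * np.pi * 4 * t + rng.uniform(0, 2 * np.pi)),  # random phase
             np.sin(2 * np.pi * 4 * t))                              # aligned phase
    + 0.5 * rng.standard_normal(t.size)
    for _ in range(50)
])
itc = intertrial_phase_coherence(trials, fs)
print(f"ITC before reset: {itc[t < 0].mean():.2f}, after: {itc[t > 0.2].mean():.2f}")
```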

1997, Vol 40 (2), pp. 432-443
Author(s): Karen S. Helfer

Research has shown that speaking in a deliberately clear manner can improve the accuracy of auditory speech recognition. Allowing listeners access to visual speech cues also enhances speech understanding. Whether the nature of information provided by speaking clearly and by using visual speech cues is redundant has not been determined. This study examined how speaking mode (clear vs. conversational) and presentation mode (auditory vs. auditory-visual) influenced the perception of words within nonsense sentences. In Experiment 1, 30 young listeners with normal hearing responded to videotaped stimuli presented audiovisually in the presence of background noise at one of three signal-to-noise ratios. In Experiment 2, 9 participants returned for an additional assessment using auditory-only presentation. Results of these experiments showed significant effects of speaking mode (clear speech was easier to understand than was conversational speech) and presentation mode (auditory-visual presentation led to better performance than did auditory-only presentation). The benefit of clear speech was greater for words occurring in the middle of sentences than for words at either the beginning or end of sentences for both auditory-only and auditory-visual presentation, whereas the greatest benefit from supplying visual cues was for words at the end of sentences spoken both clearly and conversationally. The total benefit from speaking clearly and supplying visual cues was equal to the sum of each of these effects. Overall, the results suggest that speaking clearly and providing visual speech information provide complementary (rather than redundant) information.
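
The additivity result can be made concrete with a small worked example. The percent-correct scores below are hypothetical, not data from this study; they only illustrate what it means for the clear-speech and visual-cue benefits to sum rather than overlap.

```python
# Hypothetical percent-correct scores, illustrative only (not Helfer's data).
scores = {
    ("conversational", "auditory"):        50.0,  # baseline
    ("clear",          "auditory"):        62.0,  # clear-speech benefit: +12
    ("conversational", "auditory-visual"): 65.0,  # visual-cue benefit:   +15
    ("clear",          "auditory-visual"): 77.0,  # combined benefit:     +27
}
baseline = scores[("conversational", "auditory")]
clear_gain = scores[("clear", "auditory")] - baseline
visual_gain = scores[("conversational", "auditory-visual")] - baseline
combined_gain = scores[("clear", "auditory-visual")] - baseline
# Complementary (non-redundant) cues: the combined gain matches the sum
# of the individual gains instead of falling short of it.
print(combined_gain == clear_gain + visual_gain)  # True: 27 = 12 + 15
```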


2017, Vol 114 (48), pp. 12827-12832
Author(s): Diego Vidaurre, Stephen M. Smith, Mark W. Woolrich

The brain recruits neuronal populations in a temporally coordinated manner during tasks and at rest. However, the extent to which large-scale networks exhibit their own organized temporal dynamics is unclear. We use an approach designed to find repeating network patterns in whole-brain resting fMRI data, where networks are defined as graphs of interacting brain areas. We find that the transitions between networks are nonrandom, with certain networks more likely to occur after others. Further, this nonrandom sequencing is itself hierarchically organized, revealing two distinct sets of networks, or metastates, that the brain has a tendency to cycle within. One metastate is associated with sensory and motor regions, and the other involves areas related to higher order cognition. Moreover, we find that the proportion of time that a subject spends in each brain network and metastate is a consistent subject-specific measure, is heritable, and shows a significant relationship with cognitive traits.
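
The core idea can be sketched in a few lines: given a sequence of inferred network states, estimate an empirical transition probability matrix and ask whether transitions cluster into blocks, i.e., metastates. The state sequence below is simulated with a built-in two-block structure, assuming NumPy; it is illustrative, not the authors' actual method.

```python
import numpy as np

def transition_matrix(states, n_states):
    """Empirical probability of switching from state i to state j (i != j)."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        if a != b:                       # count only actual switches
            counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts),
                     where=row_sums > 0)

# Simulate a 12-state sequence with two metastates (states 0-5 vs. 6-11):
# transitions mostly stay within a metastate, so the matrix is block-like.
rng = np.random.default_rng(1)
states, current = [], 0
for _ in range(5000):
    group = current // 6
    if rng.random() < 0.9:               # 90%: move within the same metastate
        current = group * 6 + int(rng.integers(6))
    else:                                # 10%: jump to the other metastate
        current = (1 - group) * 6 + int(rng.integers(6))
    states.append(current)

T = transition_matrix(np.array(states), 12)
within = T[:6, :6].sum() + T[6:, 6:].sum()
between = T[:6, 6:].sum() + T[6:, :6].sum()
print(f"within-metastate mass: {within:.1f}, between: {between:.1f}")
```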


2016
Author(s): Liberty S. Hamilton, Erik Edwards, Edward F. Chang

Abstract: To derive meaning from speech, we must extract multiple dimensions of concurrent information from incoming speech signals, including phonetic and prosodic cues. Equally important is the detection of acoustic cues that give structure and context to the information we hear, such as sentence boundaries. How the brain organizes this information processing is unknown. Here, using data-driven computational methods on an extensive set of high-density intracranial recordings, we reveal a large-scale partitioning of the entire human speech cortex into two spatially distinct regions that detect important cues for parsing natural speech. These caudal (Zone 1) and rostral (Zone 2) regions work in parallel to detect onsets and prosodic information, respectively, within naturally spoken sentences. In contrast, local processing within each region supports phonetic feature encoding. These findings demonstrate a fundamental organizational property of the human auditory cortex that has been previously unrecognized.
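
To make "data-driven partitioning" concrete, the sketch below factorizes simulated electrode response profiles into two components and assigns each electrode to its dominant component. Non-negative matrix factorization from scikit-learn serves as a generic stand-in; the shapes, response profiles, and zone labels are assumptions for illustration, not the paper's actual method or data.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(4)
n_electrodes, n_timepoints = 100, 300

# Simulated non-negative high-gamma responses: half the electrodes respond
# transiently at sentence onset, half show a slower sustained response.
t = np.linspace(0, 3, n_timepoints)
onset = np.exp(-((t - 0.2) ** 2) / 0.02)           # sharp onset response
sustained = np.clip(t, 0, 1)                       # sustained response
profiles = np.vstack([onset] * 50 + [sustained] * 50)
responses = profiles * rng.uniform(0.5, 1.5, (n_electrodes, 1))
responses += 0.05 * rng.random((n_electrodes, n_timepoints))

# Factorize into 2 components; each electrode's dominant weight assigns it
# to a putative functional zone.
nmf = NMF(n_components=2, init="nndsvda", max_iter=500)
weights = nmf.fit_transform(responses)             # shape: (electrodes, 2)
zone = weights.argmax(axis=1)
print("electrodes per zone:", np.bincount(zone))
```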


2017
Author(s): Jeremy I. Skipper, Jason D. Zevin

How is speech understood despite the lack of a deterministic relationship between the sounds reaching auditory cortex and what we perceive? One possibility is that unheard words that are unconsciously activated in association with listening context are used to constrain interpretation. We hypothesized that a mechanism for doing so involves reusing the ability of the brain to predict the sensory effects of speaking associated words. Predictions are then compared to signals arriving in auditory cortex, resulting in reduced processing demands when accurate. Indeed, we show that sensorimotor brain regions are more active prior to words predictable from listening context. This activity resembles lexical and speech-production-related processes and, specifically, subsequent but still unpresented words. When those words occur, auditory cortex activity is reduced through feedback connectivity. In less predictive contexts, activity patterns and connectivity for the same words are markedly different. Results suggest that the brain reorganizes to actively use knowledge about context to construct the speech we hear, enabling rapid and accurate comprehension despite acoustic variability.


Author(s): Karthik Ganesan, John Plass, Adriene M. Beltz, Zhongming Liu, Marcia Grabowecky, ...

Abstract: Speech perception is a central component of social communication. While speech perception is primarily driven by sounds, accurate perception in everyday settings is also supported by meaningful information extracted from visual cues (e.g., speech content, timing, and speaker identity). Previous research has shown that visual speech modulates activity in cortical areas subserving auditory speech perception, including the superior temporal gyrus (STG), likely through feedback connections from the multisensory posterior superior temporal sulcus (pSTS). However, it is unknown whether visual modulation of auditory processing in the STG is a unitary phenomenon or, rather, consists of multiple temporally, spatially, or functionally discrete processes. To explore these questions, we examined neural responses to audiovisual speech in electrodes implanted intracranially in the temporal cortex of 21 patients undergoing clinical monitoring for epilepsy. We found that visual speech modulates auditory processes in the STG in multiple ways, eliciting temporally and spatially distinct patterns of activity that differ across theta, beta, and high-gamma frequency bands. Before speech onset, visual information increased high-gamma power in the posterior STG and suppressed beta power in mid-STG regions, suggesting crossmodal prediction of speech signals in these areas. After sound onset, visual speech decreased theta power in the middle and posterior STG, potentially reflecting a decrease in sustained feedforward auditory activity. These results are consistent with models that posit multiple distinct mechanisms supporting audiovisual speech perception.

Significance Statement: Visual speech cues are often needed to disambiguate distorted speech sounds in the natural environment. However, understanding how the brain encodes and transmits visual information for use by the auditory system remains a challenge. One persistent question is whether visual signals have a unitary effect on auditory processing or elicit multiple distinct effects throughout auditory cortex. To better understand how vision modulates speech processing, we measured neural activity produced by audiovisual speech from electrodes surgically implanted in auditory areas of 21 patients with epilepsy. Group-level statistics using linear mixed-effects models demonstrated distinct patterns of activity across different locations, timepoints, and frequency bands, suggesting the presence of multiple audiovisual mechanisms supporting speech perception processes in auditory cortex.
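
The group-level statistics named in the significance statement can be sketched as a linear mixed-effects model with a fixed effect of condition and a random intercept per patient, so that multiple electrodes from the same patient are not treated as independent samples. The variable names and simulated values below are hypothetical, assuming pandas and statsmodels.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n_patients, n_electrodes = 21, 8
rows = []
for patient in range(n_patients):
    patient_offset = rng.normal(0, 0.5)            # random intercept per patient
    for _ in range(n_electrodes):
        for condition, effect in [("auditory", 0.0), ("audiovisual", 0.3)]:
            rows.append({
                "patient": patient,
                "condition": condition,
                # simulated high-gamma power change (dB), illustrative only
                "high_gamma": 1.0 + effect + patient_offset + rng.normal(0, 0.4),
            })
df = pd.DataFrame(rows)

# Fixed effect of condition; patients as the random-effects grouping factor.
model = smf.mixedlm("high_gamma ~ condition", df, groups=df["patient"])
print(model.fit().summary())
```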


2019, Vol 62 (10), pp. 3860-3875
Author(s): Kaylah Lalonde, Lynne A. Werner

Purpose: This study assessed the extent to which 6- to 8.5-month-old infants and 18- to 30-year-old adults detect and discriminate auditory syllables in noise better in the presence of visual speech than in auditory-only conditions. In addition, we examined whether visual cues to the onset and offset of the auditory signal account for this benefit.

Method: Sixty infants and 24 adults were randomly assigned to speech detection or discrimination tasks and were tested using a modified observer-based psychoacoustic procedure. Each participant completed 1–3 conditions: auditory-only, with visual speech, and with a visual signal that only cued the onset and offset of the auditory syllable.

Results: Mixed linear modeling indicated that infants and adults benefited from visual speech on both tasks. Adults relied on the onset–offset cue for detection, but the same cue did not improve their discrimination. The onset–offset cue benefited infants for both detection and discrimination. Whereas the onset–offset cue improved detection similarly for infants and adults, the full visual speech signal benefited infants to a lesser extent than adults on the discrimination task.

Conclusions: These results suggest that infants' use of visual onset–offset cues is mature, but their ability to use more complex visual speech cues is still developing. Additional research is needed to explore differences in audiovisual enhancement (a) of speech discrimination across speech targets and (b) with increasingly complex tasks and stimuli.
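
Detection performance in paradigms like this is often summarized with the sensitivity index d' from signal detection theory, which separates sensitivity from response bias. The hit and false-alarm rates below are hypothetical, not values from this study; the sketch assumes SciPy.

```python
from scipy.stats import norm

def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index: z(hit rate) - z(false-alarm rate)."""
    return norm.ppf(hit_rate) - norm.ppf(false_alarm_rate)

# e.g., visual speech raising the hit rate at a fixed false-alarm rate
print(f"auditory-only:      d' = {d_prime(0.70, 0.20):.2f}")
print(f"with visual speech: d' = {d_prime(0.85, 0.20):.2f}")  # higher sensitivity
```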


2020
Author(s): Aisling E. O’Sullivan, Michael J. Crosse, Giovanni M. Di Liberto, Alain de Cheveigné, Edmund C. Lalor

Abstract: Seeing a speaker’s face benefits speech comprehension, especially in challenging listening conditions. This perceptual benefit is thought to stem from the neural integration of visual and auditory speech at multiple stages of processing, whereby movement of a speaker’s face provides temporal cues to auditory cortex, and articulatory information from the speaker’s mouth can aid the recognition of specific linguistic units (e.g., phonemes, syllables). However, it remains unclear how the integration of these cues varies as a function of listening conditions. Here we sought to provide insight on these questions by examining EEG responses to natural audiovisual, audio, and visual speech in quiet and in noise. Specifically, we represented our speech stimuli in terms of their spectrograms and their phonetic features, and then quantified the strength of the encoding of those features in the EEG using canonical correlation analysis. The encoding of both spectrotemporal and phonetic features was shown to be more robust in audiovisual speech responses than would have been expected from the summation of the audio and visual speech responses, consistent with the literature on multisensory integration. Furthermore, the strength of this multisensory enhancement was more pronounced at the level of phonetic processing for speech in noise relative to speech in quiet, indicating that listeners rely more on articulatory details from visual speech in challenging listening conditions. These findings support the notion that the integration of audio and visual speech is a flexible, multistage process that adapts to optimize comprehension based on the current listening conditions.

Significance Statement: During conversation, visual cues impact our perception of speech. Integration of auditory and visual speech is thought to occur at multiple stages of speech processing and vary flexibly depending on the listening conditions. Here we examine audiovisual integration at two stages of speech processing using the speech spectrogram and a phonetic representation, and test how audiovisual integration adapts to degraded listening conditions. We find significant integration at both of these stages regardless of listening conditions, and when the speech is noisy, we find enhanced integration at the phonetic stage of processing. These findings provide support for the multistage integration framework and demonstrate its flexibility in terms of a greater reliance on visual articulatory information in challenging listening conditions.
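
A minimal sketch of the stimulus-response CCA idea: find linear combinations of stimulus features and of EEG channels that are maximally correlated, then evaluate that correlation on held-out data as an index of encoding strength. The code assumes scikit-learn and substitutes simulated data for real spectrograms and EEG; it is not the authors' pipeline.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(3)
n_samples, n_freq_bands, n_channels = 2000, 16, 32

# Simulated stimulus features and EEG that share a weak linear component
stimulus = rng.standard_normal((n_samples, n_freq_bands))
mixing = rng.standard_normal((n_freq_bands, n_channels))
eeg = 0.2 * stimulus @ mixing + rng.standard_normal((n_samples, n_channels))

# Fit on the first half, evaluate on the held-out second half so the
# canonical weights are not overfit to the evaluation data.
half = n_samples // 2
cca = CCA(n_components=2)
cca.fit(stimulus[:half], eeg[:half])
stim_scores, eeg_scores = cca.transform(stimulus[half:], eeg[half:])

# Correlation of the first canonical pair indexes encoding strength.
r = np.corrcoef(stim_scores[:, 0], eeg_scores[:, 0])[0, 1]
print(f"held-out canonical correlation: {r:.2f}")
```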


2020, Vol 41 (4), pp. 933-961
Author(s): Rebecca Holt, Laurence Bruggeman, Katherine Demuth

Abstract: Processing speech can be slow and effortful for children, especially in adverse listening conditions, such as the classroom. This can have detrimental effects on children’s academic achievement. We therefore asked whether primary school children’s speech processing could be made faster and less effortful via the presentation of visual speech cues (speaker’s facial movements), and whether any audio-visual benefit would be modulated by the presence of noise or by characteristics of individual participants. A phoneme monitoring task with concurrent pupillometry was used to measure 7- to 11-year-old children’s speech processing speed and effort, with and without visual cues, in both quiet and noise. Results demonstrated that visual cues to speech can facilitate children’s speech processing, but that these benefits may also be subject to variability according to children’s motivation. Children showed faster processing and reduced effort when visual cues were available, regardless of listening condition. However, examination of individual variability revealed that the reduction in effort was driven by the children who performed better on a measure of phoneme isolation (used to quantify how difficult they found the phoneme monitoring task).
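
Pupillometry indexes processing effort through task-evoked pupil dilation relative to a pre-stimulus baseline. The sketch below shows that standard baseline-correction step on simulated traces; the sampling rate, trial structure, and values are illustrative, not the study's data.

```python
import numpy as np

def peak_pupil_dilation(trials, fs, onset_idx, baseline_s=0.5):
    """Peak pupil dilation relative to a pre-stimulus baseline.

    trials : array of shape (n_trials, n_samples), pupil diameter traces.
    Larger dilation is conventionally read as greater processing effort.
    """
    base = trials[:, onset_idx - int(baseline_s * fs):onset_idx].mean(axis=1)
    corrected = trials - base[:, None]
    return corrected[:, onset_idx:].max(axis=1)

# Hypothetical comparison: auditory-only vs. audio-visual trials
fs, onset = 60, 60                                  # 60 Hz tracker, onset at 1 s
rng = np.random.default_rng(5)
base = rng.normal(3.0, 0.1, (40, 300))              # ~3 mm pupil diameter
audio_only = base + np.linspace(0, 0.40, 300)       # larger effort response
audio_visual = base + np.linspace(0, 0.25, 300)     # smaller with visual cues
print(f"effort (A):  {peak_pupil_dilation(audio_only, fs, onset).mean():.2f} mm")
print(f"effort (AV): {peak_pupil_dilation(audio_visual, fs, onset).mean():.2f} mm")
```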


2008, Vol 17 (6), pp. 405-409
Author(s): Lawrence D. Rosenblum

Speech perception is inherently multimodal. Visual speech (lip-reading) information is used by all perceivers and readily integrates with auditory speech. Imaging research suggests that the brain treats auditory and visual speech similarly. These findings have led some researchers to consider that speech perception works by extracting amodal information that takes the same form across modalities. From this perspective, speech integration is a property of the input information itself. Amodal speech information could explain the reported automaticity, immediacy, and completeness of audiovisual speech integration. However, recent findings suggest that speech integration can be influenced by higher cognitive properties such as lexical status and semantic context. Proponents of amodal accounts will need to explain these results.

