Rapid recalibration of speech perception after experiencing the McGurk illusion

Claudia S. Lüttke; Alexis Pérez-Bellido; Floris P. de Lange

doi:10.1098/rsos.170909

Royal Society Open Science ◽

10.1098/rsos.170909 ◽

2018 ◽

Vol 5 (3) ◽

pp. 170909 ◽

Cited By ~ 7

Author(s):

Claudia S. Lüttke ◽

Alexis Pérez-Bellido ◽

Floris P. de Lange

Keyword(s):

Speech Perception ◽

Theoretical Analysis ◽

Visual Information ◽

Speech Sound ◽

Visual Speech ◽

Perceptual Integration ◽

Visual Inputs ◽

Perceptual Shift ◽

McGurk Illusion ◽

Phoneme Categorization

The human brain can quickly adapt to changes in the environment. One example is phonetic recalibration: a speech sound is interpreted differently depending on the visual speech and this interpretation persists in the absence of visual information. Here, we examined the mechanisms of phonetic recalibration. Participants categorized the auditory syllables /aba/ and /ada/, which were sometimes preceded by the so-called McGurk stimuli (in which an /aba/ sound, due to visual /aga/ input, is often perceived as ‘ada’). We found that only one trial of exposure to the McGurk illusion was sufficient to induce a recalibration effect, i.e. an auditory /aba/ stimulus was subsequently more often perceived as ‘ada’. Furthermore, phonetic recalibration took place only when auditory and visual inputs were integrated to ‘ada’ (McGurk illusion). Moreover, this recalibration depended on the sensory similarity between the preceding and current auditory stimulus. Finally, signal detection theoretical analysis showed that McGurk-induced phonetic recalibration resulted in both a criterion shift towards /ada/ and a reduced sensitivity to distinguish between /aba/ and /ada/ sounds. The current study shows that phonetic recalibration is dependent on the perceptual integration of audiovisual information and leads to a perceptual shift in phoneme categorization.
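The signal detection analysis mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the standard equal-variance Gaussian model, treats /ada/ as the "signal" category, and applies a log-linear correction to the response rates. The function name `sdt_measures` and all trial counts are hypothetical and chosen only for illustration; they are not data or code from the study.

```python
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion) under an equal-variance Gaussian model.

    Assumed coding (not taken from the paper): 'ada' responses to /ada/
    sounds are hits, 'ada' responses to /aba/ sounds are false alarms.
    """
    # Log-linear correction keeps both rates strictly between 0 and 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)            # sensitivity
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # response bias
    return d_prime, criterion


# Hypothetical counts: trials after a non-McGurk trial vs. after a McGurk illusion.
baseline_d, baseline_c = sdt_measures(40, 10, 10, 40)
after_d, after_c = sdt_measures(42, 8, 20, 30)

print(f"baseline:      d' = {baseline_d:.2f}, c = {baseline_c:.2f}")
print(f"after McGurk:  d' = {after_d:.2f}, c = {after_c:.2f}")
```

With this coding, a more negative criterion means a stronger bias towards responding 'ada', and a smaller d' means /aba/ and /ada/ are harder to tell apart, which is the pattern the abstract describes.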

Enhanced Speechreading Performance in Young Hearing Aid Users in China

Journal of Speech Language and Hearing Research ◽

10.1044/2018_jslhr-s-18-0153 ◽

2019 ◽

Vol 62 (2) ◽

pp. 307-317 ◽

Cited By ~ 1

Author(s):

Jianghua Lei ◽

Huina Gong ◽

Liang Chen

Keyword(s):

Young Adults ◽

Speech Perception ◽

Hearing Impairment ◽

Hearing Aids ◽

Visual Information ◽

Hearing Aid ◽

Visual Speech ◽

Positive Correlation ◽

Visual Speech Perception

Purpose: The study was designed primarily to determine if the use of hearing aids (HAs) in individuals with hearing impairment in China would affect their speechreading performance. Method: Sixty-seven young adults with hearing impairment with HAs and 78 young adults with hearing impairment without HAs completed newly developed Chinese speechreading tests targeting 3 linguistic levels (i.e., words, phrases, and sentences). Results: Groups with HAs were more accurate at speechreading than groups without HAs across the 3 linguistic levels. For both groups, speechreading accuracy was higher for phrases than words and sentences, and speechreading speed was slower for sentences than words and phrases. Furthermore, there was a positive correlation between years of HA use and the accuracy of speechreading performance; longer HA use was associated with more accurate speechreading. Conclusions: Young HA users in China have enhanced speechreading performance over their peers with hearing impairment who are not HA users. This result argues against the perceptual dependence hypothesis that suggests greater dependence on visual information leads to improvement in visual speech perception.

Visual speech differentially modulates beta, theta, and high gamma bands in auditory cortex

10.1101/2020.09.07.284455 ◽

2020 ◽

Cited By ~ 1

Author(s):

Karthik Ganesan ◽

John Plass ◽

Adriene M. Beltz ◽

Zhongming Liu ◽

Marcia Grabowecky ◽

...

Keyword(s):

Speech Perception ◽

Auditory Cortex ◽

Auditory Processing ◽

Visual Information ◽

Visual Speech ◽

Visual Signals ◽

Audiovisual Speech ◽

Frequency Bands ◽

Beta Power ◽

High Gamma

Abstract: Speech perception is a central component of social communication. While speech perception is primarily driven by sounds, accurate perception in everyday settings is also supported by meaningful information extracted from visual cues (e.g., speech content, timing, and speaker identity). Previous research has shown that visual speech modulates activity in cortical areas subserving auditory speech perception, including the superior temporal gyrus (STG), likely through feedback connections from the multisensory posterior superior temporal sulcus (pSTS). However, it is unknown whether visual modulation of auditory processing in the STG is a unitary phenomenon or, rather, consists of multiple temporally, spatially, or functionally discrete processes. To explore these questions, we examined neural responses to audiovisual speech in electrodes implanted intracranially in the temporal cortex of 21 patients undergoing clinical monitoring for epilepsy. We found that visual speech modulates auditory processes in the STG in multiple ways, eliciting temporally and spatially distinct patterns of activity that differ across theta, beta, and high-gamma frequency bands. Before speech onset, visual information increased high-gamma power in the posterior STG and suppressed beta power in mid-STG regions, suggesting crossmodal prediction of speech signals in these areas. After sound onset, visual speech decreased theta power in the middle and posterior STG, potentially reflecting a decrease in sustained feedforward auditory activity. These results are consistent with models that posit multiple distinct mechanisms supporting audiovisual speech perception.

Significance Statement: Visual speech cues are often needed to disambiguate distorted speech sounds in the natural environment. However, understanding how the brain encodes and transmits visual information for usage by the auditory system remains a challenge. One persistent question is whether visual signals have a unitary effect on auditory processing or elicit multiple distinct effects throughout auditory cortex. To better understand how vision modulates speech processing, we measured neural activity produced by audiovisual speech from electrodes surgically implanted in auditory areas of 21 patients with epilepsy. Group-level statistics using linear mixed-effects models demonstrated distinct patterns of activity across different locations, timepoints, and frequency bands, suggesting the presence of multiple audiovisual mechanisms supporting speech perception processes in auditory cortex.

Investigating the Effects of Hearing Loss and Hearing Aid Digital Delay on Sound-Induced Flash Illusion

Journal of Audiology & Otology ◽

10.7874/jao.2019.00507 ◽

2020 ◽

Vol 24 (4) ◽

pp. 174-179 ◽

Cited By ~ 1

Author(s):

Vahid Moradi ◽

Kiana Kheirkhah ◽

Saeid Farahani ◽

Iman Kavianpour

Keyword(s):

Hearing Loss ◽

Visual Information ◽

Transmission Rate ◽

Hearing Aid ◽

Digital Signal ◽

Visual Speech ◽

Fully Integrated ◽

Visual Inputs ◽

System Input ◽

Speech Information

Background and Objectives: The integration of auditory-visual speech information improves speech perception; however, if the auditory system input is disrupted due to hearing loss, auditory and visual inputs cannot be fully integrated. Additionally, temporal coincidence of auditory and visual input is an important factor in integrating the input of these two senses. However, the acoustic pathway is delayed when the signal passes through digital signal processing. Therefore, this study aimed to investigate the effects of hearing loss and hearing aid digital delay circuit on sound-induced flash illusion. Subjects and Methods: A total of 13 adults with normal hearing, 13 with mild to moderate hearing loss, and 13 with moderate to severe hearing loss were enrolled in this study. Subsequently, the sound-induced flash illusion test was conducted, and the results were analyzed. Results: The results showed that hearing aid digital delay and hearing loss had no detrimental effect on sound-induced flash illusion. Conclusions: Transmission velocity and neural transduction rate of the auditory inputs decreased in patients with hearing loss. Hence, the auditory and visual sensory inputs cannot be combined completely. However, the transmission rate of the auditory input was approximately normal when a hearing aid was prescribed. Thus, it can be concluded that the processing delay in the hearing aid circuit is insufficient to disrupt the integration of auditory and visual information.

Cross-modal Suppression of Auditory Association Cortex by Visual Speech as a Mechanism for Audiovisual Speech Perception

10.1101/626259 ◽

2019 ◽

Author(s):

Patrick J. Karas ◽

John F. Magnotti ◽

Brian A. Metzger ◽

Lin L. Zhu ◽

Kristen B. Smith ◽

...

Keyword(s):

Speech Perception ◽

Auditory Processing ◽

Visual Information ◽

Visual Speech ◽

Association Cortex ◽

Auditory Information ◽

Voice Leading ◽

Audiovisual Speech Perception ◽

Auditory Association Cortex ◽

The Voice

Abstract: Vision provides a perceptual head start for speech perception because most speech is “mouth-leading”: visual information from the talker’s mouth is available before auditory information from the voice. However, some speech is “voice-leading” (auditory before visual). Consistent with a model in which vision modulates subsequent auditory processing, there was a larger perceptual benefit of visual speech for mouth-leading vs. voice-leading words (28% vs. 4%). The neural substrates of this difference were examined by recording broadband high-frequency activity from electrodes implanted over auditory association cortex in the posterior superior temporal gyrus (pSTG) of epileptic patients. Responses were smaller for audiovisual vs. auditory-only mouth-leading words (34% difference) while there was little difference (5%) for voice-leading words. Evidence for cross-modal suppression of auditory cortex complements our previous work showing enhancement of visual cortex (Ozker et al., 2018b) and confirms that multisensory interactions are a powerful modulator of activity throughout the speech perception network.

Impact Statement: Human perception and brain responses differ between words in which mouth movements are visible before the voice is heard and words for which the reverse is true.

Attention to Facial Regions in Segmental and Prosodic Visual Speech Perception Tasks

Journal of Speech Language and Hearing Research ◽

10.1044/jslhr.4203.526 ◽

1999 ◽

Vol 42 (3) ◽

pp. 526-539 ◽

Cited By ~ 56

Author(s):

Charissa R. Lansing ◽

George W. McConkie

Keyword(s):

Speech Perception ◽

Visual Information ◽

Eye Gaze ◽

Normal Hearing ◽

Visual Speech ◽

Gaze Direction ◽

The Gaze ◽

The Face ◽

Visual Speech Perception ◽

Prosodic Categories

Two experiments were conducted to test the hypothesis that visual information related to segmental versus prosodic aspects of speech is distributed differently on the face of the talker. In the first experiment, eye gaze was monitored for 12 observers with normal hearing. Participants made decisions about segmental and prosodic categories for utterances presented without sound. The first experiment found that observers spend more time looking at and direct more gazes toward the upper part of the talker's face in making decisions about intonation patterns than about the words being spoken. The second experiment tested the Gaze Direction Assumption underlying Experiment 1—that is, that people direct their gaze to the stimulus region containing information required for their task. In this experiment, 18 observers with normal hearing made decisions about segmental and prosodic categories under conditions in which face motion was restricted to selected areas of the face. The results indicate that information in the upper part of the talker's face is more critical for intonation pattern decisions than for decisions about word segments or primary sentence stress, thus supporting the Gaze Direction Assumption. Visual speech perception proficiency requires learning where to direct visual attention for cues related to different aspects of speech.

Time course of audio–visual phoneme identification: A cross-modal gating study

Seeing and Perceiving ◽

10.1163/187847612x648233 ◽

2012 ◽

Vol 25 (0) ◽

pp. 194

Author(s):

Carolina Sánchez-García ◽

Sonia Kandel ◽

Christophe Savariaux ◽

Nara Ikumi ◽

Salvador Soto-Faraco

Keyword(s):

Speech Perception ◽

Visual Information ◽

High Speed ◽

Time Course ◽

Visual Saliency ◽

Past Research ◽

Visual Speech ◽

Visual Performance ◽

Auditory Information ◽

Temporal Course

When both are present, visual and auditory information are combined in order to decode the speech signal. Past research has addressed the extent to which visual information contributes to distinguishing confusable speech sounds, but has usually ignored the continuous nature of speech perception. Here we tap into the temporal course of the contribution of visual and auditory information during the process of speech perception. To this end, we designed an audio–visual gating task with videos recorded with a high-speed camera. Participants were asked to identify gradually longer fragments of pseudowords varying in the central consonant. Different Spanish consonant phonemes with different degrees of visual and acoustic saliency were included, and tested on visual-only, auditory-only and audio–visual trials. The data showed different patterns of contribution of unimodal and bimodal information during identification, depending on the visual saliency of the presented phonemes. In particular, for phonemes which are clearly more salient in one modality than the other, audio–visual performance equals that of the best unimodal condition. For phonemes with more balanced saliency, audio–visual performance was better than in both unimodal conditions. These results shed new light on the temporal course of audio–visual speech integration.

Does it help to see the speaker’s lip movements?

Translation Cognition & Behavior ◽

10.1075/tcb.00049.gie ◽

2021 ◽

Author(s):

Anne Catherine Gieshoff

Keyword(s):

Speech Perception ◽

Cognitive Load ◽

Visual Information ◽

Mental Effort ◽

Simultaneous Interpreting ◽

Visual Inputs ◽

Accuracy Measures ◽

The One ◽

Subjective Reports

Abstract: Simultaneous interpreting combines auditory and visual information. Within the multitude of visual inputs that interpreters receive, the one from the speaker seems to be particularly important (Bühler 1985; Seubert 2019). One reason might be that lip movements enhance speech perception and might thus reduce cognitive load in simultaneous interpreting and, hence, induce lower mental effort. This effect may be even more pronounced when noise is added to the source speech. This study was conducted to investigate cognitive load and mental effort during simultaneous interpreting (a) with and without the ability to see the speaker’s lip movements, and (b) with and without interfering noise. A group of listeners was included to control for task-related effects. Mental effort and cognitive load were measured using pupillometry, interpreting accuracy measures, and subjective reports. The facilitation hypothesis for lip movements was not confirmed. However, the pupillometric data suggest that lip movements may increase arousal.

When eyes beat lips: speaker gaze affects audiovisual integration in the McGurk illusion

Psychological Research ◽

10.1007/s00426-021-01618-y ◽

2021 ◽

Author(s):

Basil Wahn ◽

Laura Schmitz ◽

Alan Kingstone ◽

Anne Böckler-Raettig

Keyword(s):

Visual Information ◽

Critical Role ◽

Audiovisual Integration ◽

Visual Speech ◽

Speech Signals ◽

Human Communication ◽

Direct Gaze ◽

Eye Motion ◽

McGurk Illusion ◽

Illusory Percept

Abstract: Eye contact is a dynamic social signal that captures attention and plays a critical role in human communication. In particular, direct gaze often accompanies communicative acts in an ostensive function: a speaker directs her gaze towards the addressee to highlight the fact that this message is being intentionally communicated to her. The addressee, in turn, integrates the speaker’s auditory and visual speech signals (i.e., her vocal sounds and lip movements) into a unitary percept. It is an open question whether the speaker’s gaze affects how the addressee integrates the speaker’s multisensory speech signals. We investigated this question using the classic McGurk illusion, an illusory percept created by presenting mismatching auditory (vocal sounds) and visual information (speaker’s lip movements). Specifically, we manipulated whether the speaker (a) moved his eyelids up/down (i.e., opened/closed his eyes) prior to speaking or did not show any eye motion, and (b) spoke with open or closed eyes. When the speaker’s eyes moved (i.e., opened or closed) before an utterance, and when the speaker spoke with closed eyes, the McGurk illusion was weakened (i.e., addressees reported significantly fewer illusory percepts). In line with previous research, this suggests that motion (opening or closing), as well as the closed state of the speaker’s eyes, captured addressees’ attention, thereby reducing the influence of the speaker’s lip movements on the addressees’ audiovisual integration process. Our findings reaffirm the power of speaker gaze to guide attention, showing that its dynamics can modulate low-level processes such as the integration of multisensory speech signals.

Gaze Behaviour in Audiovisual Speech Perception: Asymmetrical Distribution of Face-Directed Fixations

Perception ◽

10.1068/p5852 ◽

2007 ◽

Vol 36 (10) ◽

pp. 1535-1545 ◽

Cited By ~ 24

Author(s):

Ian T Everdell ◽

Heidi Marsh ◽

Micheal D Yurick ◽

Kevin G Munhall ◽

Martin Paré

Keyword(s):

Speech Perception ◽

Visual Information ◽

Speech Intelligibility ◽

Visual Speech ◽

Audiovisual Speech ◽

Audiovisual Speech Perception ◽

Gaze Fixation ◽

Dynamic Faces ◽

Gaze Behaviour ◽

Speech Information

Speech perception under natural conditions entails integration of auditory and visual information. Understanding how visual and auditory speech information are integrated requires detailed descriptions of the nature and processing of visual speech information. To understand better the process of gathering visual information, we studied the distribution of face-directed fixations of humans performing an audiovisual speech perception task to characterise the degree of asymmetrical viewing and its relationship to speech intelligibility. Participants showed stronger gaze fixation asymmetries while viewing dynamic faces, compared to static faces or face-like objects, especially when gaze was directed to the talkers' eyes. Although speech perception accuracy was significantly enhanced by the viewing of congruent, dynamic faces, we found no correlation between task performance and gaze fixation asymmetry. Most participants preferentially fixated the right side of the faces and their preferences persisted while viewing horizontally mirrored stimuli, different talkers, or static faces. These results suggest that the asymmetrical distributions of gaze fixations reflect the participants' viewing preferences, rather than being a product of asymmetrical faces, but that this behavioural bias does not predict correct audiovisual speech perception.

Maladaptive connectivity of Broca’s area in schizophrenia during audiovisual speech perception: an fMRI study

European Psychiatry ◽

10.1016/s0924-9338(11)73216-3 ◽

2011 ◽

Vol 26 (S2) ◽

pp. 1512-1512

Author(s):

G.R. Szycik ◽

Z. Ye ◽

B. Mohammadi ◽

W. Dillo ◽

B.T. te Wildt ◽

...

Keyword(s):

Speech Perception ◽

Functional Connectivity ◽

Visual Information ◽

Audiovisual Integration ◽

Visual Speech ◽

Control Group ◽

Broca's Area ◽

Brain Areas ◽

Audiovisual Speech Perception

Introduction: Natural speech perception relies on both auditory and visual information. Both sensory channels provide redundant and complementary information, such that speech perception is enhanced in healthy subjects when both information channels are present. Objectives: Patients with schizophrenia have been reported to have problems regarding this audiovisual integration process, but little is known about which neural processes are altered. Aims: In this study we investigated functional connectivity of Broca’s area in patients with schizophrenia. Methods: Functional magnetic resonance imaging (fMRI) was performed in 15 schizophrenia patients and 15 healthy controls to study functional connectivity of Broca’s area during perception of videos of bisyllabic German nouns, in which audio and video either matched (congruent condition) or did not match (incongruent; e.g. video = hotel, audio = island). Results: There were differences in connectivity between experimental groups and between conditions. Broca’s area of the patient group showed connections to more brain areas than that of the control group. This difference was more prominent in the incongruent condition, for which only one connection between Broca's area and the supplementary motor area was found in control participants, whereas patients showed connections to 8 widely distributed brain areas. Conclusions: The findings imply that audiovisual integration problems in schizophrenia result from maladaptive connectivity of Broca's area, in particular when confronted with incongruent stimuli, and are discussed in light of recent audiovisual speech models.
