Use of visual information in speech perception: Evidence for a visual rate effect both with and without a McGurk effect

2005 ◽  
Vol 67 (5) ◽  
pp. 759-769 ◽  
Author(s):  
Lawrence Brancazio ◽  
Joanne L. Miller


2020 ◽  
Vol 21 (1) ◽  
pp. 349-358
Author(s):  
O. Brendel

The article addresses a problematic issue that frequently arises in the forensic examination of video and audio recordings: the visual and auditory perception of oral speech, that is, establishing the content of a conversation from its image (lip reading). Its purpose is to analyze whether, and under what conditions, the visual-auditory perception of oral speech can usefully be examined within the framework of the examination of video and sound recordings, taking into account the peculiarities of such research, in which visual information serves either as an independent object of examination (lip reading) or as a supplement to the auditory analysis of a particular message. The main components of the lip-reading process are described, along with the possibility of jointly examining visual and auditory information in order to establish the content of a conversation. Attention is paid to the features of visual and auditory perception of oral speech, and the factors that most strongly determine how informative the overall picture of speech perceived from an image is are analyzed: active articulation, facial expressions, head movement, position of the teeth, gestures, and so on. Besides image quality, the duration of the speech fragment also affects the perception of oral speech from an image: a fully uttered expression is usually read better than its individual parts. The article also draws attention to the ambiguity of the articulatory images of sounds and considers the McGurk effect, a perceptual phenomenon that demonstrates the interaction between hearing and vision during speech perception.


Perception ◽  
1997 ◽  
Vol 26 (1_suppl) ◽  
pp. 347-347
Author(s):  
M Sams

Persons with hearing loss use visual information from articulation to improve their speech perception. Even persons with normal hearing utilise visual information, especially when the signal-to-noise ratio is poor. A dramatic demonstration of the role of vision in speech perception is the audiovisual fusion called the ‘McGurk effect’. When the auditory syllable /pa/ is presented in synchrony with a face articulating the syllable /ka/, the subject usually perceives /ta/ or /ka/. The illusory perception is clearly auditory in nature. We recently studied this audiovisual fusion (acoustical /p/, visual /k/) for Finnish (1) syllables and (2) words. Only 3% of the subjects perceived the syllables according to the acoustical input; i.e., in 97% of the subjects the perception was influenced by the visual information. For words, the percentage of acoustical identifications was 10%. The results demonstrate a very strong influence of visual articulatory information in face-to-face speech perception. Word meaning and sentence context have a negligible influence on the fusion. We have also recorded neuromagnetic responses of the human cortex while subjects both heard and saw speech. Some subjects showed a distinct response to a ‘McGurk’ stimulus. The response was rather late, emerging about 200 ms after the onset of the auditory stimulus. We suggest that the perisylvian cortex, close to the source area of the auditory 100 ms response (M100), may be activated by the discordant stimuli. The behavioural and neuromagnetic results suggest a precognitive audiovisual speech integration occurring at a relatively early processing level.
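
For readers who want to reproduce this kind of tally, the short sketch below (plain Python, with hypothetical trial records rather than Sams's data) shows how a percentage of acoustical identifications is typically computed: the share of trials on which the reported percept matches the acoustic token.

    # Hypothetical McGurk trial records; the fields and values are
    # illustrative assumptions, not data from the study.
    trials = [
        {"acoustic": "pa", "visual": "ka", "response": "ta"},  # fusion percept
        {"acoustic": "pa", "visual": "ka", "response": "ka"},  # visually driven
        {"acoustic": "pa", "visual": "ka", "response": "pa"},  # acoustic percept
    ]

    def acoustic_identification_rate(trials):
        """Fraction of trials whose reported percept matches the acoustic token."""
        hits = sum(1 for t in trials if t["response"] == t["acoustic"])
        return hits / len(trials)

    print(f"acoustical identifications: {acoustic_identification_rate(trials):.0%}")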


2020 ◽  
Vol 10 (6) ◽  
pp. 328
Author(s):  
Melissa Randazzo ◽  
Ryan Priefer ◽  
Paul J. Smith ◽  
Amanda Nagler ◽  
Trey Avery ◽  
...  

The McGurk effect, an incongruent pairing of visual /ga/ with acoustic /ba/, creates the fusion illusion /da/ and is the cornerstone of research in audiovisual speech perception. Combination illusions occur when the input modalities are reversed: auditory /ga/ paired with visual /ba/ yields the percept /bga/. A robust literature shows that fusion illusions in an oddball paradigm evoke a mismatch negativity (MMN) in the auditory cortex in the absence of changes to the acoustic stimuli. We compared fusion and combination illusions in a passive oddball paradigm to further examine the influence of the visual and auditory aspects of incongruent speech stimuli on the audiovisual MMN. Participants viewed videos under two audiovisual illusion conditions (fusion, with the visual aspect of the stimulus changing, and combination, with the auditory aspect of the stimulus changing) as well as two unimodal auditory-only and visual-only conditions. Fusion and combination deviants exerted similar influence in generating congruency predictions, with significant differences between standards and deviants in the N100 time window. The presence of the MMN in early and late time windows differentiated fusion from combination deviants. When the visual signal changes, a new percept is created; but when the visual signal is held constant and the auditory signal changes, the response is suppressed, evoking a later MMN. In alignment with models of predictive processing in audiovisual speech perception, we interpret our results to indicate that visual information can both predict and suppress auditory speech perception.
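
For orientation, the sketch below shows the conventional MMN computation this abstract presupposes: average the standard and deviant epochs, subtract to obtain a difference wave, and take the mean amplitude in an early (N100-range) and a later (MMN-range) window. The data, sampling rate, and window bounds are placeholder assumptions, not the authors' parameters (Python with NumPy).

    import numpy as np

    fs = 500                                  # sampling rate in Hz (assumed)
    t = np.arange(-0.1, 0.5, 1 / fs)          # epoch from -100 ms to +500 ms

    rng = np.random.default_rng(0)            # placeholder averaged ERPs (µV)
    standard_erp = rng.normal(0.0, 1.0, t.size)
    deviant_erp = rng.normal(0.0, 1.0, t.size)

    difference_wave = deviant_erp - standard_erp   # deviant minus standard

    def mean_amplitude(wave, times, lo, hi):
        """Mean amplitude of `wave` between `lo` and `hi` seconds."""
        window = (times >= lo) & (times <= hi)
        return wave[window].mean()

    early = mean_amplitude(difference_wave, t, 0.08, 0.12)  # N100-range window
    late = mean_amplitude(difference_wave, t, 0.15, 0.25)   # MMN-range window
    print(f"early window: {early:.2f} µV; late window: {late:.2f} µV")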


2019 ◽  
Author(s):  
Kristin J. Van Engen ◽  
Avanti Dey ◽  
Mitchell Sommers ◽  
Jonathan E. Peelle

Although listeners use both auditory and visual cues during speech perception, the cognitive and neural bases for their integration remain a matter of debate. One common approach to measuring multisensory integration is to use McGurk tasks, in which discrepant auditory and visual cues produce auditory percepts that differ from those based solely on unimodal input. Not all listeners show the same degree of susceptibility to the McGurk illusion, and these individual differences in susceptibility are frequently used as a measure of audiovisual integration ability. However, despite their popularity, we argue that McGurk tasks are ill-suited for studying the kind of multisensory speech perception that occurs in real life: McGurk stimuli are often based on isolated syllables (which are rare in conversations) and necessarily rely on audiovisual incongruence that does not occur naturally. Furthermore, recent data show that susceptibility on McGurk tasks does not correlate with performance during natural audiovisual speech perception. Although the McGurk effect is a fascinating illusion, truly understanding the combined use of auditory and visual information during speech perception requires tasks that more closely resemble everyday communication.


2021 ◽  
pp. 1-17
Author(s):  
Yuta Ujiie ◽  
Kohske Takahashi

While visual information from facial speech modulates auditory speech perception, it is less influential on audiovisual speech perception among autistic individuals than among typically developing individuals. In this study, we investigated the relationship between autistic traits (Autism-Spectrum Quotient; AQ) and the influence of visual speech on the recognition of Rubin’s vase-type speech stimuli with degraded facial speech information. Participants were 31 university students (13 males and 18 females; mean age: 19.2 years, SD: 1.13) who reported normal (or corrected-to-normal) hearing and vision. All participants completed three speech recognition tasks (visual, auditory, and audiovisual stimuli) and the AQ (Japanese version). The results showed that speech recognition accuracies for visual (i.e., lip-reading) and auditory stimuli were not significantly related to participants’ AQ. In contrast, audiovisual speech perception was less influenced by facial speech among individuals with high autistic traits than among those with low autistic traits. This weaker influence of visual information on audiovisual speech perception in autism spectrum disorder (ASD) was robust regardless of the clarity of the visual information, suggesting a difficulty in the process of audiovisual integration rather than in the visual processing of facial speech.
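
The individual-differences logic here reduces to a correlation between AQ scores and the size of the visual influence on audiovisual recognition. A minimal sketch follows, with fabricated placeholder numbers standing in for the study's data (Python with SciPy); the visual-influence index is a hypothetical per-participant measure such as audiovisual-minus-auditory accuracy.

    import numpy as np
    from scipy.stats import pearsonr

    rng = np.random.default_rng(1)
    n = 31                                   # sample size from the abstract
    aq = rng.integers(5, 36, size=n)         # AQ scores (assumed range)
    # Simulated so that higher AQ goes with weaker visual influence,
    # mirroring the reported pattern; not the study's data.
    visual_influence = 0.6 - 0.01 * aq + rng.normal(0.0, 0.05, n)

    r, p = pearsonr(aq, visual_influence)
    print(f"r = {r:.2f}, p = {p:.3f}")  # a negative r matches the reported pattern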


2012 ◽  
Vol 25 (0) ◽  
pp. 148
Author(s):  
Marcia Grabowecky ◽  
Emmanuel Guzman-Martinez ◽  
Laura Ortega ◽  
Satoru Suzuki

Watching moving lips facilitates auditory speech perception when the mouth is attended. However, recent evidence suggests that visual attention and awareness are mediated by separate mechanisms. We investigated whether lip movements suppressed from visual awareness can facilitate speech perception. We used a word categorization task in which participants listened to spoken words and determined as quickly and accurately as possible whether or not each word named a tool. While participants listened to the words, they watched a visual display presenting a video clip of the speaker synchronously speaking the auditorily presented words, or of the same speaker articulating different words. Critically, the speaker’s face was either visible (the aware trials) or suppressed from awareness using continuous flash suppression. Aware and suppressed trials were randomly intermixed. A secondary probe-detection task ensured that participants attended to the mouth region regardless of whether the face was visible or suppressed. On the aware trials, responses to the tool targets were no faster with the synchronous than with the different-word lip movements, perhaps because the visual information was inconsistent with the auditory information on 50% of the trials. However, on the suppressed trials, responses to the tool targets were significantly faster with the synchronous than with the different-word lip movements. These results demonstrate that even when a random dynamic mask renders a face invisible, lip movements are processed by the visual system with sufficiently high temporal resolution to facilitate speech perception.
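
The key statistic in such a design is a within-participant comparison of response times for synchronous versus different-word lip movements on the suppressed trials. Below is a hedged sketch with simulated values, not the experiment's data (Python with SciPy).

    import numpy as np
    from scipy.stats import ttest_rel

    rng = np.random.default_rng(2)
    n = 20                                      # hypothetical participant count
    # Simulated per-participant mean RTs (ms) on suppressed trials; the
    # ~25 ms synchrony advantage is an illustrative assumption.
    rt_synchronous = rng.normal(620.0, 40.0, n)
    rt_different_word = rt_synchronous + rng.normal(25.0, 15.0, n)

    t_stat, p_val = ttest_rel(rt_synchronous, rt_different_word)
    print(f"suppressed trials: t({n - 1}) = {t_stat:.2f}, p = {p_val:.3f}")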


2000 ◽  
Vol 23 (3) ◽  
pp. 327-328 ◽  
Author(s):  
Lawrence Brancazio ◽  
Carol A. Fowler

The present description of the Merge model addresses only auditory, not audiovisual, speech perception. However, recent findings in the audiovisual domain are relevant to the model. We outline a test that we are conducting of the adequacy of Merge, modified to accept visual information about articulation.

