New evidence for audiovisual speech scene analysis: Low level interaction between auditory streaming and visual cues in speech perception

2012 ◽  
Vol 131 (4) ◽  
pp. 3269-3269
Author(s):  
Frédéric Berthommier ◽  
Jean-Luc Schwartz
2019 ◽  
Author(s):  
Kristin J. Van Engen ◽  
Avanti Dey ◽  
Mitchell Sommers ◽  
Jonathan E. Peelle

Although listeners use both auditory and visual cues during speech perception, the cognitive and neural bases for their integration remain a matter of debate. One common approach to measuring multisensory integration is to use McGurk tasks, in which discrepant auditory and visual cues produce auditory percepts that differ from those based solely on unimodal input. Not all listeners show the same degree of susceptibility to the McGurk illusion, and these individual differences in susceptibility are frequently used as a measure of audiovisual integration ability. However, despite their popularity, we argue that McGurk tasks are ill-suited for studying the kind of multisensory speech perception that occurs in real life: McGurk stimuli are often based on isolated syllables (which are rare in conversations) and necessarily rely on audiovisual incongruence that does not occur naturally. Furthermore, recent data show that susceptibility on McGurk tasks does not correlate with performance during natural audiovisual speech perception. Although the McGurk effect is a fascinating illusion, truly understanding the combined use of auditory and visual information during speech perception requires tasks that more closely resemble everyday communication.
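The susceptibility measure discussed in this abstract is typically operationalized as the proportion of incongruent (McGurk) trials on which a listener reports a fused percept, which can then be related to performance on natural audiovisual speech. As a minimal, hypothetical sketch of that operationalization and of the correlation analysis the abstract alludes to, the Python code below computes per-listener susceptibility from invented trial-level responses; all data and field names are illustrative assumptions, not material from the paper.

```python
# Hypothetical sketch: McGurk susceptibility as the proportion of
# incongruent trials that elicit a fused percept, then its correlation
# with an (assumed) natural audiovisual speech recognition score.
from statistics import correlation  # Python 3.10+

# Invented trial-level data: listener -> responses on incongruent
# auditory "ba" + visual "ga" trials ("da"/"tha" count as fused percepts).
trials = {
    "L01": ["da", "ba", "da", "da", "ba"],
    "L02": ["ba", "ba", "ba", "da", "ba"],
    "L03": ["da", "da", "da", "da", "tha"],
}
FUSED = {"da", "tha"}

susceptibility = {
    listener: sum(r in FUSED for r in resp) / len(resp)
    for listener, resp in trials.items()
}

# Assumed keyword-recognition accuracy on natural audiovisual sentences.
natural_score = {"L01": 0.82, "L02": 0.79, "L03": 0.85}

listeners = sorted(trials)
r = correlation(
    [susceptibility[l] for l in listeners],
    [natural_score[l] for l in listeners],
)
print(f"Pearson r between susceptibility and natural AV score: {r:.2f}")
```

The abstract's argument is that, on real data, this correlation turns out to be weak, which is why the authors question McGurk susceptibility as a proxy for everyday audiovisual integration.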


2015 ◽  
Vol 75 ◽  
pp. 402-410 ◽  
Author(s):  
Philip Jaekl ◽  
Ana Pesquita ◽  
Agnes Alsius ◽  
Kevin Munhall ◽  
Salvador Soto-Faraco

2020 ◽  
Vol 63 (7) ◽  
pp. 2245-2254 ◽  
Author(s):  
Jianrong Wang ◽  
Yumeng Zhu ◽  
Yu Chen ◽  
Abdilbar Mamat ◽  
Mei Yu ◽  
...  

Purpose The primary purpose of this study was to explore the audiovisual speech perception strategies adopted by normal-hearing and deaf people in processing familiar and unfamiliar languages. Our primary hypothesis was that the two groups would adopt different perception strategies because of differences in early sensory experience, the limitations of hearing devices, and gaps in language development, among other factors. Method Thirty normal-hearing adults and 33 prelingually deaf adults participated in the study. They were asked to perform judgment and listening tasks while watching videos of a Uygur–Mandarin bilingual speaker in a familiar language (Standard Chinese) or an unfamiliar language (Modern Uygur), and their eye movements were recorded with eye-tracking technology. Results Task had only a slight influence on the distribution of selective attention, whereas subject group and language had significant influences. Specifically, the normal-hearing and the deaf participants mainly gazed at the speaker's eyes and mouth, respectively; moreover, whereas the normal-hearing participants stared longer at the speaker's mouth when confronted with the unfamiliar language Modern Uygur, the deaf participants did not change their attention allocation pattern between the two languages. Conclusions Normal-hearing and deaf adults adopt different audiovisual speech perception strategies: Normal-hearing adults mainly look at the eyes, and deaf adults mainly look at the mouth. Additionally, language and task can also modulate the speech perception strategy.
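The attention-allocation measure this abstract reports is commonly computed as the proportion of total dwell time that fixations spend inside each area of interest (AOI), such as the speaker's eyes and mouth. The Python sketch below is a minimal, hypothetical illustration of that computation from fixation records; the AOI boxes, field names, and sample fixations are invented, not taken from the study.

```python
# Hypothetical sketch: proportion of dwell time per area of interest (AOI),
# the kind of gaze measure compared across groups and languages above.

# Invented AOIs as (x_min, y_min, x_max, y_max) boxes on the video frame.
AOIS = {"eyes": (300, 120, 500, 200), "mouth": (330, 260, 470, 330)}

# Invented fixation records: (x, y, duration_ms).
fixations = [
    (400, 160, 350),  # on the eyes
    (410, 300, 500),  # on the mouth
    (100, 400, 120),  # elsewhere on the frame
    (390, 290, 450),  # on the mouth
]

def aoi_of(x, y):
    """Return the AOI containing the point, or None if outside all AOIs."""
    for name, (x0, y0, x1, y1) in AOIS.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            return name
    return None

total = sum(d for _, _, d in fixations)
dwell = {name: 0 for name in AOIS}
for x, y, d in fixations:
    name = aoi_of(x, y)
    if name is not None:
        dwell[name] += d

for name, ms in dwell.items():
    print(f"{name}: {ms / total:.0%} of dwell time")
```

Comparing these proportions between groups (normal-hearing vs. deaf) and conditions (familiar vs. unfamiliar language) yields the group-by-language pattern the abstract describes.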


2007 ◽  
Vol 11 (4) ◽  
pp. 233-241 ◽  
Author(s):  
Nancy Tye-Murray ◽  
Mitchell Sommers ◽  
Brent Spehar

2019 ◽  
Vol 128 ◽  
pp. 93-100 ◽  
Author(s):  
Masahiro Imafuku ◽  
Masahiko Kawai ◽  
Fusako Niwa ◽  
Yuta Shinya ◽  
Masako Myowa

2012 ◽  
Vol 367 (1591) ◽  
pp. 942-953 ◽  
Author(s):  
Jean-Michel Hupé ◽  
Daniel Pressnitzer

Auditory streaming and visual plaids have been used extensively to study perceptual organization in each modality. Both stimuli can produce bistable alternations between grouped (one object) and split (two objects) interpretations. They also share two peculiar features: (i) at the onset of stimulus presentation, organization starts with a systematic bias towards the grouped interpretation; (ii) this first percept has ‘inertia’; it lasts longer than the subsequent ones. As a result, the probability of forming different objects builds up over time, a hallmark of both behavioural and neurophysiological data on auditory streaming. Here we show that first percept bias and inertia are independent. In plaid perception, inertia is due to a depth ordering ambiguity in the transparent (split) interpretation that makes plaid perception tristable rather than bistable: experimental manipulations removing the depth ambiguity suppressed inertia. However, the first percept bias persisted. We attempted a similar manipulation for auditory streaming by introducing level differences between streams, to bias which stream would appear in the perceptual foreground. Here both inertia and first percept bias persisted. We thus argue that the critical common feature of the onset of perceptual organization is the grouping bias, which may be related to the transition from temporally/spatially local to temporally/spatially global computation.
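The 'inertia' claim above is a statement about percept durations: within a trial, the first percept lasts longer than the ones that follow. As a hedged illustration of how that is typically quantified, the Python sketch below compares the first-percept duration with the mean of subsequent durations across invented trials; the numbers are made up and do not come from the paper.

```python
# Hypothetical sketch: testing first-percept "inertia" by comparing the
# duration of the first percept with the mean of subsequent percepts.
from statistics import mean

# Invented trials: each is a list of successive percept durations (s).
trials = [
    [9.2, 3.1, 4.0, 2.8],
    [7.5, 2.6, 3.3, 3.9, 2.2],
    [11.0, 4.4, 3.8],
]

first = [t[0] for t in trials]
later = [mean(t[1:]) for t in trials]

print(f"mean first-percept duration: {mean(first):.1f} s")
print(f"mean subsequent duration:    {mean(later):.1f} s")
# Inertia predicts first > subsequent. Per the abstract, removing the
# depth ambiguity in plaids abolished this difference while leaving the
# first-percept (grouping) bias intact.
```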

