Auditory and semantic cues facilitate decoding of visual object category in MEG

2019 ◽  
Author(s):  
Talia Brandman ◽  
Chiara Avancini ◽  
Olga Leticevscaia ◽  
Marius V. Peelen

Abstract Sounds (e.g., barking) help us to visually identify objects (e.g., a dog) that are distant or ambiguous. While neuroimaging studies have revealed neuroanatomical sites of audiovisual interactions, little is known about the time-course by which sounds facilitate visual object processing. Here we used magnetoencephalography (MEG) to reveal the time-course of the facilitatory influence of natural sounds (e.g., barking) on visual object processing, and compared this to the facilitatory influence of spoken words (e.g., “dog”). Participants viewed images of blurred objects preceded by a task-irrelevant natural sound, a spoken word, or uninformative noise. A classifier was trained to discriminate multivariate sensor patterns evoked by animate and inanimate intact objects presented without sounds in a separate experiment, and tested on sensor patterns evoked by the blurred objects in the three auditory conditions. Results revealed that both sounds and words, relative to uninformative noise, significantly facilitated visual object category decoding between 300 and 500 ms after visual onset. We found no evidence for earlier facilitation by sounds than by words. These findings provide evidence for a semantic route of facilitation by both natural sounds and spoken words, whereby the auditory input first activates semantic object representations, which then modulate the visual processing of objects.
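As a rough illustration of the cross-decoding scheme described above, the following Python sketch trains a classifier on patterns evoked by intact objects and tests it, timepoint by timepoint, on patterns from one auditory condition. The array shapes, choice of classifier, and simulated data are assumptions for demonstration only, not the authors' pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Training data: intact objects without sounds (separate experiment),
# as trials x sensors x timepoints; labels are 0 = inanimate, 1 = animate.
X_train = rng.standard_normal((200, 306, 120))
y_train = rng.integers(0, 2, 200)

# Test data: blurred objects from one auditory condition (e.g., natural sounds).
X_test = rng.standard_normal((100, 306, 120))
y_test = rng.integers(0, 2, 100)

# Train and test a classifier independently at each timepoint, yielding a
# time course of cross-decoding accuracy for that auditory condition.
n_times = X_train.shape[2]
accuracy = np.zeros(n_times)
for t in range(n_times):
    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    clf.fit(X_train[:, :, t], y_train)
    accuracy[t] = clf.score(X_test[:, :, t], y_test)

# Comparing such accuracy time courses between the sound, word, and noise
# conditions shows when the auditory cues begin to facilitate decoding.
print(accuracy.mean())
```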

2021 ◽  
Author(s):  
Piermatteo Morucci ◽  
Francesco Giannelli ◽  
Craig Richter ◽  
Nicola Molinaro

Hearing spoken words can enhance visual object recognition, detection and discrimination. Yet, the mechanism underlying this facilitation is incompletely understood. On one account, words would not bias visual processes at early levels, but rather interact at later decision-making stages. More recent proposals posit that words can alter visual processes at early stages by activating category-specific priors in sensory regions. A prediction of this account is that top-down priors evoke changes in occipital areas before the presentation of visual stimuli. Here, we tested the hypothesis that neural oscillations can serve as a mechanism to activate language-mediated visual priors. Participants performed a cue-picture matching task where cues were either spoken words, in their native or second language, or natural sounds, while EEG and reaction times were recorded. Behaviorally, we replicated the previously reported label-advantage effect, with images cued by words being recognized faster than those cued by natural sounds. A time-frequency analysis of cue-target intervals revealed that this label-advantage was associated with enhanced power in posterior alpha (9-11 Hz) and beta oscillations (17-19 Hz), both of which were larger when the image was preceded by a word compared to a natural sound. Prestimulus alpha and beta rhythms were correlated with reaction time performance, yet they appeared to operate in different ways. Reaction times were faster when alpha power increased, but slowed down with enhancement of beta oscillations. These results suggest that alpha and beta rhythms work in tandem to support language-mediated visual object recognition, while showing an inverse relationship to behavioral performance.
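The sketch below illustrates, on simulated single-channel data, the kind of prestimulus band-power and reaction-time analysis the abstract describes: band-limited power in the cue-target interval is estimated per trial and correlated with response times. The filtering approach and all numbers are placeholders, not the study's actual wavelet-based pipeline.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
fs = 250                                        # sampling rate (Hz)
n_trials, n_times = 120, fs                     # one second of cue-target interval per trial
eeg = rng.standard_normal((n_trials, n_times))  # one posterior channel: trials x time
rts = rng.uniform(0.4, 0.9, n_trials)           # hypothetical reaction times (s)

def band_power(data, low, high):
    """Mean band-limited power per trial via band-pass filtering and a Hilbert envelope."""
    b, a = butter(3, [low, high], btype="bandpass", fs=fs)
    envelope = np.abs(hilbert(filtfilt(b, a, data, axis=-1), axis=-1))
    return (envelope ** 2).mean(axis=-1)

alpha = band_power(eeg, 9, 11)    # posterior alpha power per trial
beta = band_power(eeg, 17, 19)    # posterior beta power per trial

# Trial-wise correlations with reaction time: the study reports faster responses
# with stronger alpha and slower responses with stronger beta before the image.
print(pearsonr(alpha, rts))
print(pearsonr(beta, rts))
```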


2012 ◽  
Vol 25 (0) ◽  
pp. 117 ◽  
Author(s):  
Yi-Chuan Chen ◽  
Gert Westermann

Infants are able to learn novel associations between visual objects and auditory linguistic labels (such as a dog and the sound /dɔg/) by the end of their first year of life. Surprisingly, at this age they seem to fail to learn the associations between visual objects and natural sounds (such as a dog and its barking sound). Researchers have therefore suggested that linguistic learning is special (Fulkerson and Waxman, 2007) or that unfamiliar sounds overshadow visual object processing (Robinson and Sloutsky, 2010). However, in previous studies visual stimuli were paired with arbitrary sounds in contexts lacking ecological validity. In the present study, we created animations of two novel animals and paired them with two realistic animal calls to construct two audiovisual stimuli. In the training phase, each animal was presented in motions that mimicked animal behaviour in real life: in a short movie, the animal ran (or jumped) from the periphery to the center of the monitor, and it made calls while raising its head. In the test phase, static images of both animals were presented side by side and the sound for one of the animals was played. Infant looking times to each stimulus were recorded with an eye tracker. We found that following the sound, 12-month-old infants preferentially looked at the animal corresponding to the sound. These results show that 12-month-old infants are able to learn novel associations between visual objects and natural sounds in an ecologically valid situation, thereby challenging our current understanding of the development of crossmodal association learning.
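For illustration, a minimal preferential-looking calculation of the kind implied by this design might look as follows; the looking times are invented and the statistical test is an assumption, not the authors' analysis.

```python
import numpy as np
from scipy.stats import ttest_1samp

# Hypothetical per-trial looking times (seconds) to the animal whose call was
# played (target) and to the other animal (distractor).
target_look = np.array([3.1, 2.8, 4.0, 3.5, 2.2, 3.8])
distractor_look = np.array([1.9, 2.5, 1.7, 2.0, 2.4, 1.6])

# Proportion of looking directed at the sound-matching animal on each trial.
prop_target = target_look / (target_look + distractor_look)

# Test whether the looking preference exceeds chance (0.5).
t, p = ttest_1samp(prop_target, 0.5)
print(prop_target.mean(), t, p)
```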


2020 ◽  
Author(s):  
Ali Almasi ◽  
Hamish Meffin ◽  
Shaun L. Cloherty ◽  
Yan Wong ◽  
Molis Yunzab ◽  
...  

Abstract Visual object identification requires both selectivity for specific visual features that are important to the object’s identity and invariance to feature manipulations. For example, a hand can be shifted in position, rotated, or contracted but still be recognised as a hand. How are the competing requirements of selectivity and invariance built into the early stages of visual processing? Typically, cells in the primary visual cortex are classified as either simple or complex. They both show selectivity for edge orientation, but complex cells develop invariance to edge position within the receptive field (spatial phase). Using a data-driven model that extracts the spatial structures and nonlinearities associated with neuronal computation, we show that the balance between selectivity and invariance in complex cells is more diverse than previously thought. Phase invariance is frequently partial, thus retaining sensitivity to brightness polarity, while invariance to orientation and spatial frequency is more extensive than expected. The invariance arises due to two independent factors: (1) the structure and number of filters and (2) the form of nonlinearities that act upon the filter outputs. Both vary more than previously considered, so primary visual cortex forms an elaborate set of generic feature sensitivities, providing the foundation for more sophisticated object processing.
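The following schematic sketch shows how the two factors named above (filter structure and output nonlinearity) jointly determine phase invariance: a quadrature pair of Gabor filters combined by a squaring nonlinearity responds almost identically across grating phases, whereas a single half-rectified filter does not. This is a textbook energy-model illustration under assumed parameters, not the authors' data-driven fitting procedure.

```python
import numpy as np

def gabor(size, sf, theta, phase, sigma):
    """2D Gabor: a grating of spatial frequency sf (cycles/pixel) under a Gaussian window."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * sf * xr + phase)

size, sf, theta, sigma = 64, 0.1, 0.0, 8.0
f_even = gabor(size, sf, theta, 0.0, sigma)
f_odd = gabor(size, sf, theta, np.pi / 2, sigma)   # quadrature pair

phases = np.linspace(0, 2 * np.pi, 16, endpoint=False)
energy, rectified = [], []
for ph in phases:
    stim = gabor(size, sf, theta, ph, 1e6)          # full-field grating at phase ph
    # Energy model: squared outputs of the quadrature pair summed -> phase invariant.
    energy.append((f_even * stim).sum() ** 2 + (f_odd * stim).sum() ** 2)
    # Single filter with half-wave rectification -> strongly phase dependent.
    rectified.append(max((f_even * stim).sum(), 0.0))

# Relative response modulation across phases: near zero for the energy model,
# large for the single rectified filter.
print("energy-model modulation:", np.ptp(energy) / np.mean(energy))
print("rectified-filter modulation:", np.ptp(rectified) / (np.mean(rectified) + 1e-12))
```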


2020 ◽  
Vol 30 (9) ◽  
pp. 5067-5087
Author(s):  
Ali Almasi ◽  
Hamish Meffin ◽  
Shaun L Cloherty ◽  
Yan Wong ◽  
Molis Yunzab ◽  
...  

Abstract Visual object identification requires both selectivity for specific visual features that are important to the object’s identity and invariance to feature manipulations. For example, a hand can be shifted in position, rotated, or contracted but still be recognized as a hand. How are the competing requirements of selectivity and invariance built into the early stages of visual processing? Typically, cells in the primary visual cortex are classified as either simple or complex. They both show selectivity for edge orientation, but complex cells develop invariance to edge position within the receptive field (spatial phase). Using a data-driven model that extracts the spatial structures and nonlinearities associated with neuronal computation, we quantitatively describe the balance between selectivity and invariance in complex cells. Phase invariance is frequently partial, while invariance to orientation and spatial frequency is more extensive than expected. The invariance arises due to two independent factors: (1) the structure and number of filters and (2) the form of nonlinearities that act upon the filter outputs. Both vary more than previously considered, so primary visual cortex forms an elaborate set of generic feature sensitivities, providing the foundation for more sophisticated object processing.


2014 ◽  
Vol 111 (10) ◽  
pp. E962-E971 ◽  
Author(s):  
Assaf Harel ◽  
Dwight J. Kravitz ◽  
Chris I. Baker

Perception reflects an integration of “bottom-up” (sensory-driven) and “top-down” (internally generated) signals. Although models of visual processing often emphasize the central role of feed-forward hierarchical processing, less is known about the impact of top-down signals on complex visual representations. Here, we investigated whether and how the observer’s goals modulate object processing across the cortex. We examined responses elicited by a diverse set of objects under six distinct tasks, focusing on either physical (e.g., color) or conceptual properties (e.g., man-made). Critically, the same stimuli were presented in all tasks, allowing us to investigate how task impacts the neural representations of identical visual input. We found that task has an extensive and differential impact on object processing across the cortex. First, we found task-dependent representations in the ventral temporal and prefrontal cortex. In particular, although object identity could be decoded from the multivoxel response within task, there was a significant reduction in decoding across tasks. In contrast, the early visual cortex evidenced equivalent decoding within and across tasks, indicating task-independent representations. Second, task information was pervasive and present from the earliest stages of object processing. However, although the responses of the ventral temporal, prefrontal, and parietal cortex enabled decoding of both the type of task (physical/conceptual) and the specific task (e.g., color), the early visual cortex was not sensitive to type of task and could only be used to decode individual physical tasks. Thus, object processing is highly influenced by the behavioral goal of the observer, highlighting how top-down signals constrain and inform the formation of visual representations.
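A simplified sketch of the within-task versus across-task decoding comparison follows, with simulated ROI voxel patterns standing in for the fMRI data; the classifier, sample sizes, and labels are illustrative assumptions rather than the study's analysis.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_objects, n_reps, n_voxels = 8, 12, 300      # 8 objects x 12 repetitions per task

def simulate_task(mean_shift):
    """Simulated ROI voxel patterns (trials x voxels) with balanced object labels."""
    y = np.tile(np.arange(n_objects), n_reps)
    X = rng.standard_normal((y.size, n_voxels)) + mean_shift
    return X, y

X_color, y_color = simulate_task(0.0)         # e.g., a physical task (color judgement)
X_manmade, y_manmade = simulate_task(0.1)     # e.g., a conceptual task (man-made judgement)

clf = SVC(kernel="linear")

# Within-task decoding: cross-validated object-identity classification in one task.
within = cross_val_score(clf, X_color, y_color, cv=4).mean()

# Across-task decoding: train on one task, test on the other (identical stimuli).
across = clf.fit(X_color, y_color).score(X_manmade, y_manmade)

# A reliable drop from `within` to `across` (as reported for ventral temporal and
# prefrontal cortex) indicates task-dependent object representations; comparable
# values (as in early visual cortex) indicate task-independent representations.
print(within, across)
```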


2019 ◽  
Vol 31 (1) ◽  
pp. 49-63 ◽  
Author(s):  
Maryam Vaziri-Pashkam ◽  
JohnMark Taylor ◽  
Yaoda Xu

Primate ventral and dorsal visual pathways both contain visual object representations. Dorsal regions receive more input from the magnocellular system, while ventral regions receive inputs from both the magnocellular and parvocellular systems. Due to potential differences in the spatial sensitivities of the magnocellular and parvocellular systems, object representations in ventral and dorsal regions may differ in how they represent visual input from different spatial scales. To test this prediction, we asked observers to view blocks of images from six object categories, shown in full spectrum, high spatial frequency (SF), or low SF. We found robust object category decoding in all SF conditions, as well as SF decoding, in nearly all the early visual, ventral, and dorsal regions examined. Cross-SF decoding further revealed that object category representations in all regions exhibited substantial tolerance across the SF components. No difference between ventral and dorsal regions was found in their preference for the different SF components. Further comparisons revealed that, whereas differences in the SF component separated object category representations in early visual areas, such a separation was much smaller in downstream ventral and dorsal regions. In those regions, variations among the object categories played a more significant role in shaping the visual representational structures. Our findings show that ventral and dorsal regions are similar in how they represent visual input from different spatial scales and argue against a dissociation of these regions based on differential sensitivity to different SFs.
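One way to compare how much category versus SF contributes to a region's representational structure is to build a representational dissimilarity matrix over category-by-SF condition patterns, as in the sketch below; this is one possible analysis in the spirit of the abstract, run here on simulated patterns, not the authors' exact method.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(3)
n_categories, n_sfs, n_voxels = 6, 3, 200       # 6 categories x {full, high SF, low SF}

# One mean pattern per category x SF condition (category-major ordering).
patterns = rng.standard_normal((n_categories * n_sfs, n_voxels))

# Representational dissimilarity matrix (correlation distance).
rdm = squareform(pdist(patterns, metric="correlation"))

cat_idx = np.repeat(np.arange(n_categories), n_sfs)
sf_idx = np.tile(np.arange(n_sfs), n_categories)

# Mean dissimilarity between conditions differing only in SF vs only in category.
same_cat_diff_sf = rdm[(cat_idx[:, None] == cat_idx) & (sf_idx[:, None] != sf_idx)].mean()
diff_cat_same_sf = rdm[(cat_idx[:, None] != cat_idx) & (sf_idx[:, None] == sf_idx)].mean()

# In early visual areas SF differences dominate the structure; in downstream
# ventral and dorsal regions category differences play the larger role.
print(same_cat_diff_sf, diff_cat_same_sf)
```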


2021 ◽  
Author(s):  
Jamal Rodgers Williams ◽  
Yuri Markov ◽  
Natalia Tiurina ◽  
Viola Störmer

Visual object recognition in the real world is not performed in isolation, but is instead dependent on contextual information such as the visual scene an object is found in. Our perceptual experience is also not just visual: objects generate specific and unique sounds which can readily predict which objects are outside of our field of view. Here, we test whether and how naturalistic sounds influence visual object processing and demonstrate that auditory information both accelerates visual information processing and modulates the perceptual representation of visual objects. Specifically, using a visual discrimination task and a novel set of ambiguous object stimuli, we find that naturalistic sounds shift visual representations towards the object features that match the sound (Exp. 1a-1b). In a series of control experiments, we replicate the original effect and show that these effects are not driven by decision- or response biases (Exp. 2a-2b) and are not due to the high-level semantic content of sounds generating explicit expectations (Exp. 3). Instead, these sound-induced effects on visual perception appear to be driven by the continuous integration of multisensory inputs during perception itself. Together, our results demonstrate that visual processing is shaped by auditory context, which provides independent, supplemental information about the entities we encounter in the world.
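One way such a sound-induced perceptual shift could be quantified is by fitting psychometric functions to choices along an ambiguous morph continuum and comparing the point of subjective equality (PSE) between sound conditions, as in the hypothetical sketch below; the response proportions and fitting choices are invented for illustration, not taken from the paper.

```python
import numpy as np
from scipy.optimize import curve_fit

morph = np.linspace(0, 1, 7)                       # 0 = object A ... 1 = object B

# Hypothetical proportion of "object B" reports at each morph level,
# with the B-matching sound versus the A-matching sound.
p_sound_b = np.array([0.08, 0.18, 0.40, 0.62, 0.80, 0.92, 0.97])
p_sound_a = np.array([0.03, 0.08, 0.20, 0.40, 0.63, 0.85, 0.95])

def logistic(x, pse, slope):
    """Psychometric function: probability of reporting object B."""
    return 1.0 / (1.0 + np.exp(-(x - pse) / slope))

(pse_b, _), _ = curve_fit(logistic, morph, p_sound_b, p0=[0.5, 0.1])
(pse_a, _), _ = curve_fit(logistic, morph, p_sound_a, p0=[0.5, 0.1])

# A lower PSE with the B-matching sound means the sound shifted perception of
# ambiguous morphs towards the matching object.
print("PSE shift:", pse_a - pse_b)
```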

