Object Recognition at Higher Regions of the Ventral Visual Stream via Dynamic Inference

2020 ◽  
Vol 14 ◽  
Author(s):  
Siamak K. Sorooshyari ◽  
Huanjie Sheng ◽  
H. Vincent Poor


2000 ◽  
Vol 12 (4) ◽  
pp. 615-621 ◽  
Author(s):  
Glen M. Doniger ◽  
John J. Foxe ◽  
Micah M. Murray ◽  
Beth A. Higgins ◽  
Joan Gay Snodgrass ◽  
...  

Object recognition is achieved even in circumstances when only partial information is available to the observer. Perceptual closure processes are essential in enabling such recognition to occur. We presented successively less fragmented images while recording high-density event-related potentials (ERPs), which permitted us to monitor brain activity during the perceptual closure processes leading up to object recognition. We reveal a bilateral ERP component (Ncl) that tracks these processes (onset ∼230 msec, maximal at ∼290 msec). Scalp-current density mapping of the Ncl revealed bilateral occipito-temporal scalp foci, which are consistent with generators in the human ventral visual stream, and specifically the lateral-occipital (LO) complex as defined by hemodynamic studies of object recognition.
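As an illustration, component latencies like the Ncl onset and peak reported above can be estimated from a difference waveform as a threshold crossing and an amplitude maximum. The toy waveform and threshold below are assumptions for demonstration, not the authors' analysis pipeline.

```python
import numpy as np

def component_latencies(waveform, times, threshold):
    """Return (onset, peak) latencies of an ERP component.

    onset: first time point at which the waveform exceeds `threshold`
    peak:  time point of the waveform's maximum amplitude
    """
    above = np.flatnonzero(waveform > threshold)
    onset = times[above[0]] if above.size else None
    peak = times[np.argmax(waveform)]
    return onset, peak

# Toy difference waveform: a Gaussian "component" peaking near 290 msec,
# sampled every 2 msec over a 0-500 msec epoch.
times = np.arange(0, 500, 2)
wave = np.exp(-0.5 * ((times - 290) / 40.0) ** 2)
onset, peak = component_latencies(wave, times, threshold=0.3)
```

Real ERP analyses typically define onset against a noise estimate from a baseline window rather than a fixed threshold; the structure of the computation is the same.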


2021 ◽  
Author(s):  
Moritz Wurm ◽  
Alfonso Caramazza

The ventral visual stream is conceived as a pathway for object recognition. However, we also recognize the actions an object can be involved in. Here, we show that action recognition relies on a pathway in lateral occipitotemporal cortex, partially overlapping and topographically aligned with object representations that are precursors for action recognition. By contrast, object features that are more relevant for object recognition, such as color and texture, are restricted to medial areas of the ventral stream. We argue that the ventral stream bifurcates into lateral and medial pathways for action and object recognition, respectively. This account explains a number of observed phenomena, such as the duplication of object domains and the specific representational profiles in lateral and medial areas.


2018 ◽  
Author(s):  
Jonas Kubilius ◽  
Martin Schrimpf ◽  
Aran Nayebi ◽  
Daniel Bear ◽  
Daniel L. K. Yamins ◽  
...  

Deep artificial neural networks with spatially repeated processing (a.k.a., deep convolutional ANNs) have been established as the best class of candidate models of visual processing in the primate ventral visual stream. Over the past five years, these ANNs have evolved from a simple feedforward eight-layer architecture in AlexNet to extremely deep and branching NASNet architectures, demonstrating increasingly better object categorization performance and increasingly better explanatory power of both neural and behavioral responses. However, from the neuroscientist’s point of view, the relationship between such very deep architectures and the ventral visual pathway is incomplete in at least two ways. On the one hand, current state-of-the-art ANNs appear to be too complex (e.g., now over 100 levels) compared with the relatively shallow cortical hierarchy (4-8 levels), which makes it difficult to map their elements to those in the ventral visual stream and to understand what they are doing. On the other hand, current state-of-the-art ANNs appear to be not complex enough in that they lack recurrent connections and the resulting neural response dynamics that are commonplace in the ventral visual stream. Here we describe our ongoing efforts to resolve both of these issues by developing a “CORnet” family of deep neural network architectures. Rather than just seeking high object recognition performance (as the state-of-the-art ANNs above), we instead try to reduce the model family to its most important elements and then gradually build new ANNs with recurrent and skip connections while monitoring both performance and the match between each new CORnet model and a large body of primate brain and behavioral data. 
We report here that our current best ANN model derived from this approach (CORnet-S) is among the top models on Brain-Score, a composite benchmark for comparing models to the brain, but is simpler than other deep ANNs in terms of the number of convolutions performed along the longest path of information processing in the model. All CORnet models are available at github.com/dicarlolab/CORnet, and we plan to update this manuscript and the available models in this family as they are produced.
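The core architectural idea, within-area recurrence plus a skip connection, can be sketched in a few lines. The update rule, dense weight matrices (standing in for convolutions), and step count below are illustrative assumptions, not the published CORnet-S block.

```python
import numpy as np

def recurrent_block(x, W, U, steps):
    """Unroll a within-area recurrent update for a fixed number of steps.

    h_t = relu(W @ x + U @ h_{t-1}), with a skip connection adding the
    feedforward drive W @ x back onto the final state.
    """
    drive = W @ x                              # feedforward input to the area
    h = np.zeros_like(drive)
    for _ in range(steps):
        h = np.maximum(0.0, drive + U @ h)     # local recurrence
    return h + drive                           # skip connection

rng = np.random.default_rng(0)
x = rng.standard_normal(16)                    # input from an upstream area
W = rng.standard_normal((8, 16)) * 0.1         # feedforward weights
U = rng.standard_normal((8, 8)) * 0.1          # recurrent weights
out = recurrent_block(x, W, U, steps=5)
```

Unrolling recurrence in time is what lets a network with few anatomical areas behave like a much deeper feedforward network, which is the trade-off the CORnet work explores.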


2021 ◽  
Author(s):  
Aran Nayebi ◽  
Javier Sagastuy-Brena ◽  
Daniel M. Bear ◽  
Kohitij Kar ◽  
Jonas Kubilius ◽  
...  

The ventral visual stream (VVS) is a hierarchically connected series of cortical areas known to underlie core object recognition behaviors, enabling humans and non-human primates to effortlessly recognize objects across a multitude of viewing conditions. While recent feedforward convolutional neural networks (CNNs) provide quantitatively accurate predictions of temporally-averaged neural responses throughout the ventral pathway, they lack two ubiquitous neuroanatomical features: local recurrence within cortical areas and long-range feedback from downstream areas to upstream areas. As a result, such models are unable to account for the temporally-varying dynamical patterns thought to arise from recurrent visual circuits, nor can they provide insight into the behavioral goals that these recurrent circuits might help support. In this work, we augment CNNs with local recurrence and long-range feedback, developing convolutional RNN (ConvRNN) network models that more closely mimic the gross neuroanatomy of the ventral pathway. Moreover, when the form of the recurrent circuit is chosen properly, ConvRNNs with comparatively small numbers of layers can achieve high performance on a core recognition task, comparable to that of much deeper feedforward networks. We then compared these models to temporally fine-grained neural and behavioral recordings from primates viewing thousands of images. We found that ConvRNNs better matched these data than alternative models, including the deepest feedforward networks, on two metrics: 1) neural dynamics in V4 and inferotemporal (IT) cortex at late timepoints after stimulus onset, and 2) the varying times at which object identity can be decoded from IT, including more challenging images that take longer to decode. Moreover, these results differentiate within the class of ConvRNNs, suggesting that there are strong functional constraints on the recurrent connectivity needed to match these phenomena. 
Finally, we find that recurrent circuits that attain high task performance while having a smaller network size as measured by number of units, rather than another metric such as the number of parameters, are overall most consistent with these data. Taken together, our results evince the role of recurrence and feedback in the ventral pathway to reliably perform core object recognition while subject to a strong total network size constraint.
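The distinction between counting units and counting parameters can be made concrete for a single convolutional layer, where weight sharing makes the two metrics diverge sharply. The layer dimensions below are arbitrary examples.

```python
def conv_layer_size(c_in, c_out, k, h, w):
    """Two notions of 'size' for one convolutional layer.

    parameters: learned weights (shared across spatial positions) plus biases
    units:      activations produced for an H x W output feature map
    """
    params = c_out * c_in * k * k + c_out
    units = c_out * h * w
    return params, units

# A 3x3 conv mapping 64 channels to 128 on a 56x56 feature map:
# ~74k parameters but ~400k units.
params, units = conv_layer_size(c_in=64, c_out=128, k=3, h=56, w=56)
```

A recurrent circuit reuses the same units (and parameters) across timesteps, so a unit-count constraint penalizes wide, shallow designs differently than a parameter-count constraint would.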


2014 ◽  
Vol 111 (1) ◽  
pp. 91-102 ◽  
Author(s):  
Leyla Isik ◽  
Ethan M. Meyers ◽  
Joel Z. Leibo ◽  
Tomaso Poggio

The human visual system can rapidly recognize objects despite transformations that alter their appearance. The precise timing of when the brain computes neural representations that are invariant to particular transformations, however, has not been mapped in humans. Here we employ magnetoencephalography decoding analysis to measure the dynamics of size- and position-invariant visual information development in the ventral visual stream. With this method we can read out the identity of objects beginning as early as 60 ms. Size- and position-invariant visual information appear around 125 ms and 150 ms, respectively, and both develop in stages, with invariance to smaller transformations arising before invariance to larger transformations. Additionally, the magnetoencephalography sensor activity localizes to neural sources that are in the most posterior occipital regions at the early decoding times and then move temporally as invariant information develops. These results provide previously unknown latencies for key stages of invariant object recognition in humans, as well as new and compelling evidence for a feed-forward hierarchical model of invariant object recognition where invariance increases at each successive visual area along the ventral stream.
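Time-resolved decoding of this kind can be sketched with a classifier trained and tested independently at each time point; when decoding rises above chance marks when the readout information becomes available. The nearest-centroid classifier and toy sensor data below are illustrative assumptions, not the study's actual pipeline.

```python
import numpy as np

def decode_over_time(train_X, train_y, test_X, test_y):
    """Nearest-centroid decoding accuracy at each time point.

    X arrays: (trials, sensors, timepoints); y: class label per trial.
    """
    classes = np.unique(train_y)
    n_time = train_X.shape[2]
    acc = np.zeros(n_time)
    for t in range(n_time):
        centroids = np.stack([train_X[train_y == c, :, t].mean(axis=0)
                              for c in classes])
        d = np.linalg.norm(test_X[:, :, t][:, None, :] - centroids[None],
                           axis=2)
        pred = classes[np.argmin(d, axis=1)]
        acc[t] = np.mean(pred == test_y)
    return acc

# Toy data: class information appears only in the second half of the epoch,
# so decoding should sit near chance early and rise late.
rng = np.random.default_rng(1)
X = rng.standard_normal((40, 10, 20)) * 0.1
y = np.repeat([0, 1], 20)
X[y == 1, :, 10:] += 1.0               # late "identity" information
acc = decode_over_time(X[::2], y[::2], X[1::2], y[1::2])
```

Running the same analysis with stimuli grouped by size or position, rather than identity, is what yields the invariance latencies the abstract reports.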


2019 ◽  
Author(s):  
Sushrut Thorat

A mediolateral gradation in neural responses for images spanning animals to artificial objects is observed in the ventral temporal cortex (VTC). Which information streams drive this organisation is an ongoing debate. Recently, in Proklova et al. (2016), the visual shape and category (“animacy”) dimensions in a set of stimuli were dissociated using a behavioural measure of visual feature information. fMRI responses revealed a neural cluster (extra-visual animacy cluster - xVAC) which encoded category information unexplained by visual feature information, suggesting extra-visual contributions to the organisation in the ventral visual stream. We reassess these findings using Convolutional Neural Networks (CNNs) as models for the ventral visual stream. The visual features developed in the CNN layers can categorise the shape-matched stimuli from Proklova et al. (2016) in contrast to the behavioural measures used in the study. The category organisations in xVAC and VTC are explained to a large degree by the CNN visual feature differences, casting doubt over the suggestion that visual feature differences cannot account for the animacy organisation. To inform the debate further, we designed a set of stimuli with animal images to dissociate the animacy organisation driven by the CNN visual features from the degree of familiarity and agency (thoughtfulness and feelings). Preliminary results from a new fMRI experiment designed to understand the contribution of these non-visual features are presented.
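Testing whether CNN feature differences account for a neural category organisation is commonly framed as representational similarity analysis: build a dissimilarity matrix (RDM) per representation and compare them. A minimal sketch, with synthetic clustered features standing in for CNN activations and a noisy linear transform standing in for neural responses:

```python
import numpy as np

def rdm(patterns):
    """Representational dissimilarity matrix: 1 - Pearson r between rows."""
    return 1.0 - np.corrcoef(patterns)

def rdm_similarity(rdm_a, rdm_b):
    """Spearman-style rank correlation of the two RDMs' upper triangles."""
    iu = np.triu_indices_from(rdm_a, k=1)
    ra = rdm_a[iu].argsort().argsort().astype(float)
    rb = rdm_b[iu].argsort().argsort().astype(float)
    return np.corrcoef(ra, rb)[0, 1]

rng = np.random.default_rng(2)
categories = np.repeat([0, 1, 2], 4)                # 12 stimuli, 3 clusters
prototypes = rng.standard_normal((3, 50))
features = prototypes[categories] + rng.standard_normal((12, 50)) * 0.5
neural = features @ rng.standard_normal((50, 30))   # stand-in "voxel" responses
neural += rng.standard_normal(neural.shape) * 0.1
score = rdm_similarity(rdm(features), rdm(neural))
```

A high score means the model's feature geometry can account for the neural organisation, which is the form of the argument made against a purely extra-visual explanation of the animacy cluster.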


NeuroImage ◽  
2016 ◽  
Vol 128 ◽  
pp. 316-327 ◽  
Author(s):  
Marianna Boros ◽  
Jean-Luc Anton ◽  
Catherine Pech-Georgel ◽  
Jonathan Grainger ◽  
Marcin Szwed ◽  
...  

2018 ◽  
Author(s):  
Simona Monaco ◽  
Giulia Malfatti ◽  
Alessandro Zendron ◽  
Elisa Pellencin ◽  
Luca Turella

Predictions of upcoming movements are based on several types of neural signals that span the visual, somatosensory, motor and cognitive systems. Thus far, pre-movement signals have been investigated while participants viewed the object to be acted upon. Here, we studied the contribution of information other than vision to the classification of preparatory signals for action, even in the absence of online visual information. We used functional magnetic resonance imaging (fMRI) and multivoxel pattern analysis (MVPA) to test whether the neural signals evoked by visual, memory-based and somato-motor information can be reliably used to predict upcoming actions in areas of the dorsal and ventral visual stream during the preparatory phase preceding the action, while participants were lying still. Nineteen human participants (nine women) performed one of two actions towards an object with their eyes open or closed. Despite the well-known role of ventral stream areas in visual recognition tasks and the specialization of dorsal stream areas in somato-motor processes, we decoded action intention in areas of both streams based on visual, memory-based and somato-motor signals. Interestingly, we could reliably decode action intention in the absence of visual information based on neural activity evoked when visual information was available, and vice-versa. Our results show a similar visual, memory and somato-motor representation of action planning in dorsal and ventral visual stream areas that allows predicting action intention across domains, regardless of the availability of visual information.
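Cross-domain decoding of the kind described, training a classifier on patterns from one condition and testing it on another, succeeds only if the two conditions share a common action code. A minimal sketch; the nearest-centroid classifier and synthetic voxel patterns are illustrative assumptions, not the study's MVPA pipeline.

```python
import numpy as np

def cross_decode(train_X, train_y, test_X, test_y):
    """Train a nearest-centroid classifier in one condition, test in another."""
    classes = np.unique(train_y)
    centroids = np.stack([train_X[train_y == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(test_X[:, None, :] - centroids[None], axis=2)
    pred = classes[np.argmin(d, axis=1)]
    return np.mean(pred == test_y)

# Toy voxel patterns: a shared per-action code plus a condition-specific
# offset (e.g. an eyes-closed baseline shift) and trial noise.
rng = np.random.default_rng(3)
action_code = rng.standard_normal((2, 100))          # one pattern per action

def condition(offset):
    y = np.repeat([0, 1], 15)                        # 15 trials per action
    X = action_code[y] + offset + rng.standard_normal((30, 100)) * 0.3
    return X, y

open_X, open_y = condition(0.0)                      # eyes-open runs
closed_X, closed_y = condition(0.5)                  # eyes-closed runs
acc = cross_decode(open_X, open_y, closed_X, closed_y)
```

Above-chance accuracy in both train/test directions is the signature of a representation shared across the visual and memory-driven conditions.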


2014 ◽  
Vol 14 (10) ◽  
pp. 985-985
Author(s):  
R. Lafer-Sousa ◽  
A. Kell ◽  
A. Takahashi ◽  
J. Feather ◽  
B. Conway ◽  
...  
