Object Recognition at Higher Regions of the Ventral Visual Stream via Dynamic Inference

2020 ◽  
Vol 14 ◽  
Author(s):  
Siamak K. Sorooshyari ◽  
Huanjie Sheng ◽  
H. Vincent Poor


2000 ◽  
Vol 12 (4) ◽  
pp. 615-621 ◽  
Author(s):  
Glen M. Doniger ◽  
John J. Foxe ◽  
Micah M. Murray ◽  
Beth A. Higgins ◽  
Joan Gay Snodgrass ◽  
...  

Object recognition is achieved even in circumstances when only partial information is available to the observer. Perceptual closure processes are essential in enabling such recognition to occur. We presented successively less fragmented images while recording high-density event-related potentials (ERPs), which permitted us to monitor brain activity during the perceptual closure processes leading up to object recognition. We reveal a bilateral ERP component (Ncl) that tracks these processes (onset ∼230 msec, maximal at ∼290 msec). Scalp-current density mapping of the Ncl revealed bilateral occipito-temporal scalp foci, which are consistent with generators in the human ventral visual stream, and specifically the lateral-occipital (LO) complex as defined by hemodynamic studies of object recognition.
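As an illustration, component latencies like the Ncl onset and peak reported above can be estimated from a difference waveform as a threshold crossing and an amplitude maximum. The toy waveform and threshold below are assumptions for demonstration, not the authors' analysis pipeline.

```python
import numpy as np

def component_latencies(waveform, times, threshold):
    """Return (onset, peak) latencies of an ERP component.

    onset: first time point at which the waveform exceeds `threshold`
    peak:  time point of the waveform's maximum amplitude
    """
    above = np.flatnonzero(waveform > threshold)
    onset = times[above[0]] if above.size else None
    peak = times[np.argmax(waveform)]
    return onset, peak

# Toy difference waveform: a Gaussian "component" peaking near 290 msec,
# sampled every 2 msec over a 0-500 msec epoch.
times = np.arange(0, 500, 2)
wave = np.exp(-0.5 * ((times - 290) / 40.0) ** 2)
onset, peak = component_latencies(wave, times, threshold=0.3)
```

Real ERP analyses typically define onset against a noise estimate from a baseline window rather than a fixed threshold; the structure of the computation is the same.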


2021 ◽  
Author(s):  
Moritz Wurm ◽  
Alfonso Caramazza

The ventral visual stream is conceived as a pathway for object recognition. However, we also recognize the actions an object can be involved in. Here, we show that action recognition relies on a pathway in lateral occipitotemporal cortex, partially overlapping and topographically aligned with object representations that are precursors for action recognition. By contrast, object features that are more relevant for object recognition, such as color and texture, are restricted to medial areas of the ventral stream. We argue that the ventral stream bifurcates into lateral and medial pathways for action and object recognition, respectively. This account explains a number of observed phenomena, such as the duplication of object domains and the specific representational profiles in lateral and medial areas.


2018 ◽  
Author(s):  
Jonas Kubilius ◽  
Martin Schrimpf ◽  
Aran Nayebi ◽  
Daniel Bear ◽  
Daniel L. K. Yamins ◽  
...  

Deep artificial neural networks with spatially repeated processing (a.k.a., deep convolutional ANNs) have been established as the best class of candidate models of visual processing in the primate ventral visual stream. Over the past five years, these ANNs have evolved from a simple feedforward eight-layer architecture in AlexNet to extremely deep and branching NASNet architectures, demonstrating increasingly better object categorization performance and increasingly better explanatory power of both neural and behavioral responses. However, from the neuroscientist’s point of view, the relationship between such very deep architectures and the ventral visual pathway is incomplete in at least two ways. On the one hand, current state-of-the-art ANNs appear to be too complex (e.g., now over 100 levels) compared with the relatively shallow cortical hierarchy (4-8 levels), which makes it difficult to map their elements to those in the ventral visual stream and to understand what they are doing. On the other hand, current state-of-the-art ANNs appear to be not complex enough in that they lack recurrent connections and the resulting neural response dynamics that are commonplace in the ventral visual stream. Here we describe our ongoing efforts to resolve both of these issues by developing a “CORnet” family of deep neural network architectures. Rather than just seeking high object recognition performance (as the state-of-the-art ANNs above), we instead try to reduce the model family to its most important elements and then gradually build new ANNs with recurrent and skip connections while monitoring both performance and the match between each new CORnet model and a large body of primate brain and behavioral data. 
We report here that our current best ANN model derived from this approach (CORnet-S) is among the top models on Brain-Score, a composite benchmark for comparing models to the brain, but is simpler than other deep ANNs in terms of the number of convolutions performed along the longest path of information processing in the model. All CORnet models are available at github.com/dicarlolab/CORnet, and we plan to update this manuscript and the available models in this family as they are produced.
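The core architectural idea, within-area recurrence plus a skip connection, can be sketched in a few lines. The update rule, dense weight matrices (standing in for convolutions), and step count below are illustrative assumptions, not the published CORnet-S block.

```python
import numpy as np

def recurrent_block(x, W, U, steps):
    """Unroll a within-area recurrent update for a fixed number of steps.

    h_t = relu(W @ x + U @ h_{t-1}), with a skip connection adding the
    feedforward drive W @ x back onto the final state.
    """
    drive = W @ x                              # feedforward input to the area
    h = np.zeros_like(drive)
    for _ in range(steps):
        h = np.maximum(0.0, drive + U @ h)     # local recurrence
    return h + drive                           # skip connection

rng = np.random.default_rng(0)
x = rng.standard_normal(16)                    # input from an upstream area
W = rng.standard_normal((8, 16)) * 0.1         # feedforward weights
U = rng.standard_normal((8, 8)) * 0.1          # recurrent weights
out = recurrent_block(x, W, U, steps=5)
```

Unrolling recurrence in time is what lets a network with few anatomical areas behave like a much deeper feedforward network, which is the trade-off the CORnet work explores.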


2021 ◽  
Author(s):  
Aran Nayebi ◽  
Javier Sagastuy-Brena ◽  
Daniel M. Bear ◽  
Kohitij Kar ◽  
Jonas Kubilius ◽  
...  

The ventral visual stream (VVS) is a hierarchically connected series of cortical areas known to underlie core object recognition behaviors, enabling humans and non-human primates to effortlessly recognize objects across a multitude of viewing conditions. While recent feedforward convolutional neural networks (CNNs) provide quantitatively accurate predictions of temporally-averaged neural responses throughout the ventral pathway, they lack two ubiquitous neuroanatomical features: local recurrence within cortical areas and long-range feedback from downstream areas to upstream areas. As a result, such models are unable to account for the temporally-varying dynamical patterns thought to arise from recurrent visual circuits, nor can they provide insight into the behavioral goals that these recurrent circuits might help support. In this work, we augment CNNs with local recurrence and long-range feedback, developing convolutional RNN (ConvRNN) network models that more closely mimic the gross neuroanatomy of the ventral pathway. Moreover, when the form of the recurrent circuit is chosen properly, ConvRNNs with comparatively small numbers of layers can achieve high performance on a core recognition task, comparable to that of much deeper feedforward networks. We then compared these models to temporally fine-grained neural and behavioral recordings from primates viewing thousands of images. We found that ConvRNNs better matched these data than alternative models, including the deepest feedforward networks, on two metrics: 1) neural dynamics in V4 and inferotemporal (IT) cortex at late timepoints after stimulus onset, and 2) the varying times at which object identity can be decoded from IT, including more challenging images that take longer to decode. Moreover, these results differentiate within the class of ConvRNNs, suggesting that there are strong functional constraints on the recurrent connectivity needed to match these phenomena. 
Finally, we find that recurrent circuits that attain high task performance while having a smaller network size as measured by number of units, rather than another metric such as the number of parameters, are overall most consistent with these data. Taken together, our results evince the role of recurrence and feedback in the ventral pathway to reliably perform core object recognition while subject to a strong total network size constraint.
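The distinction between counting units and counting parameters can be made concrete for a single convolutional layer, where weight sharing makes the two metrics diverge sharply. The layer dimensions below are arbitrary examples.

```python
def conv_layer_size(c_in, c_out, k, h, w):
    """Two notions of 'size' for one convolutional layer.

    parameters: learned weights (shared across spatial positions) plus biases
    units:      activations produced for an H x W output feature map
    """
    params = c_out * c_in * k * k + c_out
    units = c_out * h * w
    return params, units

# A 3x3 conv mapping 64 channels to 128 on a 56x56 feature map:
# ~74k parameters but ~400k units.
params, units = conv_layer_size(c_in=64, c_out=128, k=3, h=56, w=56)
```

A recurrent circuit reuses the same units (and parameters) across timesteps, so a unit-count constraint penalizes wide, shallow designs differently than a parameter-count constraint would.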


2014 ◽  
Vol 111 (1) ◽  
pp. 91-102 ◽  
Author(s):  
Leyla Isik ◽  
Ethan M. Meyers ◽  
Joel Z. Leibo ◽  
Tomaso Poggio

The human visual system can rapidly recognize objects despite transformations that alter their appearance. The precise timing of when the brain computes neural representations that are invariant to particular transformations, however, has not been mapped in humans. Here we employ magnetoencephalography decoding analysis to measure the dynamics of size- and position-invariant visual information development in the ventral visual stream. With this method we can read out the identity of objects beginning as early as 60 ms. Size- and position-invariant visual information appear around 125 ms and 150 ms, respectively, and both develop in stages, with invariance to smaller transformations arising before invariance to larger transformations. Additionally, the magnetoencephalography sensor activity localizes to neural sources that are in the most posterior occipital regions at the early decoding times and then move temporally as invariant information develops. These results provide previously unknown latencies for key stages of invariant object recognition in humans, as well as new and compelling evidence for a feed-forward hierarchical model of invariant object recognition where invariance increases at each successive visual area along the ventral stream.
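Time-resolved decoding of this kind can be sketched with a classifier trained and tested independently at each time point; when decoding rises above chance marks when the readout information becomes available. The nearest-centroid classifier and toy sensor data below are illustrative assumptions, not the study's actual pipeline.

```python
import numpy as np

def decode_over_time(train_X, train_y, test_X, test_y):
    """Nearest-centroid decoding accuracy at each time point.

    X arrays: (trials, sensors, timepoints); y: class label per trial.
    """
    classes = np.unique(train_y)
    n_time = train_X.shape[2]
    acc = np.zeros(n_time)
    for t in range(n_time):
        centroids = np.stack([train_X[train_y == c, :, t].mean(axis=0)
                              for c in classes])
        d = np.linalg.norm(test_X[:, :, t][:, None, :] - centroids[None],
                           axis=2)
        pred = classes[np.argmin(d, axis=1)]
        acc[t] = np.mean(pred == test_y)
    return acc

# Toy data: class information appears only in the second half of the epoch,
# so decoding should sit near chance early and rise late.
rng = np.random.default_rng(1)
X = rng.standard_normal((40, 10, 20)) * 0.1
y = np.repeat([0, 1], 20)
X[y == 1, :, 10:] += 1.0               # late "identity" information
acc = decode_over_time(X[::2], y[::2], X[1::2], y[1::2])
```

Running the same analysis with stimuli grouped by size or position, rather than identity, is what yields the invariance latencies the abstract reports.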


2019 ◽  
Author(s):  
Sushrut Thorat

A mediolateral gradation in neural responses for images spanning animals to artificial objects is observed in the ventral temporal cortex (VTC). Which information streams drive this organisation is an ongoing debate. Recently, in Proklova et al. (2016), the visual shape and category (“animacy”) dimensions in a set of stimuli were dissociated using a behavioural measure of visual feature information. fMRI responses revealed a neural cluster (extra-visual animacy cluster - xVAC) which encoded category information unexplained by visual feature information, suggesting extra-visual contributions to the organisation in the ventral visual stream. We reassess these findings using Convolutional Neural Networks (CNNs) as models for the ventral visual stream. The visual features developed in the CNN layers can categorise the shape-matched stimuli from Proklova et al. (2016) in contrast to the behavioural measures used in the study. The category organisations in xVAC and VTC are explained to a large degree by the CNN visual feature differences, casting doubt over the suggestion that visual feature differences cannot account for the animacy organisation. To inform the debate further, we designed a set of stimuli with animal images to dissociate the animacy organisation driven by the CNN visual features from the degree of familiarity and agency (thoughtfulness and feelings). Preliminary results from a new fMRI experiment designed to understand the contribution of these non-visual features are presented.
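Testing whether CNN feature differences account for a neural category organisation is commonly framed as representational similarity analysis: build a dissimilarity matrix (RDM) per representation and compare them. A minimal sketch, with synthetic clustered features standing in for CNN activations and a noisy linear transform standing in for neural responses:

```python
import numpy as np

def rdm(patterns):
    """Representational dissimilarity matrix: 1 - Pearson r between rows."""
    return 1.0 - np.corrcoef(patterns)

def rdm_similarity(rdm_a, rdm_b):
    """Spearman-style rank correlation of the two RDMs' upper triangles."""
    iu = np.triu_indices_from(rdm_a, k=1)
    ra = rdm_a[iu].argsort().argsort().astype(float)
    rb = rdm_b[iu].argsort().argsort().astype(float)
    return np.corrcoef(ra, rb)[0, 1]

rng = np.random.default_rng(2)
categories = np.repeat([0, 1, 2], 4)                # 12 stimuli, 3 clusters
prototypes = rng.standard_normal((3, 50))
features = prototypes[categories] + rng.standard_normal((12, 50)) * 0.5
neural = features @ rng.standard_normal((50, 30))   # stand-in "voxel" responses
neural += rng.standard_normal(neural.shape) * 0.1
score = rdm_similarity(rdm(features), rdm(neural))
```

A high score means the model's feature geometry can account for the neural organisation, which is the form of the argument made against a purely extra-visual explanation of the animacy cluster.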


NeuroImage ◽  
2016 ◽  
Vol 128 ◽  
pp. 316-327 ◽  
Author(s):  
Marianna Boros ◽  
Jean-Luc Anton ◽  
Catherine Pech-Georgel ◽  
Jonathan Grainger ◽  
Marcin Szwed ◽  
...  

2018 ◽  
Author(s):  
Simona Monaco ◽  
Giulia Malfatti ◽  
Alessandro Zendron ◽  
Elisa Pellencin ◽  
Luca Turella

Predictions of upcoming movements are based on several types of neural signals that span the visual, somatosensory, motor and cognitive systems. Thus far, pre-movement signals have been investigated while participants viewed the object to be acted upon. Here, we studied the contribution of information other than vision to the classification of preparatory signals for action, even in the absence of online visual information. We used functional magnetic resonance imaging (fMRI) and multivoxel pattern analysis (MVPA) to test whether the neural signals evoked by visual, memory-based and somato-motor information can be reliably used to predict upcoming actions in areas of the dorsal and ventral visual stream during the preparatory phase preceding the action, while participants were lying still. Nineteen human participants (nine women) performed one of two actions towards an object with their eyes open or closed. Despite the well-known role of ventral stream areas in visual recognition tasks and the specialization of dorsal stream areas in somato-motor processes, we decoded action intention in areas of both streams based on visual, memory-based and somato-motor signals. Interestingly, we could reliably decode action intention in the absence of visual information based on neural activity evoked when visual information was available, and vice-versa. Our results show a similar visual, memory and somato-motor representation of action planning in dorsal and ventral visual stream areas that allows predicting action intention across domains, regardless of the availability of visual information.
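Cross-domain decoding of the kind described, training a classifier on patterns from one condition and testing it on another, succeeds only if the two conditions share a common action code. A minimal sketch; the nearest-centroid classifier and synthetic voxel patterns are illustrative assumptions, not the study's MVPA pipeline.

```python
import numpy as np

def cross_decode(train_X, train_y, test_X, test_y):
    """Train a nearest-centroid classifier in one condition, test in another."""
    classes = np.unique(train_y)
    centroids = np.stack([train_X[train_y == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(test_X[:, None, :] - centroids[None], axis=2)
    pred = classes[np.argmin(d, axis=1)]
    return np.mean(pred == test_y)

# Toy voxel patterns: a shared per-action code plus a condition-specific
# offset (e.g. an eyes-closed baseline shift) and trial noise.
rng = np.random.default_rng(3)
action_code = rng.standard_normal((2, 100))          # one pattern per action

def condition(offset):
    y = np.repeat([0, 1], 15)                        # 15 trials per action
    X = action_code[y] + offset + rng.standard_normal((30, 100)) * 0.3
    return X, y

open_X, open_y = condition(0.0)                      # eyes-open runs
closed_X, closed_y = condition(0.5)                  # eyes-closed runs
acc = cross_decode(open_X, open_y, closed_X, closed_y)
```

Above-chance accuracy in both train/test directions is the signature of a representation shared across the visual and memory-driven conditions.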


2014 ◽  
Vol 14 (10) ◽  
pp. 985-985
Author(s):  
R. Lafer-Sousa ◽  
A. Kell ◽  
A. Takahashi ◽  
J. Feather ◽  
B. Conway ◽  
...  
