Organizing visual object knowledge by real-world size in ventral visual cortex

T. Konkle; A. Oliva

doi:10.1167/11.11.883

Visual search for object categories is predicted by the representational architecture of high-level visual cortex

Journal of Neurophysiology ◽

10.1152/jn.00569.2016 ◽

2017 ◽

Vol 117 (1) ◽

pp. 388-402 ◽

Cited By ~ 14

Author(s):

Michael A. Cohen ◽

George A. Alvarez ◽

Ken Nakayama ◽

Talia Konkle

Keyword(s):

Visual Cortex ◽

Visual Search ◽

Visual System ◽

Visual Processing ◽

Search Task ◽

Visual Search Task ◽

Visual Object ◽

Neural Responses ◽

Object Categories ◽

High Level

Visual search is a ubiquitous visual behavior, and efficient search is essential for survival. Different cognitive models have explained the speed and accuracy of search based either on the dynamics of attention or on similarity of item representations. Here, we examined the extent to which performance on a visual search task can be predicted from the stable representational architecture of the visual system, independent of attentional dynamics. Participants performed a visual search task with 28 conditions reflecting different pairs of categories (e.g., searching for a face among cars, body among hammers, etc.). The time it took participants to find the target item varied as a function of category combination. In a separate group of participants, we measured the neural responses to these object categories when items were presented in isolation. Using representational similarity analysis, we then examined whether the similarity of neural responses across different subdivisions of the visual system had the requisite structure needed to predict visual search performance. Overall, we found strong brain/behavior correlations across most of the higher-level visual system, including both the ventral and dorsal pathways when considering both macroscale sectors as well as smaller mesoscale regions. These results suggest that visual search for real-world object categories is well predicted by the stable, task-independent architecture of the visual system. NEW & NOTEWORTHY Here, we ask which neural regions have neural response patterns that correlate with behavioral performance in a visual processing task. We found that the representational structure across all of high-level visual cortex has the requisite structure to predict behavior. Furthermore, when directly comparing different neural regions, we found that they all had highly similar category-level representational structures. These results point to a ubiquitous and uniform representational structure in high-level visual cortex underlying visual object processing.

Download Full-text

Creatures great and small: Real-world size of animals predicts visual cortex representations beyond taxonomic category

NeuroImage ◽

10.1016/j.neuroimage.2018.08.066 ◽

2018 ◽

Vol 183 ◽

pp. 627-634 ◽

Cited By ~ 7

Author(s):

Marc N. Coutanche ◽

Griffin E. Koch

Keyword(s):

Visual Cortex ◽

Real World ◽

Taxonomic Category ◽

Size Of Animals

Download Full-text

Working Memory-Driven Attention in Real-World Search

Perception ◽

10.1177/0301006618791688 ◽

2018 ◽

Vol 47 (9) ◽

pp. 966-975 ◽

Cited By ~ 3

Author(s):

Shinyoung Jung ◽

Yosun Yoon ◽

Suk Won Han

Keyword(s):

Working Memory ◽

Attentional Capture ◽

Real World ◽

Target Object ◽

Visual Object ◽

Indoor Scenes ◽

Bias Attention ◽

Memory Contents ◽

Online Catalogue ◽

World Environment

People’s attention is well attracted to stimuli matching their working memory. This memory-driven attentional capture has been demonstrated in simplified and controlled laboratory settings. The present study investigated whether working memory contents capture attention in a setting that closely resembles real-world environment. In the experiment, participants performed a task of searching for a target object in real-world indoor scenes, while maintaining a visual object in working memory. To create a setting similar to real-world environment, images taken from IKEA®’s online catalogue were used. The results showed that participants’ attention was biased toward a working memory-matching object, interfering with the target search. This was so even when participants did not expect that a memory-matching stimulus would appear in the search array. These results suggest that working memory can bias attention in complex, natural environment and this memory-driven attentional capture in real-world setting takes place in an automatic manner.

Download Full-text

Disentangling Representations of Object Shape and Object Category in Human Visual Cortex: The Animate–Inanimate Distinction

Journal of Cognitive Neuroscience ◽

10.1162/jocn_a_00924 ◽

2016 ◽

Vol 28 (5) ◽

pp. 680-692 ◽

Cited By ~ 71

Author(s):

Daria Proklova ◽

Daniel Kaiser ◽

Marius V. Peelen

Keyword(s):

Visual Cortex ◽

Activity Patterns ◽

Temporal Cortex ◽

Brain Regions ◽

Visual Object ◽

Object Category ◽

Human Visual Cortex ◽

Object Properties ◽

Occipitotemporal Cortex ◽

Visual Properties

Objects belonging to different categories evoke reliably different fMRI activity patterns in human occipitotemporal cortex, with the most prominent distinction being that between animate and inanimate objects. An unresolved question is whether these categorical distinctions reflect category-associated visual properties of objects or whether they genuinely reflect object category. Here, we addressed this question by measuring fMRI responses to animate and inanimate objects that were closely matched for shape and low-level visual features. Univariate contrasts revealed animate- and inanimate-preferring regions in ventral and lateral temporal cortex even for individually matched object pairs (e.g., snake–rope). Using representational similarity analysis, we mapped out brain regions in which the pairwise dissimilarity of multivoxel activity patterns (neural dissimilarity) was predicted by the objects' pairwise visual dissimilarity and/or their categorical dissimilarity. Visual dissimilarity was measured as the time it took participants to find a unique target among identical distractors in three visual search experiments, where we separately quantified overall dissimilarity, outline dissimilarity, and texture dissimilarity. All three visual dissimilarity structures predicted neural dissimilarity in regions of visual cortex. Interestingly, these analyses revealed several clusters in which categorical dissimilarity predicted neural dissimilarity after regressing out visual dissimilarity. Together, these results suggest that the animate–inanimate organization of human visual cortex is not fully explained by differences in the characteristic shape or texture properties of animals and inanimate objects. Instead, representations of visual object properties and object category may coexist in more anterior parts of the visual system.

Download Full-text

VisGraB: A Benchmark for Vision-Based Grasping

Paladyn Journal of Behavioral Robotics ◽

10.2478/s13230-012-0020-5 ◽

2012 ◽

Vol 3 (2) ◽

Cited By ~ 14

Author(s):

Gert Kootstra ◽

Mila Popović ◽

Jimmy Alison Jørgensen ◽

Danica Kragic ◽

Henrik Gordon Petersen ◽

...

Keyword(s):

Real World ◽

Software Tool ◽

Quality Measures ◽

Visual Input ◽

User Needs ◽

Experimental Setup ◽

Object Knowledge ◽

The Real ◽

Fair Comparison ◽

Grasp Quality

AbstractWe present a database and a software tool, VisGraB, for benchmarking of methods for vision-based grasping of unknown objects with no prior object knowledge. The benchmark is a combined real-world and simulated experimental setup. Stereo images of real scenes containing several objects in different configurations are included in the database. The user needs to provide a method for grasp generation based on the real visual input. The grasps are then planned, executed, and evaluated by the provided grasp simulator where several grasp-quality measures are used for evaluation. This setup has the advantage that a large number of grasps can be executed and evaluated while dealing with dynamics and the noise and uncertainty present in the real world images. VisGraB enables a fair comparison among different grasping methods. The user furthermore does not need to deal with robot hardware, focusing on the vision methods instead. As a baseline, benchmark results of our grasp strategy are included.

Download Full-text

Orthogonal Representations of Object Shape and Category in Deep Convolutional Neural Networks and Human Visual Cortex

10.1101/555193 ◽

2019 ◽

Author(s):

Astrid A. Zeman ◽

J. Brendan Ritchie ◽

Stefania Bracci ◽

Hans Op de Beeck

Keyword(s):

Neural Networks ◽

Visual Cortex ◽

Convolutional Neural Networks ◽

Network Performance ◽

Temporal Cortex ◽

Visual Object ◽

Visual Object Recognition ◽

Deep Convolutional Neural Networks ◽

Shape Information ◽

Category Information

AbstractDeep Convolutional Neural Networks (CNNs) are gaining traction as the benchmark model of visual object recognition, with performance now surpassing humans. While CNNs can accurately assign one image to potentially thousands of categories, network performance could be the result of layers that are tuned to represent the visual shape of objects, rather than object category, since both are often confounded in natural images. Using two stimulus sets that explicitly dissociate shape from category, we correlate these two types of information with each layer of multiple CNNs. We also compare CNN output with fMRI activation along the human visual ventral stream by correlating artificial with biological representations. We find that CNNs encode category information independently from shape, peaking at the final fully connected layer in all tested CNN architectures. Comparing CNNs with fMRI brain data, early visual cortex (V1) and early layers of CNNs encode shape information. Anterior ventral temporal cortex encodes category information, which correlates best with the final layer of CNNs. The interaction between shape and category that is found along the human visual ventral pathway is echoed in multiple deep networks. Our results suggest CNNs represent category information independently from shape, much like the human visual system.

Download Full-text

Mechanisms of Feature Selectivity and Invariance in Primary Visual Cortex

10.1101/2020.02.08.940270 ◽

2020 ◽

Author(s):

Ali Almasi ◽

Hamish Meffin ◽

Shaun L. Cloherty ◽

Yan Wong ◽

Molis Yunzab ◽

...

Keyword(s):

Visual Cortex ◽

Primary Visual Cortex ◽

Visual Processing ◽

Object Identification ◽

Visual Object ◽

Object Processing ◽

Complex Cells ◽

Feature Selectivity ◽

Visual Object Identification ◽

Neuronal Computation

AbstractVisual object identification requires both selectivity for specific visual features that are important to the object’s identity and invariance to feature manipulations. For example, a hand can be shifted in position, rotated, or contracted but still be recognised as a hand. How are the competing requirements of selectivity and invariance built into the early stages of visual processing? Typically, cells in the primary visual cortex are classified as either simple or complex. They both show selectivity for edge-orientation but complex cells develop invariance to edge position within the receptive field (spatial phase). Using a data-driven model that extracts the spatial structures and nonlinearities associated with neuronal computation, we show that the balance between selectivity and invariance in complex cells is more diverse than thought. Phase invariance is frequently partial, thus retaining sensitivity to brightness polarity, while invariance to orientation and spatial frequency are more extensive than expected. The invariance arises due to two independent factors: (1) the structure and number of filters and (2) the form of nonlinearities that act upon the filter outputs. Both vary more than previously considered, so primary visual cortex forms an elaborate set of generic feature sensitivities, providing the foundation for more sophisticated object processing.

Download Full-text

A performance-optimized model of neural responses across the ventral visual stream

10.1101/036475 ◽

2016 ◽

Cited By ~ 8

Author(s):

Darren Seibert ◽

Daniel L Yamins ◽

Diego Ardila ◽

Ha Hong ◽

James J DiCarlo ◽

...

Keyword(s):

Neural Network ◽

Visual Cortex ◽

Object Recognition ◽

Recognition Performance ◽

Ventral Stream ◽

Visual Object ◽

Emergent Properties ◽

Visual Stream ◽

Response Properties ◽

High Level

Human visual object recognition is subserved by a multitude of cortical areas. To make sense of this system, one line of research focused on response properties of primary visual cortex neurons and developed theoretical models of a set of canonical computations such as convolution, thresholding, exponentiating and normalization that could be hierarchically repeated to give rise to more complex representations. Another line or research focused on response properties of high-level visual cortex and linked these to semantic categories useful for object recognition. Here, we hypothesized that the panoply of visual representations in the human ventral stream may be understood as emergent properties of a system constrained both by simple canonical computations and by top-level, object recognition functionality in a single unified framework (Yamins et al., 2014; Khaligh-Razavi and Kriegeskorte, 2014; Guclu and van Gerven, 2015). We built a deep convolutional neural network model optimized for object recognition and compared representations at various model levels using representational similarity analysis to human functional imaging responses elicited from viewing hundreds of image stimuli. Neural network layers developed representations that corresponded in a hierarchical consistent fashion to visual areas from V1 to LOC. This correspondence increased with optimization of the model's recognition performance. These findings support a unified view of the ventral stream in which representations from the earliest to the latest stages can be understood as being built from basic computations inspired by modeling of early visual cortex shaped by optimization for high-level object-based performance constraints.

Download Full-text

Mechanisms of Feature Selectivity and Invariance in Primary Visual Cortex

Cerebral Cortex ◽

10.1093/cercor/bhaa102 ◽

2020 ◽

Vol 30 (9) ◽

pp. 5067-5087

Author(s):

Ali Almasi ◽

Hamish Meffin ◽

Shaun L Cloherty ◽

Yan Wong ◽

Molis Yunzab ◽

...

Keyword(s):

Visual Cortex ◽

Primary Visual Cortex ◽

Visual Processing ◽

Object Identification ◽

Visual Object ◽

Object Processing ◽

Complex Cells ◽

Feature Selectivity ◽

Visual Object Identification ◽

Neuronal Computation

Abstract Visual object identification requires both selectivity for specific visual features that are important to the object’s identity and invariance to feature manipulations. For example, a hand can be shifted in position, rotated, or contracted but still be recognized as a hand. How are the competing requirements of selectivity and invariance built into the early stages of visual processing? Typically, cells in the primary visual cortex are classified as either simple or complex. They both show selectivity for edge-orientation but complex cells develop invariance to edge position within the receptive field (spatial phase). Using a data-driven model that extracts the spatial structures and nonlinearities associated with neuronal computation, we quantitatively describe the balance between selectivity and invariance in complex cells. Phase invariance is frequently partial, while invariance to orientation and spatial frequency are more extensive than expected. The invariance arises due to two independent factors: (1) the structure and number of filters and (2) the form of nonlinearities that act upon the filter outputs. Both vary more than previously considered, so primary visual cortex forms an elaborate set of generic feature sensitivities, providing the foundation for more sophisticated object processing.

Download Full-text

Minimal Recognizable Configurations Elicit Category-selective Responses in Higher Order Visual Cortex

Journal of Cognitive Neuroscience ◽

10.1162/jocn_a_01420 ◽

2019 ◽

Vol 31 (9) ◽

pp. 1354-1367

Author(s):

Yael Holzinger ◽

Shimon Ullman ◽

Daniel Harari ◽

Marlene Behrmann ◽

Galia Avidan

Keyword(s):

Visual Cortex ◽

Object Recognition ◽

Visual Recognition ◽

Recognition Performance ◽

Occipital Cortex ◽

Building Blocks ◽

Visual Object ◽

Visual Object Recognition ◽

Face Area ◽

Early Visual Cortex

Visual object recognition is performed effortlessly by humans notwithstanding the fact that it requires a series of complex computations, which are, as yet, not well understood. Here, we tested a novel account of the representations used for visual recognition and their neural correlates using fMRI. The rationale is based on previous research showing that a set of representations, termed “minimal recognizable configurations” (MIRCs), which are computationally derived and have unique psychophysical characteristics, serve as the building blocks of object recognition. We contrasted the BOLD responses elicited by MIRC images, derived from different categories (faces, objects, and places), sub-MIRCs, which are visually similar to MIRCs, but, instead, result in poor recognition and scrambled, unrecognizable images. Stimuli were presented in blocks, and participants indicated yes/no recognition for each image. We confirmed that MIRCs elicited higher recognition performance compared to sub-MIRCs for all three categories. Whereas fMRI activation in early visual cortex for both MIRCs and sub-MIRCs of each category did not differ from that elicited by scrambled images, high-level visual regions exhibited overall greater activation for MIRCs compared to sub-MIRCs or scrambled images. Moreover, MIRCs and sub-MIRCs from each category elicited enhanced activation in corresponding category-selective regions including fusiform face area and occipital face area (faces), lateral occipital cortex (objects), and parahippocampal place area and transverse occipital sulcus (places). These findings reveal the psychological and neural relevance of MIRCs and enable us to make progress in developing a more complete account of object recognition.

Download Full-text