A Hierarchical Dynamic Link Network to Solve the Visual Correspondence Problem

R P Würtz; C von der Malsburg

doi:10.1068/v96l0702

A Hierarchical Dynamic Link Network to Solve the Visual Correspondence Problem

Perception ◽

10.1068/v96l0702 ◽

1996 ◽

Vol 25 (1_suppl) ◽

pp. 183-183

Author(s):

R P Würtz ◽

C von der Malsburg

Keyword(s):

Object Recognition ◽

Temporal Dynamics ◽

Image Point ◽

Visual Object ◽

Correspondence Problem ◽

Visual Object Recognition ◽

Feature Similarity ◽

Ordered State ◽

Dynamic Links ◽

Dynamic Link

Conventional neural networks try to solve the problem of object recognition in a single step by building a stimulus-response system that codes its result as cell activities. We take a different approach, assuming that recognition is an active process with temporal dynamics that results in an ordered state. We present a structure of neuronal layers, interconnected by dynamic links (von der Malsburg, 1985 Berichte der Bunsengesellschaft für Physikalische Chemie 89 703-710), that solves the correspondence problem between two images and thus constitutes an important building block for a model of recognition. Images as well as stored models are represented as Gabor pyramids. This allows the dynamics to proceed from coarse to fine scale and reduces the sequential processing time inherent in the concept. Invariance under background changes is also made possible. On the lowest frequency level, a single blob of activity moves across the image layer and the model layer. Dynamic links between these layers are initialised to the (highly ambiguous) feature similarities. Links grow or decline according to a combination of feature similarity and correlated activation. This enforces correct neighbourhood relationships in addition to feature similarity. On the higher levels the established correspondences are refined by several blobs in parallel. We present an improved version of the dynamical system proposed by Würtz [1995 Multilayer Dynamic Link Networks for Establishing Image Point Correspondences and Visual Object Recognition (Thun, Frankfurt a.M.: Harri Deutsch)] and show, with examples of human faces, that it evolves from an unordered link distribution to an ordered state where only corresponding point pairs are connected by strong links. Correspondences between sample points are population-coded by a set of neighbouring links.
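
The link dynamics described in this abstract can be illustrated with a small toy model. The sketch below is not the authors' dynamical system: it assumes one-dimensional layers, a single resolution level (no Gabor pyramid), random vectors standing in for Gabor features, and an image-layer blob that is simply placed where the link-propagated input is strongest. The function names and all parameters (`run_dynamic_links`, `rate`, `sigma`, `sweeps`) are hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_blob(n, centre, sigma):
    """Localised activity profile over a 1-D layer of n nodes."""
    x = np.arange(n)
    return np.exp(-0.5 * ((x - centre) / sigma) ** 2)


def normalise_rows(w):
    """Competition: the links leaving each model node sum to one."""
    return w / w.sum(axis=1, keepdims=True)


def run_dynamic_links(model_feats, image_feats, sweeps=10, rate=0.5, sigma=1.5):
    """Toy link dynamics; returns (initial_links, final_links)."""
    n_model, n_image = len(model_feats), len(image_feats)
    m = model_feats / np.linalg.norm(model_feats, axis=1, keepdims=True)
    i = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    # Cosine similarity, clipped so that every link stays strictly positive.
    sim = np.clip(m @ i.T, 1e-6, None)
    links = normalise_rows(sim)  # links start at the feature similarities
    initial = links.copy()
    for _ in range(sweeps):
        for centre in range(n_model):  # the blob moves across the model layer
            a = gaussian_blob(n_model, centre, sigma)
            drive = links.T @ a  # input each image node receives via the links
            # Assumption: the image blob sits where a blob-shaped window
            # collects the most input, instead of following its own dynamics.
            scores = [gaussian_blob(n_image, j, sigma) @ drive
                      for j in range(n_image)]
            b = gaussian_blob(n_image, int(np.argmax(scores)), sigma)
            # Growth needs feature similarity AND correlated activation;
            # row normalisation makes the remaining links decline.
            links = normalise_rows(links * (1.0 + rate * sim * np.outer(a, b)))
    return initial, links


rng = np.random.default_rng(0)
model_features = rng.normal(size=(20, 16))
# The "image" is a noisy copy of the model, so node k corresponds to node k.
image_features = model_features + 0.3 * rng.normal(size=model_features.shape)
initial_links, final_links = run_dynamic_links(model_features, image_features)
print("link mass on true correspondences, before:", np.trace(initial_links))
print("link mass on true correspondences, after: ", np.trace(final_links))
```

Because a link only grows when both of its end nodes lie inside simultaneously active blobs, neighbouring model nodes are pushed towards neighbouring image nodes, which is the neighbourhood constraint the abstract refers to.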

Neural Basis of Semantically Dependent and Independent Cross-Modal Boosts on the Attentional Blink

Cerebral Cortex ◽

10.1093/cercor/bhaa362 ◽

2020 ◽

Author(s):

Song Zhao ◽

Chengzhi Feng ◽

Xinyin Huang ◽

Yijun Wang ◽

Wenfeng Feng

Keyword(s):

Object Recognition ◽

Attentional Blink ◽

Temporal Dynamics ◽

Recognition Task ◽

Event Related Potentials ◽

Visual Object ◽

Visual Object Recognition ◽

Neural Basis ◽

Related Potentials ◽

The Cross

The present study recorded event-related potentials (ERPs) in a visual object-recognition task under the attentional blink paradigm to explore the temporal dynamics of the cross-modal boost on attentional blink and whether this auditory benefit would be modulated by semantic congruency between T2 and the simultaneous sound. Behaviorally, the present study showed that not only a semantically congruent but also a semantically incongruent sound improved T2 discrimination during the attentional blink interval, whereas the enhancement was larger for the congruent sound. The ERP results revealed that the behavioral improvements induced by both the semantically congruent and incongruent sounds were closely associated with an early cross-modal interaction on the occipital N195 (192–228 ms). In contrast, the lower T2 accuracy for the incongruent than congruent condition was accompanied by a larger late-occurring centro-parietal N440 (424–448 ms). These findings suggest that the cross-modal boost on attentional blink is hierarchical: the task-irrelevant but simultaneous sound, irrespective of its semantic relevance, first enables T2 to escape the attentional blink by cross-modally strengthening the early stage of visual object-recognition processing, whereas the semantic conflict of the sound begins to interfere with visual awareness only at a later stage, when the representation of the visual object is extracted.

The Evolution of Meaning: Spatio-temporal Dynamics of Visual Object Recognition

Journal of Cognitive Neuroscience ◽

10.1162/jocn.2010.21544 ◽

2011 ◽

Vol 23 (8) ◽

pp. 1887-1899 ◽

Cited By ~ 62

Author(s):

Alex Clarke ◽

Kirsten I. Taylor ◽

Lorraine K. Tyler

Keyword(s):

Object Recognition ◽

Semantic Information ◽

Phase Synchronization ◽

Temporal Dynamics ◽

Semantic Integration ◽

Visual Object ◽

Visual Object Recognition ◽

Object Processing ◽

Spatio Temporal ◽

Time Courses

Research on the spatio-temporal dynamics of visual object recognition suggests a recurrent, interactive model whereby an initial feedforward sweep through the ventral stream to prefrontal cortex is followed by recurrent interactions. However, critical questions remain regarding the factors that mediate the degree of recurrent interactions necessary for meaningful object recognition. The novel prediction we test here is that recurrent interactivity is driven by increasing semantic integration demands, as defined by the complexity of semantic information required by the task and driven by the stimuli. To test this prediction, we recorded magnetoencephalography data while participants named living and nonliving objects during two naming tasks. We found that the spatio-temporal dynamics of neural activity were modulated by the level of semantic integration required. Specifically, source-reconstructed time courses and phase synchronization measures showed increased recurrent interactions as a function of semantic integration demands. These findings demonstrate that the cortical dynamics of object processing are modulated by the complexity of semantic information required from the visual input.

Similarity-based fusion of MEG and fMRI reveals spatio-temporal dynamics in human cortex during visual object recognition

10.1101/032656 ◽

2015 ◽

Cited By ~ 5

Author(s):

Radoslaw Cichy ◽

Dimitrios Pantazis ◽

Aude Oliva

Keyword(s):

Object Recognition ◽

Temporal Dynamics ◽

Imaging Techniques ◽

Visual Object ◽

Data Sets ◽

Visual Object Recognition ◽

Occipital Pole ◽

Spatio Temporal ◽

The Brain

Every human cognitive function, such as visual object recognition, is realized in a complex spatio-temporal activity pattern in the brain. Current brain imaging techniques in isolation cannot resolve the brain's spatio-temporal dynamics because they provide either high spatial or high temporal resolution, but not both. To overcome this limitation, we developed a new integration approach that uses representational similarities to combine measurements from different imaging modalities, magnetoencephalography (MEG) and functional MRI (fMRI), to yield a spatially and temporally integrated characterization of neuronal activation. Applying this approach to two independent MEG-fMRI data sets, we observed that neural activity first emerged in the occipital pole at 50-80 ms, before spreading rapidly and progressively in the anterior direction along the ventral and dorsal visual streams. These results provide a novel and comprehensive, spatio-temporally resolved view of the rapid neural dynamics during the first few hundred milliseconds of object vision. They further demonstrate the feasibility of spatially unbiased, representational-similarity-based fusion of MEG and fMRI, promising new insights into how the brain computes complex cognitive functions.
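
The general idea of combining two modalities through representational similarities can be sketched on synthetic data. The example below is a minimal illustration, not the authors' pipeline, whose details are not given in this abstract. It assumes that dissimilarity between two conditions is measured as 1 minus the Pearson correlation of their response patterns, and that the MEG and fMRI dissimilarity structures are compared with a Spearman rank correlation. All names (`rdm`, `spearman`, `fuse`) and all array sizes are hypothetical.

```python
import numpy as np


def rdm(patterns):
    """Representational dissimilarity structure of a (conditions, features) array.

    Returns the upper triangle of 1 - Pearson correlation between conditions.
    """
    dissimilarity = 1.0 - np.corrcoef(patterns)
    return dissimilarity[np.triu_indices(len(patterns), k=1)]


def spearman(x, y):
    """Spearman rank correlation (assumes no ties, true for continuous data)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return np.corrcoef(rank_x, rank_y)[0, 1]


def fuse(meg, fmri_regions):
    """Compare every MEG time point with every fMRI region.

    meg:          array of shape (time points, conditions, sensors)
    fmri_regions: list of arrays of shape (conditions, voxels)
    Returns a (time points, regions) matrix of representational similarity.
    """
    meg_rdms = [rdm(m) for m in meg]
    fmri_rdms = [rdm(f) for f in fmri_regions]
    return np.array([[spearman(m, f) for f in fmri_rdms] for m in meg_rdms])


rng = np.random.default_rng(0)
n_conditions, n_latent = 12, 10
# Two unrelated hypothetical "codes": one early, one late.
early_code, late_code = rng.normal(size=(2, n_conditions, n_latent))


def measure(code, n_channels):
    """Simulate a measurement as a random linear mixture of a latent code."""
    return code @ rng.normal(size=(code.shape[1], n_channels))


meg_data = np.stack([measure(early_code, 200), measure(late_code, 200)])
fmri_data = [measure(early_code, 300), measure(late_code, 300)]
fusion = fuse(meg_data, fmri_data)
print(fusion)  # rows: MEG time points, columns: fMRI regions
```

In this toy setting the first MEG time point matches the first region and the second time point matches the second region, so the matrix links each moment in time to the location carrying the same representational structure, even though sensors and voxels are never compared directly.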

Faculty Opinions recommendation of Similarity-Based Fusion of MEG and fMRI Reveals Spatio-Temporal Dynamics in Human Cortex During Visual Object Recognition.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.726386195.793524248 ◽

2016 ◽

Author(s):

Krish Sathian

Keyword(s):

Object Recognition ◽

Temporal Dynamics ◽

Visual Object ◽

Visual Object Recognition ◽

Human Cortex ◽

Spatio Temporal

Similarity-Based Fusion of MEG and fMRI Reveals Spatio-Temporal Dynamics in Human Cortex During Visual Object Recognition

Cerebral Cortex ◽

10.1093/cercor/bhw135 ◽

2016 ◽

Vol 26 (8) ◽

pp. 3563-3579 ◽

Cited By ~ 74

Author(s):

Radoslaw Martin Cichy ◽

Dimitrios Pantazis ◽

Aude Oliva

Keyword(s):

Object Recognition ◽

Temporal Dynamics ◽

Visual Object ◽

Visual Object Recognition ◽

Human Cortex ◽

Spatio Temporal

Characterizing the temporal dynamics of object recognition by deep neural networks: role of depth

10.1101/178541 ◽

2017 ◽

Cited By ~ 1

Author(s):

Kandan Ramakrishnan ◽

Iris I.A. Groen ◽

Arnold W.M. Smeulders ◽

H. Steven Scholte ◽

Sennay Ghebreab

Keyword(s):

Neural Networks ◽

Object Recognition ◽

Visual Processing ◽

Temporal Dynamics ◽

Occipital Cortex ◽

Stimulus Onset ◽

Human Vision ◽

Visual Object ◽

Visual Object Recognition ◽

Brain Responses

Convolutional neural networks (CNNs) have recently emerged as promising models of human vision based on their ability to predict hemodynamic brain responses to visual stimuli measured with functional magnetic resonance imaging (fMRI). However, the degree to which CNNs can predict the temporal dynamics of visual object recognition reflected in neural measures with millisecond precision is less understood. Additionally, while deeper CNNs with higher numbers of layers perform better on automated object recognition, it is unclear whether this also results in better correlation with brain responses. Here, we examined 1) to what extent CNN layers predict visual evoked responses in the human brain over time and 2) whether deeper CNNs better model brain responses. Specifically, we tested how well CNN architectures with 7 (CNN-7) and 15 (CNN-15) layers predicted electro-encephalography (EEG) responses to several thousand natural images. Our results show that both CNN architectures correspond to EEG responses in a hierarchical spatio-temporal manner, with lower layers explaining responses early in time at electrodes overlying early visual cortex, and higher layers explaining responses later in time at electrodes overlying lateral-occipital cortex. While the explained variance of neural responses by individual layers did not differ between CNN-7 and CNN-15, combining the representations across layers resulted in improved performance of CNN-15 compared to CNN-7, but only from 150 ms after stimulus onset. This suggests that CNN representations reflect both early (feed-forward) and late (feedback) stages of visual processing. Overall, our results show that depth of CNNs indeed plays a role in explaining time-resolved EEG responses.
