A Unified Theory of Early Visual Representations from Retina to Cortex through Anatomically Constrained Deep CNNs

2019 ◽  
Author(s):  
Jack Lindsey ◽  
Samuel A. Ocko ◽  
Surya Ganguli ◽  
Stephane Deny

Abstract: The vertebrate visual system is hierarchically organized to process visual information in successive stages. Neural representations vary drastically across the first stages of visual processing: at the output of the retina, ganglion cell receptive fields (RFs) exhibit a clear antagonistic center-surround structure, whereas in the primary visual cortex (V1), typical RFs are sharply tuned to a precise orientation. There is currently no unified theory explaining these differences in representations across layers. Here, using a deep convolutional neural network trained on image recognition as a model of the visual system, we show that such differences in representation can emerge as a direct consequence of different neural resource constraints on the retinal and cortical networks, and for the first time we find a single model in which both receptive-field geometries spontaneously emerge at the appropriate stages of visual processing. The key constraint is a reduced number of neurons at the retinal output, consistent with the anatomy of the optic nerve as a stringent bottleneck. Second, we find that, for simple downstream cortical networks, visual representations at the retinal output emerge as nonlinear and lossy feature detectors, whereas they emerge as linear and faithful encoders of the visual scene for more complex cortical networks. This result predicts that the retinas of small vertebrates (e.g. salamander, frog) should perform sophisticated nonlinear computations, extracting features directly relevant to behavior, whereas the retinas of large animals such as primates should mostly encode the visual scene linearly and respond to a much broader range of stimuli.
These predictions could reconcile the two seemingly incompatible views of the retina as either performing feature extraction or efficient coding of natural scenes, by suggesting that all vertebrates lie on a spectrum between these two objectives, depending on the degree of neural resources allocated to their visual system.
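The two receptive-field geometries contrasted in this abstract have standard functional forms: a difference-of-Gaussians for the retina's center-surround RFs and a Gabor for V1's orientation-tuned RFs. A minimal numpy sketch of the two (parameter values are illustrative, not taken from the paper):

```python
import numpy as np

def difference_of_gaussians(size=21, sigma_c=1.5, sigma_s=4.0):
    """Retinal-style center-surround RF: narrow excitatory center
    minus a broader inhibitory surround."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx**2 + yy**2
    center = np.exp(-r2 / (2 * sigma_c**2)) / (2 * np.pi * sigma_c**2)
    surround = np.exp(-r2 / (2 * sigma_s**2)) / (2 * np.pi * sigma_s**2)
    return center - surround

def gabor(size=21, sigma=4.0, freq=0.15, theta=0.0):
    """V1-style orientation-tuned RF: sinusoidal carrier at angle theta
    under a Gaussian envelope."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    x_t = xx * np.cos(theta) + yy * np.sin(theta)
    envelope = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * freq * x_t)

dog, gab = difference_of_gaussians(), gabor()
# The DoG is radially symmetric; the oriented Gabor is not.
assert np.allclose(dog, dog.T)
assert not np.allclose(gab, gab.T)
```

In the paper's framing, filters of the first kind appear behind the retinal bottleneck and filters of the second kind in the "cortical" layers; the sketch only illustrates the two geometries themselves.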

Author(s):  
Mark Edwards ◽  
Stephanie C. Goodhew ◽  
David R. Badcock

Abstract: The visual system uses parallel pathways to process information. However, an ongoing debate centers on the extent to which the pathways from the retina, via the lateral geniculate nucleus, to the visual cortex process distinct aspects of the visual scene and, if they do, whether laboratory stimuli can be used to selectively drive them. These questions are important for a number of reasons, including that some pathologies are thought to be associated with impaired functioning of one of these pathways, and certain cognitive functions have been preferentially linked to specific pathways. Here we examine the two main pathways that have been the focus of this debate: the magnocellular and parvocellular pathways. Specifically, we review the results of electrophysiological and lesion studies that have investigated their properties and conclude that, while there is substantial overlap in the type of information they process, it is possible to identify aspects of visual information that are predominantly processed by either the magnocellular or the parvocellular pathway. We then discuss the types of visual stimuli that can be used to preferentially drive these pathways.


Author(s):  
Angie M. Michaiel ◽  
Elliott T.T. Abe ◽  
Cristopher M. Niell

Abstract: Many studies of visual processing are conducted in unnatural conditions, such as head- and gaze-fixation. As this radically limits natural exploration of the visual environment, much less is known about how animals actively use their sensory systems to acquire visual information in natural, goal-directed contexts. Recently, prey capture has emerged as an ethologically relevant behavior that mice perform without training and that engages vision for accurate orienting and pursuit. However, it is unclear how mice target their gaze during such natural behaviors, particularly since, in contrast to many predatory species, mice have a narrow binocular field and lack the foveate vision that would entail fixating a specific point in the visual field. Here we measured head and bilateral eye movements in freely moving mice performing prey capture. We find that the majority of eye movements are compensatory for head movements and thereby act to stabilize the visual scene. During head turns, however, these periods of stabilization are interspersed with non-compensatory saccades that abruptly shift gaze position. Analysis of eye movements relative to the cricket's position shows that saccades do not preferentially select a specific point in the visual scene. Rather, orienting movements are driven by the head, with the eyes following in coordination to sequentially stabilize and recenter the gaze. These findings help relate eye movements in the mouse to those of other species, and provide a foundation for studying active vision during ethological behaviors in the mouse.


2020 ◽  
Author(s):  
Karola Schlegelmilch ◽  
Annie E. Wertz

Visual processing of a natural environment occurs quickly and effortlessly. Yet little is known about how young children visually categorize naturalistic structures, since their perceptual abilities are still developing. We addressed this question by asking 76 children (ages 4.1-6.1 years) and 72 adults (ages 18-50 years) first to sort cards with greyscale images depicting vegetation, man-made artifacts, and non-living natural elements (e.g., stones) into groups according to visual similarity. They were then asked to choose the images' superordinate categories. We analyzed the relevance of different visual properties to the decisions of the participant groups. Children were well able to interpret complex visual structures. However, children relied on fewer visual properties than adults and, in general, were less likely to base their categorization decisions on properties that afforded the analysis of detailed visual information, suggesting that immaturities of the still-developing visual system affected categorization. Moreover, when sorting according to visual similarity, both groups attended to the images' assumed superordinate categories, in particular to vegetation, in addition to visual properties. When overall performance differences were controlled for, children showed a higher relative sensitivity to vegetation than adults in the classification task. Taken together, these findings add to the sparse literature on the role of developing perceptual abilities in processing naturalistic visual input.


2010 ◽  
Vol 10 (04) ◽  
pp. 513-529
Author(s):  
BARTHÉLÉMY DURETTE ◽  
JEANNY HÉRAULT ◽  
DAVID ALLEYSSON

To extract high-level information from natural scenes, the visual system has to cope with a wide variety of ambient lights, object reflection properties, spatio-temporal contexts, and geometrical complexity. By pre-processing visual information, the retina plays a key role in the functioning of the whole visual system. Reproducing this pre-processing is crucial for artificial devices that aim to replace or substitute for a damaged visual system. In this paper, we present a biologically plausible model of the retina at the cell level and its implementation as real-time retinal simulation software. It features the non-uniform sampling of visual information by the photoreceptor cells, the non-separable spatio-temporal properties of the retina, the subsequent generation of the parvocellular and magnocellular pathways, and the non-linear equalization of luminance and contrast at the local level. Each of these aspects of the model is described and illustrated, and its relevance to the replacement or substitution of vision is discussed.
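As an illustration of one component the abstract lists, local luminance equalization, here is a hedged numpy sketch of Naka-Rushton-style compression with a locally adapted semi-saturation level; the exact form and parameters are assumptions, not the authors' model:

```python
import numpy as np

def local_luminance_equalization(img, sigma=8.0, eps=1e-6):
    """Compress luminance by normalizing each pixel with the mean
    luminance of its Gaussian-weighted neighborhood (Naka-Rushton style)."""
    n = int(6 * sigma) | 1                      # odd kernel width ~ 6 sigma
    ax = np.arange(n) - n // 2
    k = np.exp(-ax**2 / (2 * sigma**2))
    k /= k.sum()
    # Separable Gaussian blur gives the local-mean estimate.
    local_mean = np.apply_along_axis(lambda r: np.convolve(r, k, "same"), 0, img)
    local_mean = np.apply_along_axis(lambda r: np.convolve(r, k, "same"), 1, local_mean)
    return img / (img + local_mean + eps)

img = np.linspace(0.01, 1.0, 64 * 64).reshape(64, 64)  # synthetic luminance ramp
out = local_luminance_equalization(img)
# Output is compressed into [0, 1); dark regions are boosted relative to bright ones.
assert out.min() >= 0 and out.max() < 1
```

In Hérault-style retina models the local average is typically computed by the photoreceptor and horizontal-cell network; the Gaussian blur here merely stands in for that neighborhood estimate.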


2018 ◽  
Vol 4 (1) ◽  
pp. 311-336 ◽  
Author(s):  
Yaoda Xu

Visual information processing contains two opposite needs. There is both a need to comprehend the richness of the visual world and a need to extract only pertinent visual information to guide thoughts and behavior at a given moment. I argue that these two aspects of visual processing are mediated by two complementary visual systems in the primate brain—specifically, the occipitotemporal cortex (OTC) and the posterior parietal cortex (PPC). The role of OTC in visual processing has been documented extensively by decades of neuroscience research. I review here recent evidence from human imaging and monkey neurophysiology studies to highlight the role of PPC in adaptive visual processing. I first document the diverse array of visual representations found in PPC. I then describe the adaptive nature of visual representation in PPC by contrasting visual processing in OTC and PPC and by showing that visual representations in PPC largely originate from OTC.


2018 ◽  
Author(s):  
Xu Han ◽  
Ben Vermaercke ◽  
Vincent Bonin

Abstract: Visual processing and behavior depend on specialized neural representations and information channels that encode distinct visual information and enable distinct computations. Our understanding of the neural substrate, however, remains severely limited by sparse recordings and the restricted range of visual areas and visual stimuli considered. We characterized in the mouse the multidimensional spatiotemporal tuning properties of more than 30,000 layer 2/3 pyramidal neurons across seven areas of the cortex. The dataset reveals populations specialized for the processing of oriented and non-oriented contrast, spatiotemporal frequency, and motion speed. Areal analysis reveals profound functional diversity and specificity, as well as highly specific representations of visual processing channels in distinct visual areas. Clustering analysis shows a branching of visual representations along the posterior-to-anterior axis, and between lateral and dorsal areas. Overall, this dataset provides a cellular-resolution atlas for understanding the organizing principles underlying sensory representations across the cortex.

Summary: Visual representations and visual channels are the cornerstones of mammalian visual processing and are critical for a range of life-sustaining behaviors. However, the lack of datasets spanning multiple visual areas precludes unambiguous identification of visual processing streams, and the sparse, isolated recordings obtained thus far are insufficient to reveal the functional diversity of visual areas and to study visual information channels. We characterized the tuning of over 30,000 cortical excitatory neurons from seven visual areas to a broad array of stimuli, and studied their responses in terms of their ability to encode orientation, spatiotemporal contrast, and visual motion speed. We found that all mouse visual cortical areas convey diverse information but show distinct biases in the numbers of neurons tuned to particular spatiotemporal features.
Neurons in visual areas differ in their spatiotemporal tuning but also in their relative responses to oriented and unoriented contrast. We uncovered a population that preferentially responds to unoriented contrast and shows only weak responses to oriented stimuli. This population is strongly overrepresented in certain areas (V1, LM, and LI) and underrepresented in others (AL, RL, AM, and PM). Spatiotemporal tunings are broadly distributed in all visual areas, indicating that all areas have access to broad spatiotemporal information. However, individual areas show specific biases. While V1 is heavily biased in favor of low spatial and temporal frequencies, area LM responds more strongly to mid-range frequencies. Areas PM and LI are biased in favor of slowly varying, high-resolution signals. By comparison, the anterior areas AL, RL, and AM are heavily biased in favor of fast-varying, low to mid spatial frequency signals. Critically, these biases express themselves in vastly different numbers of cells tuned to particular features, suggesting differential sampling of visual processing channels across areas. Comparing across areas, we found divergent visual representations between anterior and posterior areas, and between lateral and dorsal areas, suggesting a segregated organization of cortical streams for distinct information processing.
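The anterior/posterior and lateral/dorsal branching described above is the kind of structure hierarchical clustering exposes. A hedged sketch: the area names come from the text, but the per-area tuning profiles below are invented for illustration, not the paper's data:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical mean tuning profiles per area:
# [preferred spatial freq (cyc/deg), preferred temporal freq (Hz), speed bias]
areas = ["V1", "LM", "LI", "PM", "AL", "RL", "AM"]
profiles = np.array([
    [0.03, 1.0, 0.2],   # V1: low spatial and temporal frequencies
    [0.06, 4.0, 0.4],   # LM: mid-range frequencies
    [0.10, 1.5, 0.2],   # LI: high-resolution, slow
    [0.12, 1.0, 0.1],   # PM: high-resolution, slow
    [0.04, 8.0, 0.9],   # AL: fast-varying
    [0.03, 8.0, 0.9],   # RL: fast-varying
    [0.04, 7.0, 0.8],   # AM: fast-varying
])

# Ward linkage on z-scored features; cut the tree into two branches.
z = (profiles - profiles.mean(0)) / profiles.std(0)
labels = fcluster(linkage(z, method="ward"), t=2, criterion="maxclust")
groups = {a: int(l) for a, l in zip(areas, labels)}

# With these toy profiles, the fast anterior areas fall in one branch...
assert groups["AL"] == groups["RL"] == groups["AM"]
# ...separate from the slow, high-resolution areas.
assert groups["PM"] != groups["AL"]
```

The point of the sketch is the method, not the outcome: with real tuning vectors, the cut of the dendrogram is what reveals where representations branch.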


2018 ◽  
Author(s):  
Samuel A. Ocko ◽  
Jack Lindsey ◽  
Surya Ganguli ◽  
Stephane Deny

Abstract: One of the most striking aspects of early visual processing in the retina is the immediate parcellation of visual information into multiple parallel pathways, formed by different retinal ganglion cell types each tiling the entire visual field. Existing theories of efficient coding have been unable to account for the functional advantages of such cell-type diversity in encoding natural scenes. Here we go beyond previous theories to analyze how a simple linear retinal encoding model with different convolutional cell types efficiently encodes naturalistic spatiotemporal movies given a fixed firing rate budget. We find that optimizing the receptive fields and cell densities of two cell types makes them match the properties of the two main cell types in the primate retina, midget and parasol cells, in terms of spatial and temporal sensitivity, cell spacing, and their relative ratio. Moreover, our theory gives a precise account of how the ratio of midget to parasol cells decreases with retinal eccentricity. Also, we train a nonlinear encoding model with a rectifying nonlinearity to efficiently encode naturalistic movies, and again find emergent receptive fields resembling those of midget and parasol cells that are now further subdivided into ON and OFF types. Thus our work provides a theoretical justification, based on the efficient coding of natural movies, for the existence of the four most dominant cell types in the primate retina that together comprise 70% of all ganglion cells.


Author(s):  
Yaoda Xu ◽  
Maryam Vaziri-Pashkam

Abstract: Convolutional neural networks (CNNs) have recently achieved very high object categorization performance. It has increasingly become common practice in human fMRI research to regard CNNs as working models of the human visual system. Here we reevaluate this approach by comparing fMRI responses from the human brain in three experiments with those from 14 different CNNs. Our visual stimuli included original and filtered versions of real-world object images and images of artificial objects. Replicating previous findings, we found a brain-CNN correspondence in a number of CNNs, with lower and higher levels of visual representation in the human brain better resembling those of lower and higher CNN layers, respectively. Moreover, the lower layers of some CNNs could fully capture the representational structure of human early visual areas for both the original and filtered real-world object images. Despite these successes, no CNN examined could fully capture the representational structure of higher human visual processing areas, and all failed to capture that of the artificial object images at every level of visual processing. The latter is particularly troublesome, as decades of vision research have demonstrated that the same algorithms used in the processing of natural images in the primate brain also support the processing of artificial visual stimuli. Similar results were obtained when a CNN was trained with stylized object images that emphasized shape representation. CNNs likely represent visual information in fundamentally different ways from the human brain, and current CNNs thus may not serve as sound working models of the human visual system.

Significance Statement: Recent CNNs have achieved very high object categorization performance, with some even exceeding human performance. It has become common practice in recent neuroscience research to regard CNNs as working models of the human visual system.
Here we evaluate this approach by comparing fMRI responses from the human brain with those from 14 different CNNs. Despite CNNs' ability to perform visual object categorization as successfully as the human visual system, they appear to represent visual information in fundamentally different ways from the human brain. Current CNNs thus may not serve as sound working models of the human visual system. Given the current dominant trend of incorporating CNN modeling in visual neuroscience research, our results question the validity of this approach.
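Comparisons of "representational structure" between brains and CNNs are commonly made with representational similarity analysis (RSA); treating that as the method here is an assumption. A minimal numpy sketch: build a representational dissimilarity matrix (RDM) for each system and correlate their upper triangles:

```python
import numpy as np

def rdm(responses):
    """Representational dissimilarity matrix: 1 - Pearson correlation
    between response patterns for each pair of stimuli.
    responses: (n_stimuli, n_units) array."""
    return 1.0 - np.corrcoef(responses)

def rdm_similarity(resp_a, resp_b):
    """Correlate the upper triangles of two RDMs, a common brain-model metric."""
    iu = np.triu_indices(resp_a.shape[0], k=1)
    return np.corrcoef(rdm(resp_a)[iu], rdm(resp_b)[iu])[0, 1]

rng = np.random.default_rng(0)
brain = rng.standard_normal((20, 100))               # hypothetical voxel patterns
layer_same = brain @ rng.standard_normal((100, 50))  # linear readout of the same code
layer_diff = rng.standard_normal((20, 50))           # unrelated responses

assert rdm_similarity(brain, brain) > 0.999
assert rdm_similarity(brain, layer_same) > rdm_similarity(brain, layer_diff)
```

Under this metric, "fully capturing" a brain region's representational structure means the layer's RDM correlates with the brain RDM up to the data's noise ceiling; a failure such as the one reported for artificial objects shows up as a correlation well below that ceiling.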


Perception ◽  
1997 ◽  
Vol 26 (9) ◽  
pp. 1089-1100 ◽  
Author(s):  
Nuala Brady

In natural scenes and other broadband images, spatial variations in luminance occur at a range of scales, or frequencies. It is generally agreed that the visual image is initially represented by the activity of separate frequency-tuned channels, a notion supported by physiological evidence for a stage of multi-resolution filtering in early visual processing. The question of whether these channels can be accessed as independent sources of information in the normal course of events is more contentious. In the psychophysical study of both motion and spatial vision, there are examples of tasks in which fine-scale structure dominates perception or performance and obscures information at coarser scales. It is argued here that one important factor determining the relative salience of information from different spatial scales in broadband images is the distribution of response activity across spatial channels. The special case of natural scenes, which have characteristic ‘scale-invariant’ power spectra in which image contrast is roughly constant in equal-octave frequency bands, is considered. A review is presented of evidence suggesting that the sensitivity of frequency-tuned filters in the visual system is matched to this image statistic, so that, on average, different channels respond with equal activity to natural scenes. Under these conditions, the visual system does appear to have independent access to information at different spatial scales, and spatial-scale interactions are not apparent.
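The ‘scale-invariant’ property can be checked numerically: if the amplitude spectrum falls as 1/f, the total power in each octave-wide annulus of the 2D frequency plane is approximately constant (the continuum integral over an octave is 2π ln 2 regardless of where the octave starts). A small numpy check on a synthetic spectrum:

```python
import numpy as np

n = 512
f = np.hypot(*np.meshgrid(np.fft.fftfreq(n), np.fft.fftfreq(n)))  # radial frequency
f[0, 0] = np.inf                      # drop the DC term
power = f ** -2.0                     # amplitude ~ 1/f  =>  power ~ 1/f^2

# Total power in successive octave-wide annuli [f0, 2*f0).
bands = [(0.02 * 2**k, 0.04 * 2**k) for k in range(4)]
energies = [power[(f >= lo) & (f < hi)].sum() for lo, hi in bands]
ratios = [e / energies[0] for e in energies]

# Each octave carries (nearly) the same power, up to discretization error.
assert all(abs(r - 1) < 0.1 for r in ratios)
```

This is the sense in which channels of equal octave bandwidth, if matched to the statistic, respond with roughly equal activity to natural scenes.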


2021 ◽  
Author(s):  
Bruno Richard ◽  
Patrick Shafto

Scenes contain many statistical regularities that, if accounted for by the visual system, could greatly benefit visual processing. One such statistic is the orientation-averaged slope (α) of the amplitude spectrum of natural scenes. Human observers are differentially sensitive to α, and they may utilize this statistic when processing natural scenes. Here, we explore whether discrimination sensitivity to α is associated with the recently viewed environment. Observers were immersed, using a head-mounted display, in an environment that was either unaltered or had its average α steepened or shallowed. Discrimination thresholds were affected by the average shift in α: a steeper environment decreased thresholds for very steep reference values of α, while a shallower environment decreased thresholds for shallow values. We modelled these data with a Bayesian observer model and explored how different prior shapes influence the model's ability to fit observer thresholds. We considered three potential prior shapes, unimodal, bimodal, and trimodal modified-PERT distributions, and found that the bimodal prior best captured observer thresholds in all experimental conditions. Notably, the positions of the prior modes shifted following adaptation, suggesting that a priori expectations for α are sufficiently malleable to account for changes in the average α of recently viewed scenes.
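The slope α can be estimated by regressing log amplitude against log frequency. A sketch that synthesizes noise with a known α and recovers it (the implementation is illustrative, not the authors' code):

```python
import numpy as np

def synthesize(alpha, n=256, seed=0):
    """White noise filtered so its amplitude spectrum falls as 1/f**alpha."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((n, n))
    f = np.hypot(*np.meshgrid(np.fft.fftfreq(n), np.fft.fftfreq(n)))
    f[0, 0] = 1.0                      # avoid division by zero at DC
    spectrum = np.fft.fft2(noise) * f ** -alpha
    return np.real(np.fft.ifft2(spectrum))

def estimate_alpha(img):
    """Negated slope of log amplitude vs. log radial frequency."""
    n = img.shape[0]
    amp = np.abs(np.fft.fft2(img))
    f = np.hypot(*np.meshgrid(np.fft.fftfreq(n), np.fft.fftfreq(n)))
    mask = (f > 0.01) & (f < 0.4)      # skip DC and the corner frequencies
    slope, _ = np.polyfit(np.log(f[mask]), np.log(amp[mask]), 1)
    return -slope

est = estimate_alpha(synthesize(alpha=1.0))
assert abs(est - 1.0) < 0.15
```

Steepening or shallowing the viewed environment, as in the experiment, amounts to shifting this α up or down before the images reach the observer.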

