Spatial Scale Interactions and Image Statistics

Perception ◽  
1997 ◽  
Vol 26 (9) ◽  
pp. 1089-1100 ◽  
Author(s):  
Nuala Brady

In natural scenes and other broadband images, spatial variations in luminance occur at a range of scales or frequencies. It is generally agreed that the visual image is initially represented by the activity of separate frequency-tuned channels, and this notion is supported by physiological evidence for a stage of multi-resolution filtering in early visual processing. The question of whether these channels can be accessed as independent sources of information in the normal course of events is more contentious. In the psychophysical study of both motion and spatial vision, there are examples of tasks in which fine-scale structure dominates perception or performance and obscures information at coarser scales. It is argued here that one important factor determining the relative salience of information from different spatial scales in broadband images is the distribution of response activity across spatial channels. The special case of natural scenes, which have characteristic ‘scale-invariant’ power spectra in which image contrast is roughly constant in equal-octave frequency bands, is considered. A review is presented of evidence which suggests that the sensitivity of frequency-tuned filters in the visual system is matched to this image statistic, so that, on average, different channels respond with equal activity to natural scenes. Under these conditions, the visual system does appear to have independent access to information at different spatial scales, and spatial-scale interactions are not apparent.
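The equal-contrast-per-octave property can be checked with a short numerical sketch (mine, not the paper's; the 1/f² power-law exponent is the standard approximation for natural-scene power spectra): integrating a 1/f² power spectrum over two-dimensional frequency space yields the same energy, 2π ln 2, in every octave band.

```python
import math

def octave_energy(f_lo, power_exponent=2.0, n=10000):
    # numerically integrate 2*pi*f * f^(-exponent) over the octave [f_lo, 2*f_lo]
    # (the 2*pi*f factor accounts for the annulus of 2-D frequencies at radius f)
    f_hi = 2 * f_lo
    df = (f_hi - f_lo) / n
    total = 0.0
    for i in range(n):
        f = f_lo + (i + 0.5) * df
        total += 2 * math.pi * f * f ** (-power_exponent) * df
    return total

# energy per octave is ~2*pi*ln(2) regardless of where the octave sits
bands = [octave_energy(f) for f in (1, 2, 4, 8)]
```

For any other exponent, the band energies would grow or shrink with frequency, which is the imbalance the review argues the channel sensitivities are matched against.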

1994 ◽  
Vol 5 (4) ◽  
pp. 195-200 ◽  
Author(s):  
Philippe G. Schyns ◽  
Aude Oliva

In very fast recognition tasks, scenes are identified as fast as isolated objects. How can this efficiency be achieved, considering the large number of component objects and interfering factors, such as cast shadows and occlusions? Scene categories tend to have distinct and typical spatial organizations of their major components. If human perceptual structures were tuned to extract this information early in processing, a coarse-to-fine process could account for efficient scene recognition. A coarse description of the input scene (oriented “blobs” in a particular spatial organization) would initiate recognition before the identity of the objects is processed. We report two experiments that contrast the respective roles of coarse and fine information in fast identification of natural scenes. The first experiment investigated whether coarse and fine information were used at different stages of processing. The second experiment tested whether coarse-to-fine processing accounts for fast scene categorization. The data suggest that recognition occurs at both coarse and fine spatial scales. By attending first to the coarse scale, the visual system can get a quick and rough estimate of the input to activate scene schemas in memory; attending to fine information allows refinement, or refutation, of the raw estimate.
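A minimal, hypothetical illustration of the coarse/fine decomposition the account relies on, using a 1-D box blur in place of oriented band-pass filters: the coarse layer carries the blob-like layout, the residual carries the detail, and together they reconstruct the input exactly.

```python
def box_blur(signal, radius):
    # simple moving-average low-pass: a stand-in for the "coarse" scale
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

signal = [0, 0, 1, 1, 0, 0, 1, 0]
coarse = box_blur(signal, 2)                     # blurred "blob" layout
fine = [s - c for s, c in zip(signal, coarse)]   # residual detail
# the two scales sum back to the original input
recon = [c + f for c, f in zip(coarse, fine)]
```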


Perception ◽  
1997 ◽  
Vol 26 (8) ◽  
pp. 961-976 ◽  
Author(s):  
Richard A Eagle

The aim of the experiments was to discover whether the visual system has independent access to motion information at different spatial scales when presented with a broadband stimulus. Subjects were required to discriminate between a pair of two-frame motion sequences, one containing a coherently displacing pattern and the other containing a pattern with high-frequency noise. The stimuli were either narrowband (1 octave) or broadband (6 octaves spanning 0.23–15.0 cycles deg⁻¹) and their power spectra were either flat or followed a 1/f² function. For the broadband stimuli, noise was introduced cumulatively into increasingly lower frequencies. For the narrowband stimuli, noise was introduced into the same frequency band as the signal. All stimuli could be defined by the lowest noise frequency (n1) they contained. For each stimulus, the largest spatial displacement across the two frames at which the task could be performed was measured (dmax). For the narrowband stimuli, dmax increased as n1 was lowered. This was true over the entire frequency range for the 1/f² stimuli, though the task became impossible for the flat-spectrum stimuli at the lowest frequencies. This is attributed to the very low contrast of these latter stimuli. The dmax values for the broadband stimuli tended to shadow those of the narrowband stimuli, with the equivalent values of n1 being around 25% lower. The data were modelled by spatiotemporally filtering the stimuli and considering the amount of directional power in the signal and noise sequences. The results suggest that there must be multiple spatial-frequency channels in operation, and that for broadband patterns the visual system has perceptual access to these individual channel outputs, utilising different filters depending on the task requirements.
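This is not the authors' spatiotemporal-filtering model, only a toy sketch of why dmax is tied to spatial frequency: a two-frame correlator recovers displacement unambiguously only up to half the pattern's wavelength, so lower-frequency content supports larger displacements.

```python
import math

def best_shift(frame1, frame2, max_shift):
    # brute-force cross-correlation over candidate shifts (circular boundary)
    best, best_score = 0, -float("inf")
    n = len(frame1)
    for s in range(-max_shift, max_shift + 1):
        score = sum(frame1[i] * frame2[(i + s) % n] for i in range(n))
        if score > best_score:
            best, best_score = s, score
    return best

n, cycles = 64, 4                                   # wavelength = 16 samples
wave = [math.sin(2 * math.pi * cycles * i / n) for i in range(n)]
shifted = wave[3:] + wave[:3]                       # pattern displaced by 3 samples
# 3 samples is below the half-wavelength (8-sample) limit, so the match is unique
estimate = best_shift(wave, shifted, 7)
```

A displacement beyond half the wavelength would alias onto a smaller shift in the opposite direction, which is the kind of failure dmax indexes.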


Perception ◽  
1997 ◽  
Vol 26 (1_suppl) ◽  
pp. 24-24 ◽  
Author(s):  
J H van Hateren

The first steps of processing in the visual system of the blowfly are well suited for studying the relationship between the properties of the environment and the function of visual processing (eg Srinivasan et al, 1982 Proceedings of the Royal Society of London B 216 427; van Hateren, 1992 Journal of Comparative Physiology A 171 157). Although the early visual system appears to be linear to some extent, there are also reports on functionally significant nonlinearities (Laughlin, 1981 Zeitschrift für Naturforschung 36c 910). Recent theories using information theory for understanding the early visual system perform reasonably well, but not quite as well as the real visual system when confronted with natural stimuli [eg van Hateren, 1992 Nature (London) 360 68]. The main problem seems to be that they lack a component that adapts with the right time course to changes in stimulus statistics (eg the local average light intensity). In order to study this problem of adaptation with a relatively simple, yet realistic, stimulus I recorded time series of natural intensities, and played them back via a high-brightness LED to the visual system of the blowfly (Calliphora vicina). The power spectra of the intensity measurements and photoreceptor responses behave approximately as 1/f, with f the temporal frequency, whilst those of second-order neurons (LMCs) are almost flat. The probability distributions of the responses of LMCs are almost gaussian and largely independent of the input contrast, unlike the distributions of photoreceptor responses and intensity measurements. These results suggest that LMCs are in effect executing a form of contrast normalisation in the time domain.
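A sketch of what "contrast normalisation in the time domain" could look like computationally (an assumed scheme, not a model of LMC biophysics): responses are centred on the local mean and scaled by the local standard deviation, which makes the output independent of input contrast.

```python
def contrast_normalise(signal, window):
    # subtract the local mean and divide by the local standard deviation,
    # both computed over a sliding temporal window
    out = []
    n = len(signal)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        local = signal[lo:hi]
        mean = sum(local) / len(local)
        sd = (sum((x - mean) ** 2 for x in local) / len(local)) ** 0.5
        if sd == 0.0:
            sd = 1.0                    # avoid division by zero in flat stretches
        out.append((signal[i] - mean) / sd)
    return out

stimulus = [1.0, 2.0, 4.0, 2.0, 1.0, 3.0]
low = contrast_normalise(stimulus, 2)
high = contrast_normalise([2 * x for x in stimulus], 2)
# doubling the input contrast leaves the normalised output unchanged
```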


2019 ◽  
Author(s):  
Jack Lindsey ◽  
Samuel A. Ocko ◽  
Surya Ganguli ◽  
Stephane Deny

The vertebrate visual system is hierarchically organized to process visual information in successive stages. Neural representations vary drastically across the first stages of visual processing: at the output of the retina, ganglion cell receptive fields (RFs) exhibit a clear antagonistic center-surround structure, whereas in the primary visual cortex (V1), typical RFs are sharply tuned to a precise orientation. There is currently no unified theory explaining these differences in representations across layers. Here, using a deep convolutional neural network trained on image recognition as a model of the visual system, we show that such differences in representation can emerge as a direct consequence of different neural resource constraints on the retinal and cortical networks, and for the first time we find a single model from which both RF geometries spontaneously emerge at the appropriate stages of visual processing. The key constraint is a reduced number of neurons at the retinal output, consistent with the anatomy of the optic nerve as a stringent bottleneck. We further find that, for simple downstream cortical networks, visual representations at the retinal output emerge as nonlinear and lossy feature detectors, whereas they emerge as linear and faithful encoders of the visual scene for more complex cortical networks. This result predicts that the retinas of small vertebrates (e.g. salamander, frog) should perform sophisticated nonlinear computations, extracting features directly relevant to behavior, whereas retinas of large animals such as primates should mostly encode the visual scene linearly and respond to a much broader range of stimuli.
These predictions could reconcile the two seemingly incompatible views of the retina as either performing feature extraction or efficient coding of natural scenes, by suggesting that all vertebrates lie on a spectrum between these two objectives, depending on the degree of neural resources allocated to their visual system.
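The bottleneck argument can be isolated in a linear toy model (my sketch, not the paper's deep network; all sizes are arbitrary): the optimal linear code through a narrow bottleneck is a truncated SVD, and reconstruction error grows as the bottleneck tightens, vanishing once the bottleneck matches the data's intrinsic dimensionality.

```python
import numpy as np

rng = np.random.default_rng(0)
# toy "scenes": 200 samples of a 16-dimensional signal with intrinsic rank 8
mixing = rng.normal(size=(8, 16))
data = rng.normal(size=(200, 8)) @ mixing

def bottleneck_error(data, width):
    # best linear bottleneck of a given width (truncated SVD / PCA):
    # a stand-in for the optic-nerve constraint on retinal output neurons
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    recon = u[:, :width] @ np.diag(s[:width]) @ vt[:width]
    return float(np.mean((data - recon) ** 2))

# reconstruction error falls as the bottleneck widens; at width 8 it is ~0
errors = [bottleneck_error(data, w) for w in (2, 4, 8)]
```

In the paper's nonlinear setting, the tight bottleneck is what pushes the retinal stage toward lossy, feature-detecting codes rather than faithful linear ones.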


Polar Record ◽  
2007 ◽  
Vol 43 (4) ◽  
pp. 353-359 ◽  
Author(s):  
Daniel Joly ◽  
Thierry Brossard

The climate and its components (temperature and precipitation) are organised according to different spatial scales that are structured hierarchically. The aim of this paper is to explore the dependence between temperature and deterministic factors at different scales over a 10 km² study area on the northwestern coast of Svalbard. A GIS was developed which contained three sources of information: temperature, remotely sensed imagery and digital elevation models (DEM), and derived raster data layers. The first layer, temperature, was acquired at regular temporal intervals from 53 stations. The second layer comprised remotely sensed images (aerial photography and SPOT imagery) and DEM data at 2 m and 20 m resolution, respectively. From these, a windowing procedure was applied to derive several spatial subsets of different spatial resolutions (6, 14, 30, 60, 140, and 300 m). The third layer comprised slope, aspect, and a theoretical solar radiation value derived from the DEM, and a vegetation index derived from the remotely sensed imagery. Linear regressions were then systematically conducted on the datasets, with temperature as the dependent variable and each of the other data layers as the independent variables. Using graphical analysis, we link the correlation coefficients obtained for each factor from the finest spatial resolution (6 m) to the coarsest (300 m). The results indicated that each explanatory variable and scale brings a specific contribution to changes in temperature. For example, the effect of elevation remains constant across all spatial resolutions, reflecting a quasi ‘non-scalar’ pattern for this variable. For other variables, however, spatial scale had a strong effect: explanatory power for solar radiation peaked at resolutions of 14 m and 60 m, while the vegetation index contributed most at the 300 m resolution.
Thus, different environmental characteristics may have significant effects on changes in temperature when differences in spatial scale are taken into account.
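The per-resolution regression procedure can be sketched on synthetic data (hypothetical values; the lapse-rate and noise terms are illustrative): block-average a transect to coarser resolutions, then correlate temperature with elevation at each one.

```python
def pearson_r(xs, ys):
    # ordinary Pearson correlation coefficient
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def block_average(values, size):
    # resample a transect to a coarser resolution, analogous to the windowing step
    return [sum(values[i:i + size]) / size
            for i in range(0, len(values) - size + 1, size)]

# synthetic 1-D transect: temperature falls with elevation (~6.5 K per km),
# plus fine-scale variation that coarser resolutions smooth away
elevation = [10 * i + (7 * i) % 50 for i in range(64)]
temperature = [5.0 - 0.0065 * e + 0.3 * ((3 * i) % 5 - 2)
               for i, e in enumerate(elevation)]

r_by_resolution = {size: pearson_r(block_average(elevation, size),
                                   block_average(temperature, size))
                   for size in (1, 4, 16)}
```

Here the elevation–temperature correlation stays strongly negative at every resolution, echoing the quasi ‘non-scalar’ behaviour of elevation reported above; a variable whose influence is scale-specific would instead show a peak at one resolution.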


1990 ◽  
Vol 240 (1298) ◽  
pp. 211-229 ◽  

Adopting principles learnt from insect vision we have constructed a model of a general-purpose front-end visual system for motion detection that is designed to operate in parallel along each photoreceptor axis with only local connections. The model is also designed to assist electrophysiological analysis of visual processing because it puts the response to a moving scene into sets of template responses similar to the distribution of activity among different neurons. An earlier template model divided the visual image into the fields of adjacent receptors, measured as intensity or receptor modulation at small increments of time. As soon as we used this model with natural scenes, however, we found that we had to look at changes in intensity, not intensity itself. Running the new model also generated new insights into the effects of very fast motion, of blurring the image, and the value of lateral inhibition. We also experimented with ways of measuring the angular velocity of the image moving across the eye. The camera eye is moved at a known speed and the range to objects is calculated from the angular velocity of contrasts moving across the receptor array. The original template model is modified so that contrast is saturated in a new representation of the original image data. This reduces the 8-bit grey-scale image to a log₂ 3 ≈ 1.6-bit image, which becomes the input to a look-up table of templates. The output consists of groups of responding templates in specific ratios that define the input features, and these ratios lead to types of invariance at a higher level of further logic. At any stage, there can be persistent parallel inputs from all earlier stages. This design would enable groups of templates to be tuned to different expected situations, such as different velocities, different directions and different types of edges.
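A minimal sketch of the contrast-saturation step described above (the threshold and frame values are invented): intensity changes are reduced to three states per receptor, which is where the log₂ 3 ≈ 1.6-bit figure comes from, and adjacent receptors' states form the key into the template look-up table.

```python
def saturate(delta, threshold=8):
    # ternarise an intensity change: -1 (dimming), 0 (no change), +1 (brightening)
    # three states per receptor = log2(3) ≈ 1.6 bits, down from 8-bit grey levels
    if delta > threshold:
        return 1
    if delta < -threshold:
        return -1
    return 0

frame1 = [10, 10, 200, 200, 10]
frame2 = [10, 200, 200, 10, 10]        # the bright bar has moved one receptor left
changes = [saturate(b - a) for a, b in zip(frame1, frame2)]
# templates pair the change signals of adjacent receptors; the ordered pairs
# index a look-up table whose response ratios characterise the moving edge
templates = list(zip(changes, changes[1:]))
```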


2021 ◽  
Author(s):  
Bruno Richard ◽  
Patrick Shafto

Scenes contain many statistical regularities that, if accounted for by the visual system, could greatly benefit visual processing. One such statistic is the orientation-averaged slope (α) of the amplitude spectrum of natural scenes. Human observers are differentially sensitive to α, and they may utilize this statistic when processing natural scenes. Here, we explore whether discrimination sensitivity to α is associated with the recently viewed environment. Observers were immersed, using a Head-Mounted Display, in an environment that was either unaltered or had its average α steepened or shallowed. Discrimination thresholds were affected by the average shift in α: a steeper environment decreased thresholds for very steep reference αs, while a shallower environment decreased thresholds for shallow values. We modelled these data with a Bayesian observer model and explored how different prior shapes influence the ability of the model to fit observer thresholds. Of the three prior shapes considered (unimodal, bimodal and trimodal modified-PERT distributions), the bimodal prior best captured observer thresholds for all experimental conditions. Notably, the position of the prior's modes shifted following adaptation, which suggests that a priori expectations for α are sufficiently malleable to account for changes in the average α of recently viewed scenes.
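Estimating α from an amplitude spectrum is a log-log regression; the following sketch (my own, with synthetic amplitudes) recovers the slope of a noiseless f^(−α) spectrum.

```python
import math

def estimate_alpha(freqs, amps):
    # least-squares fit of log10(amplitude) against log10(frequency);
    # alpha is the negated slope, reported as a positive number
    xs = [math.log10(f) for f in freqs]
    ys = [math.log10(a) for a in amps]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope

freqs = [1, 2, 4, 8, 16, 32]                 # e.g. cycles per image
amps = [f ** -1.2 for f in freqs]            # synthetic spectrum with alpha = 1.2
alpha = estimate_alpha(freqs, amps)          # recovers ~1.2
```

With real images the amplitudes would first be averaged over orientation at each spatial frequency, as the "orientation-averaged" qualifier implies.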


2021 ◽  
Vol 13 (12) ◽  
pp. 2355
Author(s):  
Linglin Zeng ◽  
Yuchao Hu ◽  
Rui Wang ◽  
Xiang Zhang ◽  
Guozhang Peng ◽  
...  

Air temperature (Ta) is a required input in a wide range of applications, e.g., agriculture. Land Surface Temperature (LST) products from the Moderate Resolution Imaging Spectroradiometer (MODIS) are widely used to estimate Ta. Previous studies using these products for Ta estimation, however, were generally limited to small areas and small numbers of meteorological stations. This study designed both temporal and spatial experiments to estimate 8-day and daily maximum and minimum Ta (Tmax and Tmin) at three spatial scales (climate zone, continental and global) from 2009 to 2018, using the Random Forest (RF) method based on MODIS LST products and other auxiliary data. Factors contributing to the relation between LST and Ta were determined based on physical models and equations. The temporal and spatial experiments were defined by the rules for dividing the training and validation datasets for the RF method: in the temporal experiment, all stations in the validation dataset were also present in the training dataset, whereas in the spatial experiment they were not. The RF model was first trained and validated at each spatial scale. At the global scale, model accuracy with a determination coefficient (R2) > 0.96 and root mean square error (RMSE) < 1.96 °C for 8-day estimates, and R2 > 0.95 and RMSE < 2.55 °C for daily estimates, was achieved in both the temporal and spatial experiments. The model was then trained and cross-validated at each spatial scale. The results showed that the data size and station distribution of the study area were the main factors influencing model performance across spatial scales. Finally, the spatial patterns of model performance and variable importance were analyzed. Both daytime and nighttime LST contributed significantly to the 8-day Tmax estimation at all three spatial scales, while their contribution to daily Tmax estimation varied across continents and climate zones.
This study is expected to improve our understanding of Ta estimation in terms of accuracy variations and influencing variables at different spatial and temporal scales. Future work mainly includes identifying the underlying mechanisms of estimation errors and the uncertainty sources of Ta estimation from local to global scales.
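The distinction between the temporal and spatial experiments comes down to how stations are divided between training and validation; a schematic sketch (field names, station labels and years are invented for illustration):

```python
import random

def temporal_split(records, test_years):
    # temporal experiment: every station contributes to both sets, split by time
    train = [r for r in records if r["year"] not in test_years]
    test = [r for r in records if r["year"] in test_years]
    return train, test

def spatial_split(records, test_fraction=0.3, seed=0):
    # spatial experiment: held-out stations never appear in the training set
    stations = sorted({r["station"] for r in records})
    rng = random.Random(seed)
    held_out = set(rng.sample(stations, max(1, int(len(stations) * test_fraction))))
    train = [r for r in records if r["station"] not in held_out]
    test = [r for r in records if r["station"] in held_out]
    return train, test

records = [{"station": s, "year": y} for s in "ABCDE" for y in range(2009, 2019)]
train_t, test_t = temporal_split(records, {2017, 2018})
train_s, test_s = spatial_split(records)
```

The spatial split is the stricter test, since it measures how well the model generalises to locations it has never seen.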


2008 ◽  
Vol 275 (1649) ◽  
pp. 2299-2308 ◽  
Author(s):  
M To ◽  
P.G Lovell ◽  
T Troscianko ◽  
D.J Tolhurst

Natural visual scenes are rich in information, and any neural system analysing them must piece together the many messages from large arrays of diverse feature detectors. It is known how threshold detection of compound visual stimuli (sinusoidal gratings) is determined by their components' thresholds. We investigate whether similar combination rules apply to the perception of the complex and suprathreshold visual elements in naturalistic visual images. Observers gave magnitude estimations (ratings) of the perceived differences between pairs of images made from photographs of natural scenes. Images in some pairs differed along one stimulus dimension such as object colour, location, size or blur. But, for other image pairs, there were composite differences along two dimensions (e.g. both colour and object-location might change). We examined whether the ratings for such composite pairs could be predicted from the two ratings for the respective pairs in which only one stimulus dimension had changed. We found a pooling relationship similar to that proposed for simple stimuli: Minkowski summation with exponent 2.84 yielded the best predictive power (r = 0.96), an exponent similar to that generally reported for compound grating detection. This suggests that theories based on detecting simple stimuli can encompass visual processing of complex, suprathreshold stimuli.
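Minkowski summation with exponent m combines component ratings as (Σ rᵢ^m)^(1/m); a small sketch with invented ratings (the exponent 2.84 is the paper's fitted value):

```python
def minkowski_prediction(component_ratings, m=2.84):
    # predicted rating for a composite difference from its single-cue ratings
    return sum(r ** m for r in component_ratings) ** (1 / m)

# e.g. a colour change rated 4.0 and a location change rated 3.0 in isolation
combined = minkowski_prediction([4.0, 3.0])
# the prediction lies between the larger component (4.0, the m -> infinity
# limit, i.e. winner-take-all) and the linear sum (7.0, the m = 1 limit)
```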


2009 ◽  
Vol 102 (6) ◽  
pp. 3469-3480 ◽  
Author(s):  
H. M. Van Ettinger-Veenstra ◽  
W. Huijbers ◽  
T. P. Gutteling ◽  
M. Vink ◽  
J. L. Kenemans ◽  
...  

It is well known that parts of a visual scene are prioritized for visual processing, depending on the current situation. How the CNS moves this focus of attention across the visual image is largely unknown, although there is substantial evidence that preparation of an action is a key factor. Our results support the view that direct corticocortical feedback connections from frontal oculomotor areas to the visual cortex are responsible for the coupling between eye movements and shifts of visuospatial attention. Functional magnetic resonance imaging (fMRI)-guided transcranial magnetic stimulation (TMS) was applied to the frontal eye fields (FEFs) and intraparietal sulcus (IPS). A single pulse was delivered 60, 30, or 0 ms before a discrimination target was presented at, or next to, the goal of a saccade that was being prepared. Results showed that the enhancement of discrimination performance specific to locations to which eye movements are being prepared was boosted by early TMS over the FEF contralateral to the eye-movement direction, whereas TMS over the IPS resulted in a general performance increase. The current findings indicate that the FEF affects selective visual processing within the visual cortex itself through direct feedback projections.

