Auditory and visual scene analysis: an overview

2017 ◽ Vol 372 (1714) ◽ pp. 20160099
Author(s): Hirohito M. Kondo, Anouk M. van Loon, Jun-Ichiro Kawahara, Brian C. J. Moore

We perceive the world as stable and composed of discrete objects even though auditory and visual inputs are often ambiguous owing to spatial and temporal occluders and changes in the conditions of observation. This raises important questions regarding where and how ‘scene analysis’ is performed in the brain. Recent advances from both auditory and visual research suggest that the brain does not simply process the incoming scene properties. Rather, top-down processes such as attention, expectations and prior knowledge facilitate scene perception. Thus, scene analysis is linked not only with the extraction of stimulus features and formation and selection of perceptual objects, but also with selective attention, perceptual binding and awareness. This special issue covers novel advances in scene-analysis research obtained using a combination of psychophysics, computational modelling, neuroimaging and neurophysiology, and presents new empirical and theoretical approaches. For integrative understanding of scene analysis beyond and across sensory modalities, we provide a collection of 15 articles that enable comparison and integration of recent findings in auditory and visual scene analysis. This article is part of the themed issue ‘Auditory and visual scene analysis’.

2017 ◽ Vol 372 (1714) ◽ pp. 20160105
Author(s): Rosy Southwell, Anna Baumann, Cécile Gal, Nicolas Barascud, Karl Friston, ...

In this series of behavioural and electroencephalography (EEG) experiments, we investigate the extent to which repeating patterns of sounds capture attention. Work in the visual domain has revealed attentional capture by statistically predictable stimuli, consistent with predictive coding accounts which suggest that attention is drawn to sensory regularities. Here, stimuli comprised rapid sequences of tone pips, arranged in regular (REG) or random (RAND) patterns. EEG data demonstrate that the brain rapidly recognizes predictable patterns, manifested as a sharp increase in responses to REG relative to RAND sequences. This increase is reminiscent of the gain increase on neural responses to attended stimuli often seen in the neuroimaging literature, and thus consistent with the hypothesis that predictable sequences draw attention. To study potential attentional capture by auditory regularities, we then used REG and RAND sequences in two different behavioural tasks designed to reveal such capture effects. Overall, the pattern of results suggests that regularity does not capture attention. This article is part of the themed issue ‘Auditory and visual scene analysis’.
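The REG/RAND stimulus design can be sketched in a few lines. This is a minimal, hypothetical generator, not the authors' actual stimulus code: the pool size, cycle length, and function names are illustrative assumptions. The key contrast is that a REG sequence repeats a fixed cycle of tone pips, whereas a RAND sequence draws each pip independently from the same frequency pool.

```python
import random

def make_sequence(n_pips=60, pool_size=20, cycle=10, regular=True, seed=0):
    """Return frequency indices for a rapid tone-pip sequence.

    REG repeats a fixed, randomly chosen cycle of pips; RAND draws each
    pip independently, so both conditions share first-order statistics
    but differ in predictability.
    """
    rng = random.Random(seed)
    pool = list(range(pool_size))
    if regular:
        cycle_pips = rng.sample(pool, cycle)  # one fixed cycle, then repeat
        return [cycle_pips[i % cycle] for i in range(n_pips)]
    return [rng.choice(pool) for _ in range(n_pips)]

reg = make_sequence(regular=True)
rand = make_sequence(regular=False)
# REG is periodic with the cycle length; RAND has no such structure.
print(reg[:10] == reg[10:20])  # True
```

Because REG and RAND draw from the same pool, any differential brain response reflects the regularity itself rather than the pip frequencies used.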


2014 ◽ Vol 369 (1635) ◽ pp. 20120512
Author(s): Rebecca Knight, Caitlin E. Piette, Hector Page, Daniel Walters, Elizabeth Marozzi, ...

How the brain combines information from different sensory modalities and of differing reliability is an important and still-unanswered question. Using the head direction (HD) system as a model, we explored the resolution of conflicts between landmarks and background cues. Sensory cue integration models predict averaging of the two cues, whereas attractor models predict capture of the signal by the dominant cue. We found that a visual landmark mostly captured the HD signal at low conflicts; however, there was an increasing propensity for the cells to integrate the cues thereafter. A large conflict presented to naive rats resulted in greater visual cue capture (less integration) than in experienced rats, revealing an effect of experience. We propose that weighted cue integration in HD cells arises from dynamic plasticity of the feed-forward inputs to the network, causing within-trial spatial redistribution of the visual inputs onto the ring. This suggests that an attractor network can implement decision processes about cue reliability using simple architecture and learning rules, thus providing a potential neural substrate for weighted cue integration.
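The two competing predictions contrasted above can be made concrete with a toy calculation. This is a minimal sketch under assumed weights, not the authors' model: reliability-weighted integration takes a circular average of the two cue directions, whereas attractor-style capture snaps the signal to the dominant cue.

```python
import math

def integrate_cues(theta_a, theta_b, w_a, w_b):
    """Reliability-weighted circular average of two directional cues (radians)."""
    x = w_a * math.cos(theta_a) + w_b * math.cos(theta_b)
    y = w_a * math.sin(theta_a) + w_b * math.sin(theta_b)
    return math.atan2(y, x)

def capture(theta_a, theta_b, w_a, w_b):
    """Winner-take-all: the dominant cue captures the signal outright."""
    return theta_a if w_a >= w_b else theta_b

# A 60-degree conflict between a heavily weighted visual landmark and
# weaker background cues: integration lands between the cues, while
# capture snaps entirely to the landmark.
landmark, background = 0.0, math.radians(60)
print(math.degrees(integrate_cues(landmark, background, 0.8, 0.2)))  # ≈ 10.9
print(math.degrees(capture(landmark, background, 0.8, 0.2)))         # 0.0
```

The paper's finding, that capture dominates at small conflicts and integration grows thereafter, corresponds to behaviour between these two extremes, with effective weights that change with conflict size and experience.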


2017 ◽ Vol 372 (1714) ◽ pp. 20160113
Author(s): Richard Veale, Ziad M. Hafed, Masatoshi Yoshida

Inherent in visual scene analysis is a bottleneck associated with the need to sequentially sample locations with foveating eye movements. The concept of a ‘saliency map’ topographically encoding stimulus conspicuity over the visual scene has proven to be an efficient predictor of eye movements. Our work reviews insights into the neurobiological implementation of visual salience computation. We start by summarizing the role that different visual brain areas play in salience computation, whether at the level of feature analysis for bottom-up salience or at the level of goal-directed priority maps for output behaviour. We then delve into how a subcortical structure, the superior colliculus (SC), participates in salience computation. The SC represents a visual saliency map via a centre-surround inhibition mechanism in the superficial layers, which feeds into priority selection mechanisms in the deeper layers, thereby affecting saccadic and microsaccadic eye movements. Lateral interactions in the local SC circuit are particularly important for controlling active populations of neurons. This, in turn, might help explain long-range effects, such as those of peripheral cues on tiny microsaccades. Finally, we show how a combination of in vitro neurophysiology and large-scale computational modelling is able to clarify how salience computation is implemented in the local circuit of the SC. This article is part of the themed issue ‘Auditory and visual scene analysis’.
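The centre-surround mechanism attributed to the superficial SC layers is commonly modelled as a difference of Gaussians: a narrow excitatory centre minus a broad inhibitory surround. The sketch below is a generic one-dimensional illustration of that computation, with assumed kernel widths, and is not the authors' large-scale SC circuit model.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Normalized 1-D Gaussian kernel of the given width."""
    x = np.arange(size) - size // 2
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def centre_surround_salience(feature_map, sigma_c=1.0, sigma_s=4.0, size=21):
    """Difference-of-Gaussians salience: a narrow excitatory centre minus
    a broad inhibitory surround, a standard stand-in for the lateral
    inhibition described for the superficial SC layers."""
    centre = np.convolve(feature_map, gaussian_kernel(size, sigma_c), mode="same")
    surround = np.convolve(feature_map, gaussian_kernel(size, sigma_s), mode="same")
    return np.clip(centre - surround, 0.0, None)

# An isolated bright spot on a uniform field pops out, while the uniform
# region suppresses itself to near zero.
scene = np.full(100, 0.2)
scene[50] = 1.0
salience = centre_surround_salience(scene)
print(np.argmax(salience))  # 50
```

In the full circuit, the resulting salience peak would feed priority selection in the deeper layers and hence bias where the next saccade or microsaccade lands.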


2010 ◽ Vol 2010 ◽ pp. 1-13
Author(s): Elisa Magosso, Andrea Serino, Giuseppe di Pellegrino, Mauro Ursino

Many studies have revealed that attention operates across different sensory modalities to facilitate the selection of relevant information in the multimodal situations of everyday life. Cross-modal links have been observed both when attention is directed voluntarily (endogenous attention) and when it is captured involuntarily (exogenous attention). The neural basis of cross-modal attention presents a significant challenge to cognitive neuroscience. Here, we used a neural network model to elucidate the neural correlates of visual-tactile interactions in exogenous and endogenous attention. The model includes two unimodal (visual and tactile) areas connected with a bimodal area in each hemisphere, and a competition between the two hemispheres. The model is able to explain cross-modal facilitation in both exogenous and endogenous attention, ascribing it to an advantaged activation of the bimodal area on the attended side (via top-down or bottom-up biasing), with concomitant inhibition towards the opposite side. The model suggests that a competitive/cooperative interaction with biased competition may mediate both forms of cross-modal attention.
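The core biased-competition idea can be illustrated with a two-unit rate model. This is a deliberately reduced sketch, not the authors' full visual-tactile network: each unit stands in for one hemisphere's bimodal area, the units mutually inhibit each other, bottom-up input strength models exogenous capture, and an additive bias models endogenous (top-down) attention. All parameter values are assumptions for illustration.

```python
def biased_competition(input_left, input_right, bias_left=0.0, bias_right=0.0,
                       inhibition=0.6, steps=200, dt=0.05):
    """Two mutually inhibiting rate units, one per hemisphere.

    Each unit relaxes towards its rectified net input (stimulus drive plus
    top-down bias minus inhibition from the other side), so a small bias
    is amplified into a clear winner-take-most outcome.
    """
    L = R = 0.0
    for _ in range(steps):
        L += dt * (-L + max(0.0, input_left + bias_left - inhibition * R))
        R += dt * (-R + max(0.0, input_right + bias_right - inhibition * L))
    return L, R

# Equal bilateral stimulation, but endogenous attention biases the left
# side: the left unit wins and partially suppresses the right.
L, R = biased_competition(1.0, 1.0, bias_left=0.3)
print(L > R)  # True
```

With no bias the two units settle at equal activity; either a stronger stimulus (exogenous) or a top-down bias (endogenous) tips the competition the same way, which is the sense in which one mechanism can mediate both forms of cross-modal attention.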


2017 ◽ Vol 372 (1714) ◽ pp. 20160102
Author(s): Iris I. A. Groen, Edward H. Silson, Chris I. Baker

Visual scene analysis in humans has been characterized by the presence of regions in extrastriate cortex that are selectively responsive to scenes compared with objects or faces. While these regions have often been interpreted as representing high-level properties of scenes (e.g. category), they also exhibit substantial sensitivity to low-level (e.g. spatial frequency) and mid-level (e.g. spatial layout) properties, and it is unclear how these disparate findings can be united in a single framework. In this opinion piece, we suggest that this problem can be resolved by questioning the utility of the classical low- to high-level framework of visual perception for scene processing, and discuss why low- and mid-level properties may be particularly diagnostic for the behavioural goals specific to scene perception as compared to object recognition. In particular, we highlight the contributions of low-level vision to scene representation by reviewing (i) retinotopic biases and receptive field properties of scene-selective regions and (ii) the temporal dynamics of scene perception that demonstrate overlap of low- and mid-level feature representations with those of scene category. We discuss the relevance of these findings for scene perception and suggest a more expansive framework for visual scene analysis. This article is part of the themed issue ‘Auditory and visual scene analysis’.


Author(s): Riitta Salmelin, Jan Kujala, Mia Liljeström

When seeking to uncover the brain correlates of language processing, timing and location are of the essence. Magnetoencephalography (MEG) offers them both, with the highest sensitivity to cortical activity. MEG has shown its worth in revealing cortical dynamics of reading, speech perception, and speech production in adults and children, in unimpaired language processing as well as developmental and acquired language disorders. The MEG signals, once recorded, provide an extensive selection of measures for examination of neural processing. Like all other neuroimaging tools, MEG has its own strengths and limitations of which the user should be aware in order to make the best possible use of this powerful method and to generate meaningful and reliable scientific data. This chapter reviews MEG methodology and how MEG has been used to study the cortical dynamics of language.


2021 ◽ Vol 0 (0)
Author(s): Elisabeth Gibert-Sotelo, Isabel Pujol Payet

The interest in morphology and its interaction with the other grammatical components has increased in the last twenty years, with new approaches coming onto the stage that yield more accurate analyses of the processes involved in morphological construal. This special issue is a valuable contribution to this field of study. It gathers a selection of five papers from the Morphology and Syntax workshop (University of Girona, July 2017) which, on the basis of Romance and Latin phenomena, discuss word structure and its decomposition into hierarchies of features. Even though the papers share a compositional view of lexical items, they adopt different formal theoretical approaches to the lexicon-syntax interface, thus showing the benefit of bearing in mind the possibilities that each framework provides. This introductory paper serves as a guide for the readers of this special collection and offers an overview of the topics dealt with in each contribution.


Perception ◽ 1989 ◽ Vol 18 (6) ◽ pp. 739-751
Author(s): Christian Marendaz

Interindividual differences in field dependence–independence (FDI), which emerge in situations of vision–posture conflict when subjects are required to orient their bodies vertically, were investigated. The first aim was to see whether the same interindividual differences are found in judgements of the orientation of forms in focal vision, in which subjects have to deal with conflicting spatial references processed by different sensory modalities. The second aim was to test the idea that the FDI dimension is due to functional habits linked to balancing. Subjects performed Kopfermann's (1930) shape-orientation task in either a stable (experiment 1) or an unstable (experiment 2) postural condition. Results showed that the FDI dimension comes into play in the solution of the Kopfermann shape-orientation task, and that there is an interactive link between FDI and postural balance, consistent with theoretical expectations. More generally, it appears that the ‘choice’ of a spatial reference system is the product of both individual and situational characteristics, and that the ‘vicariance’ (or interchangeability) of the sensory systems dealing with gravitational upright is at the basis of this interaction.
