What has been missed for predicting human attention in viewing driving clips?

PeerJ ◽  
2017 ◽  
Vol 5 ◽  
pp. e2946 ◽  
Author(s):  
Jiawei Xu ◽  
Shigang Yue ◽  
Federica Menchinelli ◽  
Kun Guo

Recent research progress on the topic of human visual attention allocation in scene perception and its simulation is based mainly on studies with static images. However, natural vision requires us to extract visual information that constantly changes due to egocentric movements or dynamics of the world. It is unclear to what extent spatio-temporal regularity, an inherent regularity in dynamic vision, affects human gaze distribution and saliency computation in visual attention models. In this free-viewing eye-tracking study, we manipulated the spatio-temporal regularity of traffic videos by presenting them in normal video sequence, reversed video sequence, normal frame sequence, and randomised frame sequence. The recorded human gaze allocation was then used as the ‘ground truth’ to examine the predictive ability of a number of state-of-the-art visual attention models. The analysis revealed high inter-observer agreement across individual human observers, but all the tested attention models performed significantly worse than humans. Their inferior predictability was evident in gaze predictions that were indistinguishable across stimulus presentation sequences and in a weak central fixation bias. Our findings suggest that a realistic visual attention model for processing dynamic scenes should incorporate human visual sensitivity to spatio-temporal regularity together with the central fixation bias.
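
As context for how such model-versus-human comparisons are typically scored (the paper's exact evaluation pipeline is not given in the abstract), the following minimal sketch computes the widely used Normalized Scanpath Saliency (NSS) metric and a Gaussian centre-bias baseline. The array shapes, the `sigma_frac` parameter, and the toy fixation data are illustrative assumptions.

```python
import numpy as np

def nss(saliency_map: np.ndarray, fixations: np.ndarray) -> float:
    """NSS: mean of the z-scored saliency map at human fixation points.

    saliency_map : 2-D array (H, W) of model-predicted salience.
    fixations    : (N, 2) integer array of (row, col) fixation coordinates.
    """
    s = (saliency_map - saliency_map.mean()) / (saliency_map.std() + 1e-8)
    rows, cols = fixations[:, 0], fixations[:, 1]
    return float(s[rows, cols].mean())

def centre_bias_map(h: int, w: int, sigma_frac: float = 0.25) -> np.ndarray:
    """Isotropic Gaussian centred on the frame, a common fixation-bias baseline."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    sigma = sigma_frac * min(h, w)
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))

# Toy usage: fixations drawn near the frame centre score well
# under the centre-bias baseline.
rng = np.random.default_rng(0)
h, w = 360, 640
fix = np.clip(rng.normal([h / 2, w / 2], [40, 60], size=(50, 2)),
              0, [h - 1, w - 1]).astype(int)
print("centre-bias NSS:", nss(centre_bias_map(h, w), fix))
```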

Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 3099 ◽
Author(s):  
V. Javier Traver ◽  
Judith Zorío ◽  
Luis A. Leiva

Temporal salience considers how visual attention varies over time. Although visual salience has been widely studied from a spatial perspective, its temporal dimension has been mostly ignored, despite arguably being of utmost importance for understanding the temporal evolution of attention on dynamic contents. To address this gap, we propose Glimpse, a novel measure to compute temporal salience based on the observer-spatio-temporal consistency of raw gaze data. The measure is conceptually simple, training-free, and provides a semantically meaningful quantification of visual attention over time. As an extension, we explore scoring algorithms that estimate temporal salience from spatial salience maps predicted with existing computational models. However, these approaches generally fall short when compared with our proposed gaze-based measure. Glimpse could serve as the basis for several downstream tasks such as segmentation or summarization of videos. Glimpse’s software and data are publicly available.
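
Glimpse’s exact formulation is not reproduced in the abstract; the sketch below illustrates a gaze-consistency score in the same spirit, where a frame’s temporal salience is high when observers’ gaze points cluster tightly. The dispersion-to-score mapping and all shapes are assumptions made for illustration.

```python
import numpy as np

def temporal_salience(gaze: np.ndarray) -> np.ndarray:
    """gaze : (T, K, 2) array of (x, y) gaze points, T frames, K observers.
    Returns a (T,) consistency score in (0, 1]: higher = tighter agreement."""
    centroid = gaze.mean(axis=1, keepdims=True)                        # (T, 1, 2)
    dispersion = np.linalg.norm(gaze - centroid, axis=2).mean(axis=1)  # (T,)
    # Map mean dispersion to a bounded score relative to the clip average.
    return 1.0 / (1.0 + dispersion / dispersion.mean())

# Toy usage: frames 40-59 have tightly clustered gaze and should peak.
rng = np.random.default_rng(1)
T, K = 100, 12
gaze = rng.uniform(0, 100, size=(T, K, 2))
gaze[40:60] = 50 + rng.normal(0, 2, size=(20, K, 2))
score = temporal_salience(gaze)
print("peak frame:", int(score.argmax()))  # expected to land in 40..59
```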


2009 ◽  
Vol 101 (2) ◽  
pp. 917-925 ◽  
Author(s):  
A. T. Smith ◽  
P. L. Cotton ◽  
A. Bruno ◽  
C. Moutsiana

The pulvinar region of the thalamus has repeatedly been linked with the control of attention. However, the functions of the pulvinar remain poorly characterized in both human and nonhuman primates. In a functional MRI study, we examined the relative contributions to activity in the human posterior pulvinar made by visual drive (the presence of an unattended visual stimulus) and attention (covert spatial attention to the stimulus). In an event-related design, large optic-flow stimuli were presented to the left and/or right of a central fixation point. When unattended, the stimuli robustly activated two regions of the pulvinar, one medial and one dorsal with respect to the lateral geniculate nucleus. The activity in both regions showed a strong contralateral bias, suggesting retinotopic organization. Primate physiology suggests that the two regions could be two portions of the same double map of the visual field. In our paradigm, attending to the stimulus enhanced the response by about 20%. Thus attention is not necessary to activate the human pulvinar, and the degree of attentional enhancement matches, but does not exceed, that seen in the cortical regions with which the posterior pulvinar connects.
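
The reported ~20% enhancement corresponds to a simple percentage-modulation computation; the short sketch below illustrates it with hypothetical GLM beta values, not the study’s data.

```python
import numpy as np

# Hypothetical response estimates (e.g. GLM betas) per run for one ROI.
beta_unattended = np.array([1.00, 0.95, 1.10])
beta_attended   = np.array([1.20, 1.14, 1.32])

# Percentage enhancement of attended over unattended responses.
enhancement = 100 * (beta_attended.mean() - beta_unattended.mean()) / beta_unattended.mean()
print(f"attentional enhancement: {enhancement:.1f}%")  # ~20%, as reported
```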


Author(s):  
А. Axyonov ◽  
D. Ryumin ◽  
I. Kagirov

Abstract. This paper presents a new method for collecting multimodal sign language (SL) databases, distinguished by its use of multimodal video data. The paper also proposes a new method of multimodal sign recognition based on the analysis of spatio-temporal visual features of SL units (i.e. lexemes). In general, gesture recognition involves processing a video sequence to extract information about the movements of an articulator (a part of the human body) in time and space. With this approach, the recognition accuracy for isolated signs was 88.92%. By extracting and analysing spatio-temporal data, the proposed method identifies more informative sign features, which increases the accuracy of SL recognition.
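
The authors’ architecture is not detailed in the abstract; as a hedged illustration of extracting spatio-temporal visual features from a sign video, the sketch below uses a small 3-D CNN. The layer sizes, clip resolution, and class count are assumptions, not the paper’s model.

```python
import torch
import torch.nn as nn

class SignClassifier(nn.Module):
    def __init__(self, num_classes: int = 50):
        super().__init__()
        # 3-D convolutions see (time, height, width) jointly, so the learned
        # filters respond to motion as well as to hand shape and position.
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.head = nn.Linear(32, num_classes)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, channels=3, frames, height, width)
        return self.head(self.features(clip).flatten(1))

# Toy usage: one 16-frame RGB clip at 112x112 resolution.
logits = SignClassifier()(torch.randn(1, 3, 16, 112, 112))
print(logits.shape)  # torch.Size([1, 50])
```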


eLife ◽  
2020 ◽  
Vol 9 ◽  
Author(s):  
Christopher A Henry ◽  
Mehrdad Jazayeri ◽  
Robert M Shapley ◽  
Michael J Hawken

Complex scene perception depends upon the interaction between signals from the classical receptive field (CRF) and the extra-classical receptive field (eCRF) in primary visual cortex (V1) neurons. Although much is known about V1 eCRF properties, we do not yet know how the underlying mechanisms map onto the cortical microcircuit. We probed the spatio-temporal dynamics of eCRF modulation using a reverse correlation paradigm, and found three principal eCRF mechanisms: tuned-facilitation, untuned-suppression, and tuned-suppression. Each mechanism had a distinct timing and spatial profile. Laminar analysis showed that the timing, orientation-tuning, and strength of eCRF mechanisms had distinct signatures within magnocellular and parvocellular processing streams in the V1 microcircuit. The existence of multiple eCRF mechanisms provides new insights into how V1 responds to spatial context. Modeling revealed that the differences in timing and scale of these mechanisms predicted distinct patterns of net modulation, reconciling many previous disparate physiological and psychophysical findings.
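
Reverse correlation, the paradigm named in the abstract, can be illustrated with a spike-triggered average; the sketch below is a generic toy version (the authors’ actual stimulus protocol and analysis differ), where the stimulus values preceding each spike are averaged to recover the neuron’s effective temporal kernel.

```python
import numpy as np

def spike_triggered_average(stimulus: np.ndarray, spikes: np.ndarray,
                            lags: int = 20) -> np.ndarray:
    """stimulus : (T,) stimulus value per time bin.
    spikes   : (T,) spike counts per time bin.
    Returns the (lags,) average stimulus preceding a spike."""
    sta = np.zeros(lags)
    total = 0
    for t in np.nonzero(spikes)[0]:
        if t >= lags:
            sta += spikes[t] * stimulus[t - lags:t]  # window ending at t-1
            total += spikes[t]
    return sta / max(total, 1)

# Toy usage: a cell driven by the rectified stimulus, 5 bins earlier.
rng = np.random.default_rng(2)
stim = rng.normal(size=10_000)
rate = np.clip(stim, 0, None)              # rectified drive
spk = rng.poisson(np.roll(rate, 5) * 0.5)  # 5-bin response latency
print(spike_triggered_average(stim, spk)[-5])  # STA peaks at lag -5
```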


2020 ◽  
Vol 34 (07) ◽  
pp. 11278-11286 ◽  
Author(s):  
Soo Ye Kim ◽  
Jihyong Oh ◽  
Munchurl Kim

Super-resolution (SR) has been widely used to convert low-resolution legacy videos to high-resolution (HR) ones, to suit the increasing resolution of displays (e.g. UHD TVs). However, it becomes easier for humans to notice motion artifacts (e.g. motion judder) in HR videos rendered on larger display devices. Broadcasting standards therefore support higher frame rates for UHD (Ultra High Definition) videos (4K@60 fps, 8K@120 fps), meaning that applying SR alone is insufficient to produce genuinely high-quality videos. Hence, to up-convert legacy videos for realistic applications, not only SR but also video frame interpolation (VFI) is required. In this paper, we first propose a joint VFI-SR framework for up-scaling the spatio-temporal resolution of videos from 2K 30 fps to 4K 60 fps. For this, we propose a novel training scheme with a multi-scale temporal loss that imposes temporal regularization on the input video sequence, which can be applied to any general video-related task. The proposed structure is analyzed in depth with extensive experiments.
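
The multi-scale temporal loss is described only at a high level in the abstract; the sketch below shows one plausible reading of such temporal regularization, penalising mismatched frame-to-frame differences between predicted and ground-truth sequences at several temporal strides. The stride set and the L1 penalty are assumptions, not the paper’s exact formulation.

```python
import torch

def multiscale_temporal_loss(pred: torch.Tensor, gt: torch.Tensor,
                             strides=(1, 2, 4)) -> torch.Tensor:
    """pred, gt : (batch, frames, channels, height, width) video tensors."""
    loss = pred.new_zeros(())
    for s in strides:
        # Frame differences at stride s approximate motion at that temporal scale.
        dp = pred[:, s:] - pred[:, :-s]
        dg = gt[:, s:] - gt[:, :-s]
        loss = loss + torch.mean(torch.abs(dp - dg))
    return loss / len(strides)

# Toy usage with an 8-frame clip; gradients flow back to the prediction.
pred = torch.randn(2, 8, 3, 64, 64, requires_grad=True)
gt = torch.randn(2, 8, 3, 64, 64)
multiscale_temporal_loss(pred, gt).backward()
print(pred.grad.shape)  # torch.Size([2, 8, 3, 64, 64])
```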

