OpenEDS2020 Challenge on Gaze Tracking for VR: Dataset and Results

Sensors ◽  
2021 ◽  
Vol 21 (14) ◽  
pp. 4769
Author(s):  
Cristina Palmero ◽  
Abhishek Sharma ◽  
Karsten Behrendt ◽  
Kapil Krishnakumar ◽  
Oleg V. Komogortsev ◽  
...  

This paper summarizes the OpenEDS 2020 Challenge dataset, the proposed baselines, and the results obtained by the top three winners of each competition: (1) the Gaze Prediction Challenge, with the goal of predicting the gaze vector 1 to 5 frames into the future based on a sequence of previous eye images, and (2) the Sparse Temporal Semantic Segmentation Challenge, with the goal of using temporal information to propagate semantic eye labels to contiguous eye image frames. Both competitions were based on the OpenEDS2020 dataset, a novel dataset of eye-image sequences captured at a frame rate of 100 Hz under controlled illumination, using a virtual-reality head-mounted display with two synchronized eye-facing cameras. The dataset, which we make publicly available for the research community, consists of 87 subjects performing several gaze-elicited tasks, and is divided into two subsets, one for each competition task. The proposed baselines, based on deep learning approaches, obtained an average angular error of 5.37 degrees for gaze prediction and a mean intersection over union (mIoU) score of 84.1% for semantic segmentation. The winning solutions outperformed the baselines, achieving an angular error as low as 3.17 degrees for the former task and up to 95.2% mIoU for the latter.
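For reference, a minimal sketch (not the official challenge evaluation code) of the two metrics quoted above: average angular error between 3D gaze vectors, and mean intersection over union for segmentation masks.

```python
import numpy as np

def mean_angular_error_deg(pred, gt):
    """Average angle, in degrees, between predicted and ground-truth gaze vectors.

    pred, gt: arrays of shape (N, 3); vectors need not be unit length.
    """
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    gt = gt / np.linalg.norm(gt, axis=1, keepdims=True)
    cos = np.clip(np.sum(pred * gt, axis=1), -1.0, 1.0)
    return np.degrees(np.arccos(cos)).mean()

def mean_iou(pred_mask, gt_mask, num_classes):
    """mIoU over integer-labelled masks of identical shape.

    Classes absent from both prediction and ground truth are skipped.
    """
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred_mask == c, gt_mask == c).sum()
        union = np.logical_or(pred_mask == c, gt_mask == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))
```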

Healthcare ◽  
2020 ◽  
Vol 8 (2) ◽  
pp. 104 ◽  
Author(s):  
Youngkeun Lee ◽  
Yadav Sunil Kumar ◽  
Daehyeon Lee ◽  
Jihee Kim ◽  
Junggwon Kim ◽  
...  

Saccadic eye movement is an important ability in daily life and is especially important in driving and sports. Traditionally, the Developmental Eye Movement (DEM) test and the King–Devick (K-D) test have been used to measure saccadic eye movement, but these only measure the “adjusted time”. Therefore, a different approach is required to characterize eye-movement speed and reaction rate in detail, since some eye movements are rapid while others are slow. This study proposed an extended method that can acquire the “rest time” and “transfer time”, as well as the “adjusted time”, by implementing a virtual reality-based DEM test using a FOVE virtual reality (VR) head-mounted display (HMD) equipped with an eye-tracking module. This approach was tested in 30 subjects with normal vision and no ophthalmologic disease at a 2-diopter (50-cm) viewing distance. This allowed for measurements of the “adjusted time” and the “rest time” for focusing on each target number character and the “transfer time” for moving to the next target number character, as well as recording of the gaze-tracking log. The results of this experiment showed that the proposed method can analyze more parameters of saccadic eye movement than the traditional methods.
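As an illustration, a hypothetical sketch of how a gaze-tracking log could be segmented into per-target “rest time” and between-target “transfer time”; the log format and field names are assumptions, not the paper's implementation.

```python
from dataclasses import dataclass

@dataclass
class GazeSample:
    t: float          # timestamp in seconds
    target_id: int    # id of the fixated number character, -1 between targets

def segment_times(log):
    """Split a gaze log into per-target rest times and between-target transfer times.

    The final interval is left open in this simplified sketch.
    """
    rest, transfer = {}, []
    current, start = None, None
    for s in log:
        if s.target_id != current:
            if current is not None:
                duration = s.t - start
                if current == -1:
                    transfer.append(duration)          # gaze was in transit
                else:
                    rest[current] = rest.get(current, 0.0) + duration
            current, start = s.target_id, s.t
    return rest, transfer
```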


2021 ◽  
Vol 13 (16) ◽  
pp. 3065
Author(s):  
Libo Wang ◽  
Rui Li ◽  
Dongzhi Wang ◽  
Chenxi Duan ◽  
Teng Wang ◽  
...  

Semantic segmentation of very fine resolution (VFR) urban scene images plays a significant role in several application scenarios, including autonomous driving, land cover classification, and urban planning. However, the tremendous detail contained in VFR images, especially the considerable variation in the scale and appearance of objects, severely limits the potential of existing deep learning approaches. Addressing such issues represents a promising research field in the remote sensing community, which paves the way for scene-level landscape pattern analysis and decision making. In this paper, we propose a Bilateral Awareness Network (BANet), which contains a dependency path and a texture path to fully capture the long-range relationships and fine-grained details in VFR images. Specifically, the dependency path is built on ResT, a novel Transformer backbone with memory-efficient multi-head self-attention, while the texture path is built on stacked convolution operations. In addition, a feature aggregation module using a linear attention mechanism is designed to effectively fuse the dependency features and texture features. Extensive experiments conducted on three large-scale urban scene image segmentation datasets, i.e., the ISPRS Vaihingen dataset, the ISPRS Potsdam dataset, and the UAVid dataset, demonstrate the effectiveness of our BANet. Specifically, a 64.6% mIoU is achieved on the UAVid dataset.
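A much-simplified PyTorch sketch of the bilateral two-path idea (not the authors' BANet): a convolutional texture path, a plain self-attention block standing in for the ResT-based dependency path, and a 1 × 1-convolution fusion standing in for the linear-attention aggregation module.

```python
import torch
import torch.nn as nn

class BilateralSketch(nn.Module):
    def __init__(self, in_ch=3, width=64, num_classes=6):
        super().__init__()
        # Texture path: stacked convolutions preserving fine-grained detail.
        self.texture = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Dependency path: patchify, then self-attention for long-range context.
        self.embed = nn.Conv2d(in_ch, width, 4, stride=4)
        self.attn = nn.MultiheadAttention(width, num_heads=4, batch_first=True)
        self.fuse = nn.Conv2d(2 * width, num_classes, 1)  # aggregation stand-in

    def forward(self, x):
        tex = self.texture(x)                           # (B, C, H, W)
        tok = self.embed(x)                             # (B, C, H/4, W/4)
        b, c, h, w = tok.shape
        seq = tok.flatten(2).transpose(1, 2)            # (B, HW/16, C)
        dep, _ = self.attn(seq, seq, seq)
        dep = dep.transpose(1, 2).reshape(b, c, h, w)
        dep = nn.functional.interpolate(
            dep, size=tex.shape[2:], mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([tex, dep], dim=1))  # per-pixel class logits
```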


2021 ◽  
Vol 198 ◽  
pp. 110683
Author(s):  
David B. Menasche ◽  
Paul A. Shade ◽  
S. Safriet ◽  
Peter Kenesei ◽  
Jun-Sang Park ◽  
...  

2021 ◽  
Vol 309 ◽  
pp. 01117
Author(s):  
A. Sai Hanuman ◽  
G. Prasanna Kumar

Lane detection methods, their integration, and evaluation strategies are all examined. System integration approaches for building more robust detection systems are then evaluated and analyzed, taking into account the inherent limitations of camera-based lane detection systems. Because cameras capture rich contextual information at low equipment cost, a substantial number of existing results concentrate on vision-based lane recognition systems. Current deep learning approaches to lane detection are inherently CNN-based semantic segmentation networks, in which the segmentation results for the roadway and for the lane markers are combined by a fusion method. By exploiting a large number of frames from a continuous driving environment, we examine lane detection and propose a hybrid deep architecture that combines a convolutional neural network (CNN) with a recurrent neural network (RNN). In particular, a CNN block extracts features from each frame, and the CNN outputs of several consecutive frames, which carry time-series properties, are fed to the RNN block for feature learning and lane prediction. Extensive tests on two large-scale datasets show that the proposed technique outperforms competing lane detection strategies, particularly in challenging settings.
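A rough PyTorch sketch of the CNN-then-RNN pattern described above (not the paper's architecture); the encoder depth, LSTM width, and lane-presence output head are assumptions.

```python
import torch
import torch.nn as nn

class CnnRnnLaneSketch(nn.Module):
    def __init__(self, feat_dim=128, hidden=256, num_lanes=4):
        super().__init__()
        # Per-frame CNN encoder producing a compact feature vector.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # RNN over the sequence of per-frame features.
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_lanes)   # lane-presence logits

    def forward(self, frames):                     # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        out, _ = self.rnn(feats)
        return self.head(out[:, -1])               # predict from the last step
```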


Author(s):  
F. Politz ◽  
M. Sester

Over the past years, algorithms for dense image matching (DIM) to obtain point clouds from aerial images have improved significantly. Consequently, DIM point clouds are now a good alternative to the established Airborne Laser Scanning (ALS) point clouds for remote sensing applications. In order to derive high-level products such as digital terrain models or city models, each point within a point cloud must be assigned a class label. Usually, ALS and DIM point clouds are labelled with different classifiers due to their varying characteristics. In this work, we explore both point cloud types in a fully convolutional encoder-decoder network, which learns to classify ALS as well as DIM point clouds. As input, we project the point clouds onto a 2D image raster plane and calculate the minimal, average, and maximal height values for each raster cell. The network then differentiates between the classes ground, non-ground, building, and no data. We test our network in six training setups using only one point cloud type, both point clouds, as well as several transfer-learning approaches. We quantitatively and qualitatively compare all results and discuss the advantages and disadvantages of all setups. The best network achieves an overall accuracy of 96% on an ALS and 83% on a DIM test set.
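A NumPy sketch of the projection step described above: each point is binned into a 2D raster cell, and the minimal, average, and maximal heights are kept per cell; the cell size and no-data fill value are assumptions.

```python
import numpy as np

def rasterize_heights(points, cell=1.0, nodata=np.nan):
    """points: (N, 3) array of x, y, z. Returns an (H, W, 3) min/avg/max raster."""
    # Shift to the origin and bin x, y coordinates into integer cell indices.
    xy = np.floor((points[:, :2] - points[:, :2].min(axis=0)) / cell).astype(int)
    h, w = xy[:, 1].max() + 1, xy[:, 0].max() + 1
    zmin = np.full((h, w), np.inf)
    zmax = np.full((h, w), -np.inf)
    zsum = np.zeros((h, w))
    count = np.zeros((h, w))
    for (cx, cy), z in zip(xy, points[:, 2]):
        zmin[cy, cx] = min(zmin[cy, cx], z)
        zmax[cy, cx] = max(zmax[cy, cx], z)
        zsum[cy, cx] += z
        count[cy, cx] += 1
    empty = count == 0                     # cells without points become no data
    zavg = np.divide(zsum, count, out=np.zeros_like(zsum), where=~empty)
    for arr in (zmin, zavg, zmax):
        arr[empty] = nodata
    return np.stack([zmin, zavg, zmax], axis=-1)
```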


Sensors ◽  
2020 ◽  
Vol 20 (6) ◽  
pp. 1737 ◽  
Author(s):  
Tae-young Ko ◽  
Seung-ho Lee

This paper proposes a novel method of semantic segmentation, consisting of a modified dilated residual network, an atrous pyramid pooling module, and backpropagation, that is applicable to augmented reality (AR). In the proposed method, the modified dilated residual network extracts a feature map from the original images while maintaining spatial information. The atrous pyramid pooling module places convolutions in parallel and layers feature maps in a pyramid shape to extract objects occupying small areas in the image; these are converted into one channel using a 1 × 1 convolution. Backpropagation compares the semantic segmentation obtained through convolution from the final feature map with the ground truth provided by a database. Losses can be reduced by applying backpropagation to the modified dilated residual network to update the weights. The proposed method was compared with other methods on the Cityscapes and PASCAL VOC 2012 databases. The proposed method achieved accuracies of 82.8% and 89.8% mean intersection over union (mIoU) and frame rates of 61 and 64.3 frames per second (fps) on the Cityscapes and PASCAL VOC 2012 databases, respectively. These results demonstrate the applicability of the proposed method to natural AR applications at interactive speeds, because the frame rate is greater than 60 fps.
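A simplified PyTorch sketch of an atrous pyramid pooling block of the kind described above (not the authors' module); the dilation rates are assumptions borrowed from common atrous spatial pyramid pooling configurations.

```python
import torch
import torch.nn as nn

class AtrousPyramidSketch(nn.Module):
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        # Parallel 3x3 convolutions, each with a different dilation rate,
        # so each branch sees a different receptive field.
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates]
        )
        # 1x1 convolution reducing the concatenated branches to one output.
        self.project = nn.Conv2d(len(rates) * out_ch, out_ch, 1)

    def forward(self, x):
        # Spatial size is preserved because padding matches dilation.
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))
```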

