OpenEDS2020 Challenge on Gaze Tracking for VR: Dataset and Results

Sensors ◽  
2021 ◽  
Vol 21 (14) ◽  
pp. 4769
Author(s):  
Cristina Palmero ◽  
Abhishek Sharma ◽  
Karsten Behrendt ◽  
Kapil Krishnakumar ◽  
Oleg V. Komogortsev ◽  
...  

This paper summarizes the OpenEDS 2020 Challenge dataset, the proposed baselines, and the results obtained by the top three winners of each competition: (1) the Gaze Prediction Challenge, with the goal of predicting the gaze vector 1 to 5 frames into the future based on a sequence of previous eye images, and (2) the Sparse Temporal Semantic Segmentation Challenge, with the goal of using temporal information to propagate semantic eye labels to contiguous eye image frames. Both competitions were based on the OpenEDS2020 dataset, a novel dataset of eye-image sequences captured at a frame rate of 100 Hz under controlled illumination, using a virtual-reality head-mounted display with two synchronized eye-facing cameras. The dataset, which we make publicly available for the research community, consists of 87 subjects performing several gaze-elicited tasks, and is divided into two subsets, one for each competition task. The proposed baselines, based on deep learning approaches, obtained an average angular error of 5.37 degrees for gaze prediction and a mean intersection over union (mIoU) score of 84.1% for semantic segmentation. The winning solutions outperformed the baselines, achieving an angular error as low as 3.17 degrees for the former task and up to 95.2% mIoU for the latter.
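For reference, a minimal sketch (not the official challenge evaluation code) of the two metrics quoted above: average angular error between 3D gaze vectors, and mean intersection over union for segmentation masks.

```python
import numpy as np

def mean_angular_error_deg(pred, gt):
    """Average angle, in degrees, between predicted and ground-truth gaze vectors.

    pred, gt: arrays of shape (N, 3); vectors need not be unit length.
    """
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    gt = gt / np.linalg.norm(gt, axis=1, keepdims=True)
    cos = np.clip(np.sum(pred * gt, axis=1), -1.0, 1.0)
    return np.degrees(np.arccos(cos)).mean()

def mean_iou(pred_mask, gt_mask, num_classes):
    """mIoU over integer-labelled masks of identical shape.

    Classes absent from both prediction and ground truth are skipped.
    """
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred_mask == c, gt_mask == c).sum()
        union = np.logical_or(pred_mask == c, gt_mask == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))
```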

Healthcare ◽  
2020 ◽  
Vol 8 (2) ◽  
pp. 104 ◽  
Author(s):  
Youngkeun Lee ◽  
Yadav Sunil Kumar ◽  
Daehyeon Lee ◽  
Jihee Kim ◽  
Junggwon Kim ◽  
...  

Saccadic eye movement is an important ability in daily life and is especially important in driving and sports. Traditionally, the Developmental Eye Movement (DEM) test and the King–Devick (K-D) test have been used to measure saccadic eye movement, but these only measure the “adjusted time”. Therefore, a different approach is required to characterize eye-movement speed and reaction rate in detail, since some eye movements are rapid while others are slow. This study proposed an extended method that can acquire the “rest time” and “transfer time”, as well as the “adjusted time”, by implementing a virtual reality-based DEM test using a FOVE virtual reality (VR) head-mounted display (HMD) equipped with an eye-tracking module. This approach was tested in 30 subjects with normal vision and no ophthalmologic disease at a 2-diopter (50-cm) viewing distance. This allowed for measurements of the “adjusted time” and the “rest time” for focusing on each target number character and the “transfer time” for moving to the next target number character, as well as recording of the gaze-tracking log. The results of this experiment showed that the proposed method can analyze more parameters of saccadic eye movement than the traditional methods.
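As an illustration, a hypothetical sketch of how a gaze-tracking log could be segmented into per-target “rest time” and between-target “transfer time”; the log format and field names are assumptions, not the paper's implementation.

```python
from dataclasses import dataclass

@dataclass
class GazeSample:
    t: float          # timestamp in seconds
    target_id: int    # id of the fixated number character, -1 between targets

def segment_times(log):
    """Split a gaze log into per-target rest times and between-target transfer times.

    The final interval is left open in this simplified sketch.
    """
    rest, transfer = {}, []
    current, start = None, None
    for s in log:
        if s.target_id != current:
            if current is not None:
                duration = s.t - start
                if current == -1:
                    transfer.append(duration)          # gaze was in transit
                else:
                    rest[current] = rest.get(current, 0.0) + duration
            current, start = s.target_id, s.t
    return rest, transfer
```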


2021 ◽  
Vol 13 (16) ◽  
pp. 3065
Author(s):  
Libo Wang ◽  
Rui Li ◽  
Dongzhi Wang ◽  
Chenxi Duan ◽  
Teng Wang ◽  
...  

Semantic segmentation of very fine resolution (VFR) urban scene images plays a significant role in several application scenarios, including autonomous driving, land cover classification, and urban planning. However, the tremendous detail contained in VFR images, especially the considerable variation in the scale and appearance of objects, severely limits the potential of existing deep learning approaches. Addressing such issues represents a promising research field in the remote sensing community, which paves the way for scene-level landscape pattern analysis and decision making. In this paper, we propose a Bilateral Awareness Network (BANet), which contains a dependency path and a texture path to fully capture the long-range relationships and fine-grained details in VFR images. Specifically, the dependency path is built on ResT, a novel Transformer backbone with memory-efficient multi-head self-attention, while the texture path is built on stacked convolution operations. In addition, a feature aggregation module using a linear attention mechanism is designed to effectively fuse the dependency features and texture features. Extensive experiments conducted on three large-scale urban scene image segmentation datasets, i.e., the ISPRS Vaihingen dataset, the ISPRS Potsdam dataset, and the UAVid dataset, demonstrate the effectiveness of our BANet. Specifically, a 64.6% mIoU is achieved on the UAVid dataset.
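A much-simplified PyTorch sketch of the bilateral two-path idea (not the authors' BANet): a convolutional texture path, a plain self-attention block standing in for the ResT-based dependency path, and a 1 × 1-convolution fusion standing in for the linear-attention aggregation module.

```python
import torch
import torch.nn as nn

class BilateralSketch(nn.Module):
    def __init__(self, in_ch=3, width=64, num_classes=6):
        super().__init__()
        # Texture path: stacked convolutions preserving fine-grained detail.
        self.texture = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Dependency path: patchify, then self-attention for long-range context.
        self.embed = nn.Conv2d(in_ch, width, 4, stride=4)
        self.attn = nn.MultiheadAttention(width, num_heads=4, batch_first=True)
        self.fuse = nn.Conv2d(2 * width, num_classes, 1)  # aggregation stand-in

    def forward(self, x):
        tex = self.texture(x)                           # (B, C, H, W)
        tok = self.embed(x)                             # (B, C, H/4, W/4)
        b, c, h, w = tok.shape
        seq = tok.flatten(2).transpose(1, 2)            # (B, HW/16, C)
        dep, _ = self.attn(seq, seq, seq)
        dep = dep.transpose(1, 2).reshape(b, c, h, w)
        dep = nn.functional.interpolate(
            dep, size=tex.shape[2:], mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([tex, dep], dim=1))  # per-pixel class logits
```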


2021 ◽  
Vol 198 ◽  
pp. 110683
Author(s):  
David B. Menasche ◽  
Paul A. Shade ◽  
S. Safriet ◽  
Peter Kenesei ◽  
Jun-Sang Park ◽  
...  

2021 ◽  
Vol 309 ◽  
pp. 01117
Author(s):  
A. Sai Hanuman ◽  
G. Prasanna Kumar

Lane detection methods, their integration, and evaluation strategies are all examined. System integration approaches for building more robust detection systems are then evaluated and analyzed, taking into account the inherent limitations of camera-based lane detection systems. Because cameras capture rich contextual information at low equipment cost, a substantial number of existing results concentrate on vision-based lane recognition systems. Current deep learning approaches to lane detection are inherently CNN-based semantic segmentation networks, in which the segmentation results for the roadway and for the lane markers are combined by a fusion method. By exploiting a large number of frames from a continuous driving environment, we examine lane detection and propose a hybrid deep architecture that combines a convolutional neural network (CNN) with a recurrent neural network (RNN). In particular, a CNN block extracts features from each frame, and the CNN outputs of several consecutive frames, which carry time-series properties, are fed to the RNN block for feature learning and lane prediction. Extensive tests on two large-scale datasets show that the proposed technique outperforms competing lane detection strategies, particularly in challenging settings.
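A rough PyTorch sketch of the CNN-then-RNN pattern described above (not the paper's architecture); the encoder depth, LSTM width, and lane-presence output head are assumptions.

```python
import torch
import torch.nn as nn

class CnnRnnLaneSketch(nn.Module):
    def __init__(self, feat_dim=128, hidden=256, num_lanes=4):
        super().__init__()
        # Per-frame CNN encoder producing a compact feature vector.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # RNN over the sequence of per-frame features.
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_lanes)   # lane-presence logits

    def forward(self, frames):                     # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        out, _ = self.rnn(feats)
        return self.head(out[:, -1])               # predict from the last step
```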


Author(s):  
F. Politz ◽  
M. Sester

Over the past years, algorithms for dense image matching (DIM) to obtain point clouds from aerial images have improved significantly. Consequently, DIM point clouds are now a good alternative to the established Airborne Laser Scanning (ALS) point clouds for remote sensing applications. In order to derive high-level products such as digital terrain models or city models, each point within a point cloud must be assigned a class label. Usually, ALS and DIM point clouds are labelled with different classifiers due to their varying characteristics. In this work, we explore both point cloud types in a fully convolutional encoder-decoder network, which learns to classify ALS as well as DIM point clouds. As input, we project the point clouds onto a 2D image raster plane and calculate the minimal, average, and maximal height values for each raster cell. The network then differentiates between the classes ground, non-ground, building, and no data. We test our network in six training setups using only one point cloud type, both point clouds, as well as several transfer-learning approaches. We quantitatively and qualitatively compare all results and discuss the advantages and disadvantages of all setups. The best network achieves an overall accuracy of 96% on an ALS and 83% on a DIM test set.
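A NumPy sketch of the projection step described above: each point is binned into a 2D raster cell, and the minimal, average, and maximal heights are kept per cell; the cell size and no-data fill value are assumptions.

```python
import numpy as np

def rasterize_heights(points, cell=1.0, nodata=np.nan):
    """points: (N, 3) array of x, y, z. Returns an (H, W, 3) min/avg/max raster."""
    # Shift to the origin and bin x, y coordinates into integer cell indices.
    xy = np.floor((points[:, :2] - points[:, :2].min(axis=0)) / cell).astype(int)
    h, w = xy[:, 1].max() + 1, xy[:, 0].max() + 1
    zmin = np.full((h, w), np.inf)
    zmax = np.full((h, w), -np.inf)
    zsum = np.zeros((h, w))
    count = np.zeros((h, w))
    for (cx, cy), z in zip(xy, points[:, 2]):
        zmin[cy, cx] = min(zmin[cy, cx], z)
        zmax[cy, cx] = max(zmax[cy, cx], z)
        zsum[cy, cx] += z
        count[cy, cx] += 1
    empty = count == 0                     # cells without points become no data
    zavg = np.divide(zsum, count, out=np.zeros_like(zsum), where=~empty)
    for arr in (zmin, zavg, zmax):
        arr[empty] = nodata
    return np.stack([zmin, zavg, zmax], axis=-1)
```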


Sensors ◽  
2020 ◽  
Vol 20 (6) ◽  
pp. 1737 ◽  
Author(s):  
Tae-young Ko ◽  
Seung-ho Lee

This paper proposes a novel method of semantic segmentation, consisting of a modified dilated residual network, an atrous pyramid pooling module, and backpropagation, that is applicable to augmented reality (AR). In the proposed method, the modified dilated residual network extracts a feature map from the original images while maintaining spatial information. The atrous pyramid pooling module places convolutions in parallel and layers feature maps in a pyramid shape to extract objects occupying small areas in the image; these are converted into one channel using a 1 × 1 convolution. Backpropagation compares the semantic segmentation obtained through convolution from the final feature map with the ground truth provided by a database. Losses can be reduced by applying backpropagation to the modified dilated residual network to update the weights. The proposed method was compared with other methods on the Cityscapes and PASCAL VOC 2012 databases. The proposed method achieved accuracies of 82.8% and 89.8% mean intersection over union (mIoU) and frame rates of 61 and 64.3 frames per second (fps) on the Cityscapes and PASCAL VOC 2012 databases, respectively. These results demonstrate the applicability of the proposed method to natural AR applications at interactive speeds, because the frame rate is greater than 60 fps.
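A simplified PyTorch sketch of an atrous pyramid pooling block of the kind described above (not the authors' module); the dilation rates are assumptions borrowed from common atrous spatial pyramid pooling configurations.

```python
import torch
import torch.nn as nn

class AtrousPyramidSketch(nn.Module):
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        # Parallel 3x3 convolutions, each with a different dilation rate,
        # so each branch sees a different receptive field.
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates]
        )
        # 1x1 convolution reducing the concatenated branches to one output.
        self.project = nn.Conv2d(len(rates) * out_ch, out_ch, 1)

    def forward(self, x):
        # Spatial size is preserved because padding matches dilation.
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))
```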

