High-Resolution Neural Network for Driver Visual Attention Prediction

Sensors ◽  
2020 ◽  
Vol 20 (7) ◽  
pp. 2030 ◽  
Author(s):  
Byeongkeun Kang ◽  
Yeejin Lee

Driving is a task that places heavy demands on visual information, so the human visual system plays a critical role in making proper decisions for safe driving. Understanding a driver’s visual attention and relevant behavior is a challenging but essential task in advanced driver-assistance systems (ADAS) and efficient autonomous vehicles (AV). Specifically, robust prediction of a driver’s attention from images could be crucial for assisting intelligent vehicle systems, in which a self-driving car must move safely while interacting with the surrounding environment. Thus, in this paper, we investigate a human driver’s visual behavior from a computer vision perspective to estimate the driver’s attention locations in images. First, we show that feature representations at high resolution improve visual attention prediction accuracy and localization performance when fused with low-resolution features. To demonstrate this, we employ a deep convolutional neural network framework that learns and extracts feature representations at multiple resolutions. In particular, the network maintains its highest-resolution feature representation at the original image resolution. Second, attention prediction tends to be biased toward image centers when neural networks are trained on typical visual attention datasets. To avoid overfitting to this center-biased solution, the network is trained using diverse regions of images. Finally, the experimental results verify that our proposed framework improves the prediction accuracy of a driver’s attention locations.
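The fusion of high- and low-resolution feature representations described above can be pictured with a minimal NumPy sketch. This is not the authors' implementation; the function names, channel counts, and nearest-neighbour upsampling are illustrative assumptions:

```python
import numpy as np

def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_multiresolution(high, low):
    """Fuse full-resolution features with upsampled low-resolution features
    by channel-wise concatenation, the basic operation behind keeping a
    highest-resolution branch alive throughout the network."""
    factor = high.shape[1] // low.shape[1]
    return np.concatenate([high, upsample_nearest(low, factor)], axis=0)

high = np.random.rand(8, 32, 32)   # fine branch: kept at the input resolution
low = np.random.rand(16, 8, 8)     # coarse branch: semantically rich
fused = fuse_multiresolution(high, low)
print(fused.shape)                  # (24, 32, 32)
```

A real network would follow the concatenation with further convolutions; the sketch only shows why the fused map keeps the full spatial resolution of the fine branch.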

Author(s):  
Guanbin Li ◽  
Xin Zhu ◽  
Yirui Zeng ◽  
Qing Wang ◽  
Liang Lin

Facial action unit (AU) recognition is a crucial task for facial expression analysis and has attracted extensive attention in the fields of artificial intelligence and computer vision. Existing works have either focused on designing or learning complex regional feature representations, or delved into various types of AU relationship modeling. Albeit with varying degrees of progress, it is still arduous for existing methods to handle complex situations. In this paper, we investigate how to integrate semantic relationship propagation between AUs into a deep neural network framework to enhance the feature representation of facial regions, and propose an AU semantic relationship embedded representation learning (SRERL) framework. Specifically, by analyzing the symbiosis and mutual exclusion of AUs in various facial expressions, we organize the facial AUs in the form of a structured knowledge graph and integrate a Gated Graph Neural Network (GGNN) into a multi-scale CNN framework to propagate node information through the graph, generating enhanced AU representations. As the learned features capture both appearance characteristics and AU relationship reasoning, the proposed model is more robust and can cope with more challenging cases, e.g., illumination change and partial occlusion. Extensive experiments on two public benchmarks demonstrate that our method outperforms previous work and achieves state-of-the-art performance.
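The graph propagation at the heart of a GGNN can be sketched as one gated update step in NumPy. This is a schematic stand-in, not the SRERL code: the AU count, state dimension, adjacency, and weight shapes are all assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ggnn_step(h, A, W):
    """One gated propagation step: messages flow along the AU knowledge-graph
    edges (adjacency A), then each node state is refreshed with a GRU-style
    gated update, so related AUs reinforce each other's representations."""
    a = A @ h                                         # aggregate neighbour states
    z = sigmoid(a @ W["z"] + h @ W["uz"])             # update gate
    r = sigmoid(a @ W["r"] + h @ W["ur"])             # reset gate
    h_tilde = np.tanh(a @ W["h"] + (r * h) @ W["uh"]) # candidate state
    return (1 - z) * h + z * h_tilde

n_aus, d = 12, 16                                     # e.g. 12 AUs, 16-dim states
rng = np.random.default_rng(0)
h = rng.standard_normal((n_aus, d))
A = (rng.random((n_aus, n_aus)) < 0.3).astype(float)  # symbiosis/exclusion graph
W = {k: 0.1 * rng.standard_normal((d, d)) for k in ["z", "uz", "r", "ur", "h", "uh"]}
h_new = ggnn_step(h, A, W)
print(h_new.shape)   # (12, 16)
```

Repeating the step several times lets information from distant but related AUs accumulate in each node's representation before it is fused back into the CNN features.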


2019 ◽  
Vol 19 (4) ◽  
pp. 1003-1016 ◽  
Author(s):  
Yasamin Keshmiri Esfandabadi ◽  
Maxime Bilodeau ◽  
Patrice Masson ◽  
Luca De Marchi

Ultrasonic wavefield imaging with a non-contact technology can provide detailed information about the health status of an inspected structure. However, the high spatial resolution often necessary for accurate damage quantification typically demands a long scanning time. In this work, we investigate a novel methodology to acquire high-resolution wavefields with a reduced number of measurement points to minimize the acquisition time. The methodology is based on the combination of compressive sensing and convolutional neural networks to recover high spatial frequency information from low-resolution images. A data set was built from 652 wavefield images acquired with a laser Doppler vibrometer describing guided ultrasonic wave propagation in eight different structures, with and without various simulated defects. Of those 652 images, 326 cases without defects and 326 cases with defects were used as a training database for the convolutional neural network. In addition, 273 wavefield images were used as a testing database to validate the proposed methodology. For quantitative evaluation, two image quality metrics were calculated and compared to those achieved with different recovery methods or by training the convolutional neural network with a non-wavefield image data set. The results demonstrate the capability of the technique to enhance image resolution and quality, as well as similarity to the wavefield acquired on the full high-resolution grid of scan points, while reducing the number of measurement points to 10% of a full grid.
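The compressive acquisition step can be pictured with a toy NumPy sketch that retains 10% of the scan grid, the paper's lowest sampling rate; the CNN recovery stage is not reproduced here, and the grid size and random mask are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

def compressive_sample(field, keep_ratio=0.10):
    """Randomly keep a fraction of scan points, as in compressive acquisition;
    a recovery stage would then reconstruct the full high-resolution grid."""
    mask = rng.random(field.shape) < keep_ratio
    return np.where(mask, field, 0.0), mask

field = rng.standard_normal((64, 64))   # stand-in for one wavefield frame
measured, mask = compressive_sample(field)
print(measured.shape)                    # (64, 64), mostly zeros
```

In the paper the recovery is learned from wavefield images, which is what gives it an edge over generic interpolation or networks trained on non-wavefield data.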


2016 ◽  
Vol 2016 ◽  
pp. 1-12 ◽  
Author(s):  
Na Li ◽  
Xinbo Zhao ◽  
Yongjia Yang ◽  
Xiaochun Zou

Humans can easily classify different kinds of objects, whereas this remains quite difficult for computers. Object classification is thus a challenging problem that has received extensive interest. Deep learning, a concept inspired by neuroscience, offers one solution, and the convolutional neural network (CNN) is one deep learning method that can be applied to classification. However, most deep learning methods, including CNNs, ignore the human visual information-processing mechanism at work when a person classifies objects. Therefore, in this paper, inspired by the complete process through which humans classify different kinds of objects, we put forward a new classification method that combines a visual attention model and a CNN. Firstly, we use the visual attention model to simulate the human visual selection mechanism. Secondly, we use a CNN to simulate how humans select features, extracting the local features of the selected areas. Finally, our classification method depends not only on those local features but also on human semantic features to classify objects. The method has clear advantages in terms of biological plausibility. Experimental results demonstrate that it significantly improves classification performance.
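The attend-then-classify pipeline can be sketched in NumPy: a crude saliency map selects the most informative window, and only that crop would be passed to the CNN. The saliency proxy, window size, and image are assumptions for illustration, not the paper's attention model:

```python
import numpy as np

def saliency_map(img):
    """A crude saliency proxy: absolute deviation from the image mean."""
    return np.abs(img - img.mean())

def attend_and_crop(img, size=8):
    """Select the most salient size x size window, mimicking the role of the
    visual attention stage that precedes the CNN in this pipeline."""
    sal = saliency_map(img)
    best, best_pos = -1.0, (0, 0)
    for i in range(img.shape[0] - size + 1):
        for j in range(img.shape[1] - size + 1):
            s = sal[i:i + size, j:j + size].sum()
            if s > best:
                best, best_pos = s, (i, j)
    i, j = best_pos
    return img[i:i + size, j:j + size]

img = np.zeros((32, 32))
img[10:18, 20:28] = 1.0            # a bright, salient 8x8 patch
crop = attend_and_crop(img)
print(crop.shape, crop.sum())      # (8, 8) 64.0
```

The CNN then only has to extract local features from the attended crop rather than the whole image, which is the biological motivation the abstract appeals to.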


2021 ◽  
Vol 13 (10) ◽  
pp. 1979
Author(s):  
Yijun Chen ◽  
Sensen Wu ◽  
Yuanyuan Wang ◽  
Feng Zhang ◽  
Renyi Liu ◽  
...  

Satellite-retrieved aerosol optical depth (AOD) data are extensively integrated with ground-level measurements to achieve spatially continuous estimates of fine particulate matter (PM2.5). However, current satellite-based methods face challenges in obtaining highly accurate and reasonable PM2.5 distributions because they cannot handle both spatial non-stationarity and complex non-linearity in the PM2.5–AOD relationship. High-resolution (<1 km) PM2.5 products over the whole of China for fine-grained exposure assessment and health research are also lacking. This study aimed to predict 750 m resolution ground-level PM2.5 in China from the high-resolution Visible Infrared Imaging Radiometer Suite (VIIRS) intermediate product (IP) AOD data using a newly developed geographically neural network weighted regression (GNNWR) model. The performance evaluations demonstrated that GNNWR achieved higher prediction accuracy than widely used methods, with cross-validation and predictive R² values of 0.86 and 0.85. Satellite-derived monthly 750 m resolution PM2.5 data in China were generated with robust prediction accuracy and almost complete coverage. PM2.5 pollution in China was found to have improved greatly by 2018, with an annual mean concentration of 31.07 ± 17.52 µg/m³. Nonetheless, fine-scale PM2.5 exposures at multiple administrative levels suggested that PM2.5 pollution in most urban areas needs further control, especially in southern Hebei Province. This work is the first to evaluate the potential of VIIRS IP AOD for modeling high-resolution PM2.5 at a large scale. The newly satellite-derived PM2.5 data, with high spatial resolution and high prediction accuracy at the national scale, are valuable for advancing environmental and health research in China.
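GNNWR builds on classic geographically weighted regression, whose mechanics a short NumPy sketch makes concrete: observations are weighted by geographic distance before a local least-squares fit, and GNNWR's contribution is to learn those spatial weights with a neural network instead of a fixed kernel. The toy coordinates, bandwidth, and linear PM2.5–AOD relation are assumptions:

```python
import numpy as np

def gwr_predict(x0, s0, X, S, y, bandwidth=1.0):
    """Locally weighted least squares at location s0: a Gaussian kernel of
    geographic distance down-weights far-away stations. GNNWR replaces this
    fixed kernel with weights produced by a neural network."""
    d = np.linalg.norm(S - s0, axis=1)
    w = np.exp(-(d / bandwidth) ** 2)
    Xw = X * w[:, None]                                  # apply spatial weights
    beta, *_ = np.linalg.lstsq(Xw.T @ X, Xw.T @ y, rcond=None)
    return float(x0 @ beta)

rng = np.random.default_rng(1)
S = rng.random((200, 2))                  # station coordinates
aod = rng.random(200)
X = np.column_stack([np.ones(200), aod])  # intercept + AOD predictor
y = 5.0 + 30.0 * aod                      # toy linear PM2.5-AOD relationship
pred = gwr_predict(np.array([1.0, 0.4]), np.array([0.5, 0.5]), X, S, y)
print(round(pred, 2))   # 17.0, i.e. 5 + 30 * 0.4
```

Because the toy relationship is globally linear, the local fit recovers it exactly; the value of learned, location-dependent weights appears when the relationship varies across space.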


2020 ◽  
Author(s):  
Aidan C. Daly ◽  
Krzysztof J. Geras ◽  
Richard A. Bonneau

Registration of histology images from multiple sources is a pressing problem in large-scale studies of spatial -omics data. Researchers often perform “common coordinate registration,” akin to segmentation, in which samples are partitioned based on tissue type to allow for quantitative comparison of similar regions across samples. Accuracy in such registration requires both high image resolution and global awareness, which pose a difficult balancing act for contemporary deep learning architectures. We present a novel convolutional neural network (CNN) architecture that combines (1) a local classification CNN that extracts features from image patches sampled sparsely across the tissue surface, and (2) a global segmentation CNN that operates on these extracted features. This hybrid network can be trained in an end-to-end manner, and we demonstrate its relative merits over competing approaches on a reference histology dataset as well as two published spatial transcriptomics datasets. We believe that this paradigm will greatly enhance our ability to process spatial -omics data, and that it has general-purpose applications for processing high-resolution histology images on commercially available GPUs.
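The two-stage local/global idea can be sketched in NumPy, with a patch summary standing in for the local classification CNN and a threshold standing in for the global segmentation CNN; patch size, stride, and the toy image are assumptions:

```python
import numpy as np

def patch_features(img, size=8, stride=8):
    """Local stage: summarise sparsely sampled patches into a coarse feature
    grid (a mean feature stands in for the local classification CNN)."""
    H, W = img.shape
    return np.array([[img[i:i + size, j:j + size].mean()
                      for j in range(0, W - size + 1, stride)]
                     for i in range(0, H - size + 1, stride)])

def global_segment(grid, thresh=0.5):
    """Global stage: operate on the coarse grid as a whole (a threshold
    stands in for the global segmentation CNN)."""
    return (grid > thresh).astype(int)

img = np.zeros((32, 32))
img[:, 16:] = 1.0                       # two "tissue types", left vs right
seg = global_segment(patch_features(img))
print(seg)                              # each row reads [0 0 1 1]
```

The point of the split is that the local stage sees full-resolution pixels while the global stage sees the whole tissue at once, which is how the architecture balances resolution against global awareness within GPU memory.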


2021 ◽  
Vol 13 (24) ◽  
pp. 4958
Author(s):  
Ziwei Liu ◽  
Mingchang Wang ◽  
Fengyan Wang ◽  
Xue Ji

Extracting road information from high-resolution remote sensing images (HRI) can provide crucial geographic information for many applications. As remote sensing image resolution improves, the image data contain more abundant feature information. However, this also increases the spatial heterogeneity between different types of roads, making it difficult to accurately discern road and non-road regions using spectral characteristics alone. To remedy these issues, a novel residual attention and local context-aware network (RALC-Net) is proposed for extracting a complete and continuous road network from HRI. RALC-Net uses a dual-encoder structure to improve the feature extraction capability of the network, whose two branches take different feature information as input. Specifically, we construct a residual attention module that combines a residual connection, which integrates spatial context and retains complete road edge information, with an attention mechanism that highlights essential local semantics; together they enhance the generalization capability of the network model. In addition, a multi-scale dilated convolution module is used to capture multi-scale spatial receptive fields and further improve performance. We verify the contribution of each component of RALC-Net through an ablation study. By combining low-level features with high-level semantics, we extract road information and compare the results with other state-of-the-art models. The experimental results show that the proposed RALC-Net has excellent feature representation ability and robust generalizability, and can extract complete road information from complex environments.
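The residual-plus-attention combination can be sketched with per-channel 1×1-convolution-like transforms in NumPy. This is a generic residual attention block under assumed shapes and weights, not RALC-Net's exact module:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def residual_attention_block(x, Wf, Wa):
    """Simplified residual attention: a 1x1-conv-like feature transform is
    gated by a learned attention map, and the result is added back to the
    input, so edge detail survives while salient road responses are boosted."""
    feat = np.tensordot(Wf, x, axes=1)           # feature transform, (C, H, W)
    attn = sigmoid(np.tensordot(Wa, x, axes=1))  # attention weights in (0, 1)
    return x + attn * feat                       # residual connection

rng = np.random.default_rng(0)
C, H, W = 4, 16, 16
x = rng.standard_normal((C, H, W))
out = residual_attention_block(x, 0.1 * rng.standard_normal((C, C)),
                               0.1 * rng.standard_normal((C, C)))
print(out.shape)   # (4, 16, 16)
```

The additive skip path is what lets the module "retain complete road edge information": even where the attention gate is near zero, the input features pass through unchanged.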


2018 ◽  
Vol 10 (12) ◽  
pp. 1890 ◽  
Author(s):  
Mohamad Al Rahhal ◽  
Yakoub Bazi ◽  
Taghreed Abdullah ◽  
Mohamed Mekhalfi ◽  
Haikel AlHichri ◽  
...  

In this paper, we propose a multi-branch neural network, called MB-Net, for solving the problem of knowledge adaptation from multiple remote sensing scene datasets acquired with different sensors over diverse locations and manually labeled by different experts. Our aim is to learn invariant feature representations from multiple source domains with labeled images and one target domain with unlabeled images. To this end, we define for MB-Net an objective function that mitigates the multiple domain shifts at both the feature representation and decision levels, while retaining the ability to discriminate between different land-cover classes. The complete architecture is trainable end-to-end via the backpropagation algorithm. In the experiments, we demonstrate the effectiveness of the proposed method on a new multiple-domain dataset created from four heterogeneous scene datasets well known to the remote sensing community, namely, the University of California (UC-Merced) dataset, the Aerial Image Dataset (AID), the PatternNet dataset, and the Northwestern Polytechnical University (NWPU) dataset. In particular, this method boosts the average accuracy over all transfer scenarios to 89.05%, compared to a standard architecture based only on cross-entropy loss, which yields an average accuracy of 78.53%.
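The shape of such an objective, discriminative loss on labeled sources plus a penalty on source/target feature shift, can be sketched in NumPy. The mean-feature distance below is a simple stand-in for MB-Net's actual alignment terms, and all names and the weighting are assumptions:

```python
import numpy as np

def cross_entropy(probs, labels):
    """Discriminative loss on the labeled source domains."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def feature_discrepancy(src_feats, tgt_feats):
    """Distance between domain mean features; a crude proxy for the
    alignment applied at the feature and decision levels."""
    return np.linalg.norm(src_feats.mean(axis=0) - tgt_feats.mean(axis=0))

def domain_adaptation_loss(probs, labels, src_feats, tgt_feats, lam=0.1):
    """Stay discriminative on labeled sources while penalizing the shift
    toward the unlabeled target domain."""
    return cross_entropy(probs, labels) + lam * feature_discrepancy(src_feats, tgt_feats)

probs = np.array([[0.9, 0.1], [0.2, 0.8]])   # toy source predictions
labels = np.array([0, 1])
src, tgt = np.ones((5, 3)), np.ones((7, 3))  # perfectly aligned toy features
loss = domain_adaptation_loss(probs, labels, src, tgt)
print(round(float(loss), 3))   # 0.164: pure cross-entropy, discrepancy is 0
```

Minimizing both terms jointly is what pushes the branches toward representations that are simultaneously class-discriminative and domain-invariant.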


2016 ◽  
Vol 2016 ◽  
pp. 1-12 ◽  
Author(s):  
Guangjun Zhao ◽  
Xuchu Wang ◽  
Yanmin Niu ◽  
Liwen Tan ◽  
Shao-Xiang Zhang

Cryosection brain images in the Chinese Visible Human (CVH) dataset contain rich anatomical structure information because of their high resolution (e.g., 0.167 mm per pixel). Fast and accurate segmentation of these images into white matter, gray matter, and cerebrospinal fluid plays a critical role in analyzing and measuring the anatomical structures of the human brain. However, most existing automated segmentation methods are designed for computed tomography or magnetic resonance imaging data and may not be applicable to cryosection images due to the imaging differences. In this paper, we propose a supervised learning-based CVH brain tissue segmentation method that uses stacked autoencoders (SAE) to automatically learn deep feature representations. Specifically, our model consists of two successive parts: two three-layer SAEs take image patches as input to learn complex anatomical feature representations, and these features are then sent to a Softmax classifier to infer the labels. Experimental results validated the effectiveness of our method and showed that it outperformed four other classical brain tissue detection strategies. Furthermore, we reconstructed three-dimensional surfaces of these tissues, demonstrating their potential for exploring the high-resolution anatomical structures of the human brain.
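The inference path of the pipeline, two stacked encoders followed by a Softmax layer, can be sketched in NumPy. Layer sizes, weights, and the single-feature-per-layer simplification are assumptions; the real model pretrains each autoencoder on patch reconstruction before fine-tuning:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def sae_forward(patches, W1, b1, W2, b2, Wc, bc):
    """Forward pass: two stacked encoder layers learn a deep representation
    of an image patch, then a Softmax layer infers the tissue label
    (white matter, gray matter, or CSF)."""
    h1 = sigmoid(patches @ W1 + b1)   # first autoencoder's encoding
    h2 = sigmoid(h1 @ W2 + b2)        # second autoencoder's encoding
    return softmax(h2 @ Wc + bc)      # class probabilities

rng = np.random.default_rng(0)
n, d, h1d, h2d, k = 6, 64, 32, 16, 3       # patches, pixels, hidden sizes, classes
patches = rng.random((n, d))               # 8x8 patches flattened to 64 values
params = [0.1 * rng.standard_normal(s) for s in
          [(d, h1d), (h1d,), (h1d, h2d), (h2d,), (h2d, k), (k,)]]
probs = sae_forward(patches, *params)
print(probs.shape)   # (6, 3); each row sums to 1
```

Segmenting a whole cryosection image then amounts to classifying the patch around every pixel and assembling the predicted labels back into an image.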


Informatics ◽  
2020 ◽  
Vol 17 (2) ◽  
pp. 7-16
Author(s):  
R. P. Bohush ◽  
I. Yu. Zakharava ◽  
S. V. Ablameyko

In this paper, an algorithm for object detection in high-resolution images is proposed. The approach uses a multiscale image representation followed by processing of overlapping blocks. Object detection with a convolutional neural network is performed on each block. The number of pyramid layers is limited by the convolutional neural network input layer size and the input image resolution. To improve classification and detection accuracy, splitting into overlapping blocks is performed on each pyramid layer except the highest one. Detected areas are merged into one when they have a high overlap and the same class. Experimental results for the algorithm are presented in the paper.
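The final merging step can be sketched in Python: detections from neighbouring blocks that share a class and overlap strongly are fused into their enclosing box. The IoU threshold and greedy strategy are illustrative assumptions, not necessarily the paper's exact rule:

```python
def iou(a, b):
    """Intersection over union of two boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def merge_detections(boxes, classes, thresh=0.5):
    """Merge detections from overlapping blocks: same-class boxes with
    IoU above the threshold are fused into their enclosing box."""
    merged, used = [], [False] * len(boxes)
    for i, (b, c) in enumerate(zip(boxes, classes)):
        if used[i]:
            continue
        cur = list(b)
        for j in range(i + 1, len(boxes)):
            if not used[j] and classes[j] == c and iou(cur, boxes[j]) > thresh:
                cur = [min(cur[0], boxes[j][0]), min(cur[1], boxes[j][1]),
                       max(cur[2], boxes[j][2]), max(cur[3], boxes[j][3])]
                used[j] = True
        merged.append((tuple(cur), c))
    return merged

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
merged = merge_detections(boxes, ["car", "car", "car"])
print(merged)   # first two boxes fuse into (0, 0, 11, 11); the third stays
```

Without this step, the same object straddling a block boundary would be reported twice, once per block.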


2020 ◽  
Vol 12 (2) ◽  
pp. 338 ◽  
Author(s):  
Yunling Liu ◽  
Chaojun Cen ◽  
Yingpu Che ◽  
Rui Ke ◽  
Yan Ma ◽  
...  

Maize tassels play a critical role in plant growth and yield. Extensive RGB images obtained using unmanned aerial vehicles (UAVs) and the prevalence of deep learning provide an opportunity to improve the accuracy of detecting maize tassels. We used images from a UAV, a mobile phone, and the Maize Tassel Counting (MTC) dataset to test the performance of the faster region-based convolutional neural network (Faster R-CNN) with a residual neural network (ResNet) and a visual geometry group neural network (VGGNet). The results showed that ResNet, as the feature extraction network, was better than VGGNet for detecting maize tassels in UAV images with 600 × 600 resolution; the prediction accuracy ranged from 87.94% to 94.99%. However, the prediction accuracy was below 87.27% for UAV images with 5280 × 2970 resolution. We modified the anchor sizes in the region proposal network to [85², 128², 256²] according to the width and height of the tassel pixel distribution, improving detection accuracy to 89.96%. The accuracy reached up to 95.95% for mobile phone images. We then compared our trained model with TasselNet without training on their datasets; across 40 images, the average difference in tassel counts between the two methods was 1.4. In the future, we could further improve the performance of the models by enlarging the datasets and calculating other tassel traits such as the length, width, diameter, perimeter, and branch number of the maize tassels.
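What changing the anchor sizes means can be illustrated by generating the anchor set at one feature-map location. The aspect ratios below are the common Faster R-CNN defaults and an assumption here; the areas follow the modified sizes chosen from the tassel pixel distribution:

```python
import numpy as np

def generate_anchors(cx, cy, areas=(85**2, 128**2, 256**2),
                     ratios=(0.5, 1.0, 2.0)):
    """Anchor boxes centred at (cx, cy) for the region proposal network:
    one box per (area, aspect ratio) pair, as (x1, y1, x2, y2)."""
    anchors = []
    for a in areas:
        for r in ratios:
            w = np.sqrt(a / r)      # width so that w * h == a with h = r * w
            h = w * r
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

boxes = generate_anchors(300, 300)
print(len(boxes))   # 9 anchors: 3 areas x 3 aspect ratios
```

Shrinking the anchor areas from the defaults toward the observed tassel sizes gives the network proposals that actually fit small tassels in very large UAV frames, which is why detection accuracy recovers at 5280 × 2970 resolution.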

