Driver Drowsiness Estimation Based on Factorized Bilinear Feature Fusion and a Long-Short-Term Recurrent Convolutional Network

Information ◽  
2020 ◽  
Vol 12 (1) ◽  
pp. 3
Author(s):  
Shuang Chen ◽  
Zengcai Wang ◽  
Wenxin Chen

The effective detection of driver drowsiness is an important measure for preventing traffic accidents. Most existing drowsiness detection methods use only a single facial feature to identify fatigue status, ignoring both the complex correlations among fatigue features and their temporal dynamics, which reduces recognition accuracy. To solve these problems, we propose a driver sleepiness estimation model based on factorized bilinear feature fusion and a long short-term recurrent convolutional network to detect driver drowsiness efficiently and accurately. The proposed framework comprises three modules: fatigue feature extraction, fatigue feature fusion, and driver drowsiness detection. First, we use a convolutional neural network (CNN) to extract deep representations of eye- and mouth-related fatigue features from the face region detected in each video frame. Then, based on the factorized bilinear feature fusion model, we perform a nonlinear fusion of the deep feature representations of the eyes and mouth. Finally, we feed the sequence of fused frame-level features into a long short-term memory (LSTM) unit to capture their temporal information and use a softmax classifier to detect sleepiness. The proposed framework was evaluated on the National Tsing Hua University drowsy driver detection (NTHU-DDD) video dataset. The experimental results show that this method achieves better stability and robustness than competing methods.
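The factorized bilinear fusion step can be sketched as follows. This is a minimal NumPy illustration of factorized bilinear pooling over two modality vectors (eye and mouth features), not the authors' implementation; the projection matrices `U` and `V`, the factor count `k`, and the signed square-root normalization are assumptions drawn from common factorized bilinear pooling practice.

```python
import numpy as np

def factorized_bilinear_fusion(x, y, U, V, out_dim, k):
    """Fuse two feature vectors with low-rank factorized bilinear pooling."""
    # Project both modalities into a shared (out_dim * k)-dim space.
    px = U.T @ x
    py = V.T @ y
    # Element-wise (Hadamard) product captures second-order interactions.
    joint = px * py
    # Sum-pool every k consecutive entries -> out_dim fused features.
    z = joint.reshape(out_dim, k).sum(axis=1)
    # Signed square-root + L2 normalization, as commonly paired with this pooling.
    z = np.sign(z) * np.sqrt(np.abs(z))
    return z / (np.linalg.norm(z) + 1e-12)
```

In the full pipeline, a fused vector like `z` would be produced per frame and the frame sequence fed to the LSTM.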

2020 ◽  
Vol 18 (S3) ◽  
pp. 34-45
Author(s):  
Zhingtang Zhao ◽  
Qingtao Wu

In intelligent computer-aided video abnormal behavior recognition, pedestrian behavior analysis can detect and handle abnormal behaviors in time, which has great practical value for ensuring public safety. We analyze a deep learning video behavior recognition network that performs well in current research. The network first sparsely samples the input video to obtain one frame from each video segment, then uses a two-dimensional convolutional network to extract features from each frame, and finally uses a three-dimensional network to fuse them. This design recognizes both long-term and short-term actions in a video simultaneously. To overcome the heavy computational cost of the 3D convolution part of the network, this paper proposes an improved, mobile 3D convolution structure for this module. To address the low utilization of long-term motion features in video sequences, this paper constructs a deep residual module by introducing long short-term memory networks and residual connections, so as to fully exploit the long-term dynamic features in video sequences. To address the large intra-class variation and small inter-class differences in abnormal behavior videos, this paper proposes a 2CSoftmax function based on a double center loss to optimize the network model, which maximizes inter-class distances while minimizing intra-class distances, thereby enabling the classification of similar actions and improving recognition accuracy.


2021 ◽  
pp. 1-10
Author(s):  
Xiaojun Chen ◽  
Shengbin Jia ◽  
Ling Ding ◽  
Yang Xiang

Knowledge graph reasoning, or completion, aims to infer missing facts by reasoning over the information already present in the knowledge graph. In this work, we explore temporal knowledge graph reasoning, which performs inference on the graph over time. Most existing reasoning models ignore time information when learning entity and relation representations. For example, the fact (Scarlett Johansson, spouseOf, Ryan Reynolds) was true only during 2008-2011. To facilitate temporal reasoning, we present TA-TransRILP, which incorporates temporal information via RNNs and takes advantage of Integer Linear Programming. Specifically, we utilize a character-level long short-term memory network to encode relations together with sequences of temporal tokens, and combine it with a conventional reasoning model. To achieve more accurate reasoning, we further impose temporal consistency constraints on the basic model, which help assess the validity of a fact. We conduct entity prediction and relation prediction on the YAGO11k and Wikidata12k datasets. Experimental results demonstrate that TA-TransRILP makes more accurate predictions by taking time information and temporal consistency constraints into account, outperforming existing methods by a significant margin of about 6-8% on Hits@10.
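The temporal-token encoding can be sketched in plain Python: a relation is expanded into a sequence that appends the digits of the timestamp as typed tokens, which a character-level LSTM can then consume. The token suffixes (`y` for year digits, `m` for month digits) are a hypothetical scheme for illustration; the abstract does not specify the vocabulary.

```python
def temporal_tokens(relation, time_str):
    """Expand a relation and a timestamp like '2008' or '2014-07' into a
    token sequence suitable for a character-level sequence encoder."""
    tokens = [relation]
    year, _, month = time_str.partition("-")
    # Each digit becomes its own token, tagged with its temporal role.
    tokens += [d + "y" for d in year]
    if month:
        tokens += [d + "m" for d in month]
    return tokens
```

Embedding this token sequence and running it through an LSTM yields a time-aware relation representation that a translation-based scoring function can use.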


Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5037
Author(s):  
Hisham ElMoaqet ◽  
Mohammad Eid ◽  
Martin Glos ◽  
Mutaz Ryalat ◽  
Thomas Penzel

Sleep apnea is a common sleep disorder that causes repeated breathing interruptions during sleep. The performance of automated apnea detection methods based on respiratory signals depends on the signals considered and the feature extraction methods. Moreover, feature engineering techniques are highly dependent on experts' experience and prior knowledge about the different physiological signals and conditions of the subjects. To overcome these problems, a novel deep recurrent neural network (RNN) framework is developed for automated feature extraction and detection of apnea events from single respiratory channel inputs. Long short-term memory (LSTM) and bidirectional long short-term memory (BiLSTM) networks are investigated to develop the proposed deep RNN model. The proposed framework is evaluated on three respiration signals: oronasal thermal airflow (FlowTh), nasal pressure (NPRE), and abdominal respiratory inductance plethysmography (ABD). To demonstrate our results, we use polysomnography (PSG) data of 17 patients with obstructive, central, and mixed apnea events. Our results indicate the effectiveness of the proposed framework in the automatic extraction of temporal features and the automated detection of apneic events across the different respiratory signals considered in this study. Using a deep BiLSTM-based detection model, the NPRE signal achieved the highest overall detection results, with a true positive rate (sensitivity) of 90.3%, a true negative rate (specificity) of 83.7%, and an area under the receiver operating characteristic curve of 92.4%. These results contribute a new deep learning approach for automated detection of sleep apnea events from single-channel respiration signals that can potentially serve as a helpful alternative to the traditional PSG method.
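Before an LSTM/BiLSTM detector can score apnea events, the single-channel respiration signal is typically sliced into fixed-length, overlapping epochs. A minimal NumPy sketch of this preprocessing step follows; the epoch length and overlap are hypothetical, as the abstract does not state the windowing used.

```python
import numpy as np

def segment_epochs(signal, fs, epoch_sec=30, overlap=0.5):
    """Slice a 1-D respiration signal (sampled at fs Hz) into overlapping
    fixed-length epochs, stacked as rows of a 2-D array."""
    win = int(fs * epoch_sec)            # samples per epoch
    step = int(win * (1 - overlap))      # hop between epoch starts
    n = (len(signal) - win) // step + 1  # number of full epochs
    return np.stack([signal[i * step : i * step + win] for i in range(n)])
```

Each row would then be fed (optionally after normalization) as one input sequence to the recurrent detector.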


Symmetry ◽  
2022 ◽  
Vol 14 (1) ◽  
pp. 89
Author(s):  
Yang Gao ◽  
Yawu Zhao ◽  
Yuming Ma ◽  
Yihui Liu

Protein secondary structure prediction is an important topic in bioinformatics. This paper proposes a novel model named WS-BiLSTM, which, for the first time, combines a wavelet scattering convolutional network with a long short-term memory network to predict protein secondary structure. The model captures nonlocal interactions between amino acid sequences and remembers long-range interactions between amino acids. In our WS-BiLSTM model, the wavelet scattering convolutional network extracts protein features from a PSSM sliding window; the extracted features are combined with the original PSSM data as the input features of the long short-term memory network to predict protein secondary structure. It is worth noting that, as a member of the continuous wavelet family, the wavelet scattering convolutional network is asymmetric. The Q3 accuracy on the test sets CASP9, CASP10, CASP11, CASP12, CB513, and PDB25 reached 85.26%, 85.84%, 84.91%, 85.13%, 86.10%, and 85.52%, which is 2.15%, 2.16%, 3.5%, 3.19%, 4.22%, and 2.75% higher, respectively, than using the long short-term memory network alone. Comparing our results with state-of-the-art methods shows that the proposed model achieves better results on the CB513 and CASP12 data sets. The experimental results show that the features extracted by the wavelet scattering convolutional network can effectively improve the accuracy of protein secondary structure prediction.
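The Q3 metric reported above is simply per-residue accuracy over the three secondary-structure states (helix H, strand E, coil C). A minimal sketch:

```python
def q3_accuracy(pred, true):
    """Q3: fraction of residues whose 3-state label (H/E/C) is predicted correctly."""
    assert len(pred) == len(true), "sequences must be the same length"
    return sum(p == t for p, t in zip(pred, true)) / len(true)
```

For example, a prediction that gets 3 of 4 residues right scores Q3 = 0.75.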


2019 ◽  
Vol 11 (2) ◽  
pp. 42 ◽  
Author(s):  
Sheeraz Arif ◽  
Jing Wang ◽  
Tehseen Ul Hassan ◽  
Zesong Fei

Human activity recognition is an active field of research in computer vision with numerous applications. Recently, deep convolutional networks and recurrent neural networks (RNNs) have received increasing attention in multimedia studies and have yielded state-of-the-art results. In this work, we propose a new framework that intelligently combines 3D-CNN and LSTM networks. First, we integrate the discriminative information from a video into a map called a 'motion map' using a deep 3-dimensional convolutional network (C3D). A motion map and the next video frame can be integrated into a new motion map, and this technique can be trained by iteratively increasing the training video length; the final network can then generate the motion map of a whole video. Next, a linear weighted fusion scheme fuses the network feature maps into spatio-temporal features. Finally, we use a long short-term memory (LSTM) encoder-decoder for the final predictions. This method is simple to implement and retains discriminative and dynamic information. Improved results on public benchmark datasets demonstrate the effectiveness and practicability of the proposed method.
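The linear weighted fusion step can be sketched as a convex combination of per-stream feature maps; the fixed weights below stand in for whatever learned weights the method actually uses.

```python
import numpy as np

def linear_weighted_fusion(maps, weights):
    """Fuse same-shaped feature maps into one spatio-temporal feature
    via a convex combination (weights are normalized to sum to 1)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * m for wi, m in zip(w, maps))
```

The fused map would then be flattened or pooled before entering the LSTM encoder-decoder.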

