Sequence-to-Sequence Video Prediction by Learning Hierarchical Representations

Kun Fan; Chungin Joung; Seungjun Baek

doi:10.3390/app10228288

Applied Sciences ◽

10.3390/app10228288 ◽

2020 ◽

Vol 10 (22) ◽

pp. 8288

Author(s):

Kun Fan ◽

Chungin Joung ◽

Seungjun Baek

Keyword(s):

Short Term Memory ◽

Temporal Dynamics ◽

Human Action ◽

Recurrent Network ◽

Video Frames ◽

High Quality Sequence ◽

Prediction Approach ◽

Hierarchical Representations ◽

Video Prediction ◽

Previous Prediction

Video prediction, which maps a sequence of past video frames to realistic future video frames, is a challenging task because it is difficult both to generate realistic frames and to model the coherent relationship between consecutive video frames. In this paper, we propose a hierarchical sequence-to-sequence prediction approach to address this challenge. We present an end-to-end trainable architecture in which the frame generator automatically encodes input frames into different levels of latent Convolutional Neural Network (CNN) features, and then recursively generates future frames conditioned on the estimated hierarchical CNN features and the previous prediction. Our design is intended to automatically learn hierarchical representations of video and their temporal dynamics. Convolutional Long Short-Term Memory (ConvLSTM) is used in combination with skip connections to separately capture the sequential structure of each level of the feature hierarchy. We adopt Scheduled Sampling for training our recurrent network in order to facilitate convergence and to produce high-quality sequence predictions. We evaluate our method on the Bouncing Balls, Moving MNIST, and KTH human action datasets, and report favorable results compared to existing methods.
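The recursive generation and Scheduled Sampling described in this abstract can be illustrated with a minimal sketch. The abstract does not state the decay schedule or the model interface, so the linear decay and the names `teacher_forcing_prob`, `rollout`, and `model_step` below are assumptions made for illustration, not the authors' implementation.

```python
import random


def teacher_forcing_prob(step, total_steps, floor=0.0):
    """Probability of feeding the ground-truth frame at a given training step.

    Assumes a linear decay from 1.0 down to `floor`; the paper may use a
    different schedule.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return max(floor, 1.0 - frac)


def rollout(model_step, context, targets, p_truth, rng):
    """Recursively predict len(targets) frames.

    model_step: hypothetical one-step predictor (previous frame -> next frame).
    At each step the next input is the ground-truth frame with probability
    p_truth, otherwise the model's own previous prediction.
    """
    prev = context[-1]
    preds = []
    for target in targets:
        pred = model_step(prev)
        preds.append(pred)
        prev = target if rng.random() < p_truth else pred
    return preds


if __name__ == "__main__":
    rng = random.Random(0)
    toy_model = lambda frame: frame + 1  # stand-in for a real frame generator
    p = teacher_forcing_prob(step=50, total_steps=100)
    print(p, rollout(toy_model, [0], [10, 20, 30], p, rng))
```

With `p_truth` near 1 the loop behaves like ordinary teacher forcing, and with `p_truth` at 0 it matches the fully recursive rollout used at test time.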

MDTP

Proceedings of the VLDB Endowment ◽

10.14778/3457390.3457394 ◽

2021 ◽

Vol 14 (8) ◽

pp. 1289-1297

Author(s):

Ziquan Fang ◽

Lu Pan ◽

Lu Chen ◽

Yuntao Du ◽

Yunjun Gao

Keyword(s):

Neural Network ◽

Short Term Memory ◽

Temporal Dynamics ◽

Real Life ◽

Feature Modeling ◽

Traffic Prediction ◽

Interactive System ◽

Trajectory Data ◽

Spatio Temporal ◽

Prediction Approach

Traffic prediction has drawn increasing attention for its ubiquitous real-life applications in traffic management, urban computing, public safety, and so on. Recently, the availability of massive trajectory data and the success of deep learning have motivated a plethora of deep traffic prediction studies. However, existing neural-network-based approaches tend to ignore the correlations between multiple types of moving objects located in the same spatio-temporal traffic area, which is suboptimal for traffic prediction analytics. In this paper, we propose a multi-source deep traffic prediction framework over spatio-temporal trajectory data, termed MDTP. The framework includes two phases: spatio-temporal feature modeling and multi-source bridging. In the feature modeling phase, we present an enhanced graph convolutional network (GCN) model combined with a long short-term memory (LSTM) network to capture the spatial dependencies and temporal dynamics of traffic. In the multi-source bridging phase, we propose two methods, Sum and Concat, to connect the learned features from different trajectory data sources. Extensive experiments on two real-life datasets show that MDTP i) has superior efficiency compared with classical time-series methods, machine learning methods, and state-of-the-art neural-network-based approaches; ii) offers a significant performance improvement over the single-source traffic prediction approach; and iii) performs traffic predictions in seconds even on tens of millions of trajectory data. We develop MDTP+, a user-friendly interactive system to demonstrate traffic prediction analysis.
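The two bridging methods named in this abstract, Sum and Concat, can be sketched as follows. The abstract gives only their names, so reading them as element-wise addition and vector concatenation of per-source feature vectors is an assumption, and `bridge_sum` and `bridge_concat` are hypothetical helper names.

```python
def bridge_sum(features_a, features_b):
    """Element-wise sum of two equal-length feature vectors (assumed reading of "Sum")."""
    if len(features_a) != len(features_b):
        raise ValueError("Sum bridging needs feature vectors of equal length")
    return [a + b for a, b in zip(features_a, features_b)]


def bridge_concat(features_a, features_b):
    """Concatenation of two feature vectors (assumed reading of "Concat")."""
    return list(features_a) + list(features_b)


if __name__ == "__main__":
    # Toy learned features from two hypothetical trajectory sources.
    source_a = [0.2, 0.5, 0.1]
    source_b = [0.3, 0.1, 0.4]
    print(bridge_sum(source_a, source_b))     # same width as each input
    print(bridge_concat(source_a, source_b))  # width is the sum of both inputs
```

Under this reading, Sum keeps the feature width fixed but requires both sources to share it, while Concat preserves each source's features separately at the cost of a wider input to the next layer.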

Global soil moisture data derived through machine learning trained with in-situ measurements

Scientific Data ◽

10.1038/s41597-021-00964-1 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Sungmin O. ◽

Rene Orth

Keyword(s):

Machine Learning ◽

Soil Moisture ◽

Large Scale ◽

Short Term Memory ◽

Temporal Dynamics ◽

Soil Moisture Data ◽

Wide Range ◽

Global Soil

While soil moisture information is essential for a wide range of hydrologic and climate applications, spatially-continuous soil moisture data is only available from satellite observations or model simulations. Here we present a global, long-term dataset of soil moisture derived through machine learning trained with in-situ measurements, SoMo.ml. We train a Long Short-Term Memory (LSTM) model to extrapolate daily soil moisture dynamics in space and in time, based on in-situ data collected from more than 1,000 stations across the globe. SoMo.ml provides multi-layer soil moisture data (0–10 cm, 10–30 cm, and 30–50 cm) at 0.25° spatial and daily temporal resolution over the period 2000–2019. The performance of the resulting dataset is evaluated through cross validation and inter-comparison with existing soil moisture datasets. SoMo.ml performs especially well in terms of temporal dynamics, making it particularly useful for applications requiring time-varying soil moisture, such as anomaly detection and memory analyses. SoMo.ml complements the existing suite of modelled and satellite-based datasets given its distinct derivation, to support large-scale hydrological, meteorological, and ecological analyses.

Structured Stochastic Recurrent Network for Linguistic Video Prediction

Proceedings of the 27th ACM International Conference on Multimedia ◽

10.1145/3343031.3350859 ◽

2019 ◽

Cited By ~ 1

Author(s):

Shijie Yang ◽

Liang Li ◽

Shuhui Wang ◽

Dechao Meng ◽

Qingming Huang ◽

...

Keyword(s):

Recurrent Network ◽

Video Prediction

Two Stage Continuous Gesture Recognition Based on Deep Learning

Electronics ◽

10.3390/electronics10050534 ◽

2021 ◽

Vol 10 (5) ◽

pp. 534

Author(s):

Huogen Wang

Keyword(s):

Gesture Recognition ◽

Large Scale ◽

Short Term Memory ◽

Short Term ◽

Hand Motion ◽

Spatiotemporal Features ◽

Spatiotemporal Information ◽

Video Frames ◽

Depth Sequences

The paper proposes an effective continuous gesture recognition method, which includes two modules: segmentation and recognition. In the segmentation module, the video frames are divided into gesture frames and transitional frames by using the information of hand motion and appearance, and continuous gesture sequences are segmented into isolated sequences. In the recognition module, our method exploits the spatiotemporal information embedded in RGB and depth sequences. For the RGB modality, our method adopts Convolutional Long Short-Term Memory Networks to learn long-term spatiotemporal features from short-term spatiotemporal features obtained from a 3D convolutional neural network. For the depth modality, our method converts a sequence into Dynamic Images and Motion Dynamic Images through weighted rank pooling and feeds each of them into its own Convolutional Neural Network. Our method has been evaluated on both the ChaLearn LAP Large-scale Continuous Gesture Dataset and the Montalbano Gesture Dataset and achieved state-of-the-art performance.

Forecasting the risk at infractions: an ensemble comparison of machine learning approach

Industrial Management & Data Systems ◽

10.1108/imds-10-2020-0603 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Lei Li ◽

Desheng Wu

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Short Term Memory ◽

Model Performance ◽

Large Data ◽

Support Vector ◽

Learning Approaches ◽

Content Type ◽

Day To Day Operations ◽

Prediction Approach

Purpose: The infraction of securities regulations (ISRs) by listed firms in their day-to-day operations and management has become a common problem. This paper proposes several machine learning approaches to forecast the infraction risk of listed corporations, to address the lack of effective and precise financial supervision. Design/methodology/approach: The proposed research framework for forecasting ISRs includes data collection and cleaning, feature engineering, data splitting, application of the prediction approaches, and model performance evaluation. We select Logistic Regression, Naïve Bayes, Random Forest, Support Vector Machines, Artificial Neural Networks, and Long Short-Term Memory Networks (LSTMs) as ISR prediction models. Findings: The results show that the proposed models predict ISRs significantly better when prior infractions are included than when they are not, especially for large sample sets. The results also indicate that, when judging whether a company has infractions, we should pay attention to novel artificial intelligence methods, the company's previous infractions, and large data sets. Originality/value: The findings could be used, to a certain degree, to address the problem of identifying listed corporations' ISRs. Overall, the results elucidate the value of prior infractions of securities regulations. This shows the importance of including more data sources when constructing distress models, rather than focusing only on building increasingly complex models on the same data. This is also beneficial to the regulatory authorities.

Classification of Action Based Video using Heterogeneous Feature Extraction and SVM

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.k2089.0981119 ◽

2019 ◽

Vol 8 (11) ◽

pp. 1887-1892

Keyword(s):

Optical Flow ◽

Video Sequence ◽

Human Action ◽

Video Data ◽

Support Vector ◽

Svm Classifier ◽

Video Frames ◽

Integral Role ◽

Heterogeneous Feature

Action recognition (AR) plays a fundamental role in computer vision and video analysis. The volume of video data on the web is growing rapidly, and recognizing actions in video is difficult because of varying camera viewpoints. AR in a video sequence depends on both the appearance within each frame and the optical flow between frames; the spatial and temporal components of video frame features play an integral role in classifying actions well. In the proposed system, RGB frames and optical flow frames are used for AR: a pre-trained Convolutional Neural Network (CNN), AlexNet, extracts features from its fc7 layer, and a support vector machine (SVM) classifier classifies the actions. The HMDB51 dataset, which comprises 51 human action categories, is used for classification. Using the SVM classifier on the extracted features, the system achieves its best result of 95.6% accuracy, compared against other state-of-the-art techniques.

A video prediction approach for animating single face image

Multimedia Tools and Applications ◽

10.1007/s11042-018-6952-y ◽

2018 ◽

Vol 78 (12) ◽

pp. 16389-16410 ◽

Cited By ~ 2

Author(s):

Yong Zhao ◽

Meshia Cédric Oveneke ◽

Dongmei Jiang ◽

Hichem Sahli

Keyword(s):

Face Image ◽

Single Face ◽

Prediction Approach ◽

Video Prediction

Developing Deep Survival Model for Remaining Useful Life Estimation Based on Convolutional and Long Short-Term Memory Neural Networks

Wireless Communications and Mobile Computing ◽

10.1155/2020/8814658 ◽

2020 ◽

Vol 2020 ◽

pp. 1-12

Author(s):

Chia-Hua Chu ◽

Chia-Jung Lee ◽

Hsiang-Yuan Yeh

Keyword(s):

Neural Networks ◽

Short Term Memory ◽

Survival Model ◽

Remaining Useful Life ◽

Superior Performance ◽

Short Term ◽

Term Memory ◽

Useful Life ◽

Long Short Term Memory ◽

Prediction Approach

The application of mechanical equipment in manufacturing is becoming more and more complicated with technology development and adoption. To keep the production line highly reliable and stable, it is necessary to reduce both repair downtime and the frequency of routine maintenance. Since the degradation of machines and components is inevitable, accurately estimating their remaining useful life is crucial. We propose an integrated deep learning approach that uses convolutional neural networks and long short-term memory networks to learn latent features and estimates the remaining useful life with a deep survival model based on the discrete Weibull distribution. We validate our approach on the turbofan engine degradation simulation dataset from the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) provided by NASA. The results show that our proposed model can capture the degradation trend of a fault and has superior performance under complex conditions compared with existing state-of-the-art methods. Our study provides an efficient feature extraction scheme and offers a promising prediction approach for making better maintenance strategies.
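To make the discrete Weibull distribution mentioned in this abstract concrete, the sketch below uses one common parameterization, with survival function P(T >= k) = exp(-(k/alpha)^beta) for k = 0, 1, 2, .... The abstract does not give the parameterization or say how the network produces the parameters, so the form used here and the function names are assumptions.

```python
import math


def dweibull_survival(k, alpha, beta):
    """P(T >= k) for integer k >= 0, with scale alpha > 0 and shape beta > 0."""
    return math.exp(-((k / alpha) ** beta))


def dweibull_pmf(k, alpha, beta):
    """P(T = k), obtained as the difference of consecutive survival values."""
    return dweibull_survival(k, alpha, beta) - dweibull_survival(k + 1, alpha, beta)


def expected_life(alpha, beta, horizon=10_000):
    """E[T] via the tail-sum identity E[T] = sum over k >= 1 of P(T >= k),
    truncated at `horizon` cycles."""
    return sum(dweibull_survival(k, alpha, beta) for k in range(1, horizon + 1))


if __name__ == "__main__":
    alpha, beta = 120.0, 2.5  # illustrative values only, not from the paper
    print(dweibull_pmf(100, alpha, beta))
    print(expected_life(alpha, beta))
```

With beta = 1 this form reduces to a geometric distribution, which gives a simple sanity check on the formulas.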

Remaining Useful Life Estimation Using Deep Convolutional Generative Adversarial Networks Based on an Autoencoder Scheme

Computational Intelligence and Neuroscience ◽

10.1155/2020/9601389 ◽

2020 ◽

Vol 2020 ◽

pp. 1-14

Author(s):

Guisheng Hou ◽

Shuo Xu ◽

Nan Zhou ◽

Lei Yang ◽

Quanhao Fu

Keyword(s):

Feature Extraction ◽

Short Term Memory ◽

Health Management ◽

Remaining Useful Life ◽

Fine Tuning ◽

Generative Adversarial Networks ◽

Generative Adversarial Network ◽

Adversarial Network ◽

Useful Life ◽

Prediction Approach

Accurate predictions of the remaining useful life (RUL) of important components play a crucial role in system reliability, which is the basis of prognostics and health management (PHM). This paper proposes an integrated deep learning approach for RUL prediction of a turbofan engine by integrating an autoencoder (AE) with a deep convolutional generative adversarial network (DCGAN). In the pretraining stage, the reconstructed data of the AE not only participate in its error reconstruction but also take part in the DCGAN parameter training as the generated data of the DCGAN. Through double-error reconstructions, the capability of feature extraction is enhanced, and high-level abstract information is obtained. In the fine-tuning stage, a long short-term memory (LSTM) network is used to extract the sequential information from the features to predict the RUL. The effectiveness of the proposed scheme is verified on the NASA commercial modular aero-propulsion system simulation (C-MAPSS) dataset. The superiority of the proposed method is demonstrated via excellent prediction performance and comparisons with other existing state-of-the-art prognostics. The results of this study suggest that the proposed data-driven prognostic method offers a new and promising prediction approach and an efficient feature extraction scheme.

Neural Networks as Classification Mechanisms of Complex Human Activities

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213020500116 ◽

2020 ◽

Vol 29 (05) ◽

pp. 2050011

Author(s):

Anargyros Angeleas ◽

Nikolaos Bourbakis

Keyword(s):

Neural Network ◽

Neural Networks ◽

Convolutional Neural Network ◽

Short Term Memory ◽

Neural Nets ◽

Quality Of Data ◽

Short Term ◽

Video Frames ◽

Data Engineering ◽

Formal Framework

In this paper, we present two neural nets for view-independent complex human activity recognition (HAR) from video frames. We reduce the number of frames taken from a video sequence, given that activities can be identified from a sparsely sampled sequence of body poses; this reduces the processing complexity and response while hardly affecting accuracy, precision, and recall. To do so, we use a formal framework to ensure the quality of data collection and data preprocessing. We use neural networks to classify single and complex body activities. More specifically, we treat the sequence of body poses as a time-series problem, since neural networks can provide state-of-the-art results on challenging recognition tasks with little data engineering. Deep learning models in the form of a Convolutional Neural Network (CNN), a Long Short-Term Memory network (LSTM), and a one-dimensional Convolutional Neural Network Long Short-Term Memory model (CNN-LSTM) are used as benchmarks to classify the activity.
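The idea of classifying from a sparsely sampled sequence of body poses can be sketched with a simple frame-index sampler. The abstract does not say how frames are selected, so the evenly spaced sampling below and the name `sparse_frame_indices` are assumptions for illustration.

```python
def sparse_frame_indices(num_frames, num_samples):
    """Pick up to num_samples evenly spaced frame indices from a clip.

    Uses integer arithmetic so the indices are distinct and always include
    the first and last frame when num_samples >= 2.
    """
    if num_frames <= 0 or num_samples <= 0:
        return []
    if num_samples >= num_frames:
        return list(range(num_frames))
    if num_samples == 1:
        return [0]
    return [(i * (num_frames - 1)) // (num_samples - 1) for i in range(num_samples)]


if __name__ == "__main__":
    # Keep 4 of 10 frames; the pose estimator and classifier would then
    # only see the selected frames.
    print(sparse_frame_indices(10, 4))
```

Fewer sampled frames mean fewer pose estimates and shorter input sequences for the classifier, which is where the reduction in processing cost would come from under this assumption.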
