Temporal Dynamic Matrix Factorization for Missing Data Prediction in Large Scale Coevolving Time Series

IEEE Access ◽  
2016 ◽  
Vol 4 ◽  
pp. 6719-6732 ◽  
Author(s):  
Weiwei Shi ◽  
Yongxin Zhu ◽  
Philip S. Yu ◽  
Tian Huang ◽  
Chang Wang ◽  
...  
Computing ◽  
2018 ◽  
Vol 101 (11) ◽  
pp. 1565-1584
Author(s):  
Xiaoxiang Song ◽  
Yan Guo ◽  
Ning Li ◽  
Peng Qian

2019 ◽  
Vol 15 (1) ◽  
pp. 13-17
Author(s):  
Nurul Latiffah Abd Rani ◽  
Azman Azid ◽  
Muhamad Shirwan Abdullah Sani ◽  
Mohd Saiful Samsudin ◽  
Ku Mohd Kalkausar Ku Yusof ◽  
...  

Carbon monoxide (CO) is one of the most important pollutants because it is used in the API calculation. It is therefore essential that no CO data are missing during the analysis. Missing data can arise for a number of reasons, such as an instrument failing to record certain parameters. A CO prediction model is therefore needed to address this problem. A dataset of meteorological and air pollutant values was obtained from the Air Quality Division, Department of Environment Malaysia (DOE). A total of 113112 datasets were used to develop the model using sensitivity analysis (SA) through an artificial neural network (ANN). SA showed that particulate matter (PM10) and ozone (O3) were the most significant input variables for the missing data prediction model of CO. Three hidden nodes were the optimum number for the ANN model, with an R2 value of 0.5311. The two models, artificial neural network-carbon monoxide-all parameters (ANN-CO-AP) and artificial neural network-carbon monoxide-leave out (ANN-CO-LO), showed high R2 values (0.7639 and 0.5311) and low RMSE values (0.2482 and 0.3506), respectively. These values indicate that the model may employ only the most significant input variables to represent CO rather than all input variables.
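The abstract above reports model quality as R2 and RMSE. As a minimal sketch of how these two metrics are commonly computed, assuming R2 means the coefficient of determination (the abstract does not state the exact definition used), the following uses made-up CO values purely for illustration; the function names and the sample numbers are hypothetical and do not come from the study.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and predicted CO readings (not data from the paper).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

print("R2:", r_squared(observed, predicted))
print("RMSE:", rmse(observed, predicted))
```

A higher R2 and a lower RMSE both indicate a closer fit, which is the sense in which the abstract compares ANN-CO-AP with ANN-CO-LO.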


Processes ◽  
2019 ◽  
Vol 7 (5) ◽  
pp. 265 ◽  
Author(s):  
Mingrui Sun ◽  
Tengfei Min ◽  
Tianyi Zang ◽  
Yadong Wang

(1) Background: Recommendation algorithms play a vital role in personalized recommendation for clinical decision support systems (CDSSs). Machine learning methods are powerful tools for disease diagnosis. Unfortunately, they must deal with missing data, which introduce data errors and limit the patterns and features available for reaching a clinical decision; (2) Methods: In recent years, collaborative filtering (CF) has proven to be a valuable means of coping with missing data prediction. To address the challenges of missing data prediction and latent feature extraction, neighbor-based and latent feature-based CF methods are presented for clinical disease diagnosis. A novel discriminative restricted Boltzmann machine (DRBM) model is proposed to extract the latent features, with deep learning techniques adopted to analyze the clinical data; (3) Results: The proposed methods were compared with machine learning models on two publicly available clinical datasets, which have various types of inputs and different amounts of missing data. We also evaluated the performance of our algorithm on clinical datasets with data missing at random (MAR) to various degrees; and (4) Conclusions: The experimental results demonstrate that DRBM can effectively capture the latent features of real clinical data and exhibits excellent performance in predicting missing values and classifying results.
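The neighbor-based CF idea mentioned in the Methods can be sketched roughly as follows. This is not the authors' method or their DRBM model; it is a generic illustration that assumes cosine similarity over the features two records share and a similarity-weighted average for the missing entry. The records, function names, and values are hypothetical.

```python
import math

def cosine_on_shared(a, b):
    """Cosine similarity using only positions observed (not None) in both rows."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return 0.0
    dot = sum(x * y for x, y in pairs)
    norm_a = math.sqrt(sum(x * x for x, _ in pairs))
    norm_b = math.sqrt(sum(y * y for _, y in pairs))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict_missing(rows, target, col):
    """Estimate rows[target][col] as a similarity-weighted average of the
    values that other rows have observed in the same column."""
    num = den = 0.0
    for i, row in enumerate(rows):
        if i == target or row[col] is None:
            continue  # skip the target itself and rows that also lack the value
        weight = cosine_on_shared(rows[target], row)
        num += weight * row[col]
        den += abs(weight)
    return num / den if den else None

# Hypothetical patient records; None marks a missing measurement.
records = [
    [1.0, 2.0, None],  # target record with a missing third feature
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
]
estimate = predict_missing(records, target=0, col=2)
print("estimated value:", estimate)
```

A latent feature-based approach, such as the DRBM described in the abstract, would instead learn a compact representation of each record and reconstruct the missing entries from it.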


2017 ◽  
Vol 10 (2) ◽  
pp. 145-165 ◽  
Author(s):  
Kehe Wu ◽  
Yayun Zhu ◽  
Quan Li ◽  
Ziwei Wu

Purpose: The purpose of this paper is to propose a data prediction framework for scenarios that require demand forecasting over large-scale data sources, e.g., sensor networks, securities exchanges, and electric power secondary systems. Concretely, the proposed framework should handle several difficult requirements, including the management of very large data sources, the need for a fast self-adaptive algorithm, relatively accurate prediction of multiple time series, and real-time demand. Design/methodology/approach: First, the autoregressive integrated moving average-based prediction algorithm is introduced. Second, the processing framework is designed, which includes a time-series data storage model based on HBase and a real-time distributed prediction platform based on Storm. Then, the working principle of this platform is described. Finally, a proof-of-concept testbed is presented to verify the proposed framework. Findings: Several tests based on power grid monitoring data are provided for the proposed framework. The experimental results indicate that the predicted data are basically consistent with the actual data, processing efficiency is relatively high, and resource consumption is reasonable. Originality/value: This paper provides a distributed real-time data prediction framework for large-scale time-series data, which meets the requirements of effective management, prediction efficiency, accuracy, and high concurrency for massive data sources.
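To make the autoregressive integrated moving average (ARIMA) idea concrete, here is a deliberately reduced sketch: an ARIMA(1,1,0)-style one-step forecast with no moving-average term and no intercept, fitted by least squares. The abstract does not give the model order or the fitting procedure, so every choice here is an assumption, and the readings are hypothetical rather than power grid data from the paper.

```python
def difference(series):
    """First-order differencing: the 'integrated' part with d = 1."""
    return [b - a for a, b in zip(series, series[1:])]

def fit_ar1(diffs):
    """Least-squares AR(1) coefficient (no intercept) on the differenced series."""
    num = sum(cur * prev for prev, cur in zip(diffs, diffs[1:]))
    den = sum(prev * prev for prev in diffs[:-1])
    return num / den if den else 0.0

def forecast_next(series):
    """One-step-ahead forecast: last value plus the predicted next difference."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    diffs = difference(series)
    phi = fit_ar1(diffs)
    return series[-1] + phi * diffs[-1]

# Hypothetical monitoring readings with a steady upward trend.
readings = [10.0, 12.0, 14.0, 16.0, 18.0]
next_value = forecast_next(readings)
print("next value forecast:", next_value)
```

In a streaming setting like the one the abstract describes, a function of this kind would be re-fitted on a sliding window as new readings arrive; the distribution of that work across HBase and Storm is outside the scope of this sketch.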

