An Improved Method of Handling Missing Values in the Analysis of Sample Entropy for Continuous Monitoring of Physiological Signals

Xinzheng Dong; Chang Chen; Qingshan Geng; Zhixin Cao; Xiaoyan Chen; Jinxiang Lin; Yu Jin; Zhaozhi Zhang; Yan Shi; Xiaohua Douglas Zhang

doi:10.3390/e21030274

Entropy ◽

10.3390/e21030274 ◽

2019 ◽

Vol 21 (3) ◽

pp. 274 ◽

Cited By ~ 8

Author(s):

Xinzheng Dong ◽

Chang Chen ◽

Qingshan Geng ◽

Zhixin Cao ◽

Xiaoyan Chen ◽

...

Keyword(s):

Time Series ◽

Continuous Time ◽

Missing Values ◽

Time Series Data ◽

Disease Diagnosis ◽

Sample Entropy ◽

Physiological Signals ◽

Series Data ◽

Percentage Error ◽

Average Percentage

Medical devices generate huge amounts of continuous time series data. However, missing values commonly found in these data can prevent us from directly using analytic methods such as sample entropy to reveal the information the data contain. To minimize the influence of missing points on the calculation of sample entropy, we propose a new method for handling missing values in continuous time series data. We use both experimental and simulated datasets to compare the performance (in percentage error) of our proposed method with three currently used methods: skipping the missing values, linear interpolation, and bootstrapping. Unlike methods that modify the input data, our method modifies the calculation process. This keeps the data unchanged, which is less intrusive to the structure of the data. The results demonstrate that our method has a consistently lower average percentage error than the other three commonly used methods across multiple common physiological signals. For missing values across common physiological signal types, data sizes, and missingness-generating mechanisms, our method extracts the information contained in continuously monitored data more accurately than traditional methods. It may therefore serve as an effective tool for handling missing values, with broad utility in analyzing the sample entropy of common physiological signals. This could help develop new tools for disease diagnosis and the evaluation of treatment effects.
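
The abstract does not spell out how the calculation process is modified. As a minimal sketch, assuming one plausible reading (templates that contain a missing point are simply left out of the match counts, so the data themselves are never altered), sample entropy can be computed on a series that keeps NaN markers and compared with the skipping and linear-interpolation baselines. This is not the authors' published algorithm; the function names `sample_entropy`, `interpolate_linear`, and `percentage_error`, and the defaults m = 2 and r = 0.2 × SD, are illustrative assumptions.

```python
import numpy as np


def sample_entropy(x, m=2, r=None):
    """SampEn(m, r) of a 1-D series; NaN marks a missing value.

    Assumption (not the paper's exact rule): any template of length m or
    m + 1 that contains a NaN is excluded from the match counts, so the
    input data are never modified.
    """
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * np.nanstd(x)  # common tolerance choice, ignoring NaNs
    n_templates = len(x) - m  # same starting points for lengths m and m + 1
    counts = []
    for length in (m, m + 1):
        t = np.array([x[i:i + length] for i in range(n_templates)])
        usable = ~np.isnan(t).any(axis=1)
        matches = 0
        for i in np.flatnonzero(usable):
            later = t[i + 1:][usable[i + 1:]]  # usable templates after i
            if len(later):
                dist = np.max(np.abs(later - t[i]), axis=1)  # Chebyshev distance
                matches += int(np.sum(dist <= r))
        counts.append(matches)
    b, a = counts
    return float(-np.log(a / b)) if a > 0 and b > 0 else float("inf")


def interpolate_linear(x):
    """Baseline: fill NaNs by linear interpolation between observed neighbours."""
    x = np.asarray(x, dtype=float).copy()
    idx = np.arange(len(x))
    miss = np.isnan(x)
    x[miss] = np.interp(idx[miss], idx[~miss], x[~miss])
    return x


def percentage_error(estimate, reference):
    """Percentage error of an estimate against the complete-data value."""
    return 100.0 * abs(estimate - reference) / abs(reference)


rng = np.random.default_rng(0)
full = rng.standard_normal(1000)
gappy = full.copy()
gappy[rng.choice(len(full), size=50, replace=False)] = np.nan  # 5% missing

reference = sample_entropy(full)
estimates = {
    "skip missing": sample_entropy(gappy[~np.isnan(gappy)]),
    "linear interpolation": sample_entropy(interpolate_linear(gappy)),
    "exclude NaN templates": sample_entropy(gappy),
}
for name, value in estimates.items():
    print(f"{name}: {percentage_error(value, reference):.2f}% error")
```

The bootstrapping baseline is omitted to keep the sketch short; the design point is that the third estimate never rewrites or deletes a sample, it only changes which template comparisons are counted.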

Modeling Broiler Chicken Production in Indonesia Using Regression with Time Series Residuals

Xplore Journal of Statistics ◽

10.29244/xplore.v8i1.192 ◽

2019 ◽

Vol 8 (1) ◽

Author(s):

Akhbamah Primadaniyah Febrin ◽

Itasia Dina Sulvianti ◽

Aji Hamim Wigena

Keyword(s):

Time Series ◽

Structural Equation ◽

Broiler Chicken ◽

Time Series Data ◽

Retail Price ◽

Series Data ◽

Percentage Error ◽

Average Percentage ◽

Independent Variables ◽

Real Price

Broiler chicken production has fluctuated in recent years, and many factors are thought to influence it. The purpose of this study is to model a structural equation for forecasting broiler chicken production. The study uses one dependent variable (Y), broiler chicken production (kilotons), and five independent variables (X): broiler chicken population (millions), national chicken consumption (tons/year), retail price (Rp/kg), real price of corn (Rp), and real price of Kampung chicken (Rp). The variables are time series whose errors are not randomly distributed. The modeling method suited to these conditions is regression with time series errors, combined with ARIMA (Autoregressive Integrated Moving Average). The regression analysis showed that only the population and retail price variables influence broiler chicken production in Indonesia. The dependent variable was then modeled on these two independent variables using regression with time series errors. The best model is regression with ARIMA(1,1,0) time series errors, with a MAPE (Mean Absolute Percentage Error) of 2.4%, an RMSE (Root Mean Square Error) of 39.800, and a correlation of 0.980. The results indicate that broiler chicken production in Indonesia is influenced by these two variables.
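
The abstract reports the final model (regression with ARIMA(1,1,0) errors, scored by MAPE and RMSE) but not how it was estimated. A minimal sketch, assuming a model y_t = b0 + x_t·β + η_t whose first-differenced errors follow an AR(1): it swaps in the classical Cochrane-Orcutt iteration on differenced data instead of the maximum-likelihood fitting that ARIMA software typically performs, and it runs on simulated data, not the study's series. The function names are hypothetical.

```python
import numpy as np


def _as_2d(X):
    X = np.asarray(X, dtype=float)
    return X[:, None] if X.ndim == 1 else X


def fit_regression_arima110(y, X, n_iter=50, tol=1e-10):
    """Regression with ARIMA(1,1,0) errors, fitted by Cochrane-Orcutt.

    Differencing removes the intercept b0; the differenced errors are
    assumed to follow an AR(1) with coefficient phi.
    """
    y, X = np.asarray(y, dtype=float), _as_2d(X)
    dy, dX = np.diff(y), np.diff(X, axis=0)
    phi = 0.0
    for _ in range(n_iter):
        # Quasi-differencing with the current phi whitens the AR(1) errors.
        ys = dy[1:] - phi * dy[:-1]
        Xs = dX[1:] - phi * dX[:-1]
        beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
        u = dy - dX @ beta  # differenced regression errors
        new_phi = float(u[1:] @ u[:-1] / (u[:-1] @ u[:-1]))
        converged = abs(new_phi - phi) < tol
        phi = new_phi
        if converged:
            break
    return beta, phi


def one_step_predictions(y, X, beta, phi):
    """One-step-ahead predictions of y[2:] from the fitted model."""
    y, X = np.asarray(y, dtype=float), _as_2d(X)
    dy, dX = np.diff(y), np.diff(X, axis=0)
    u = dy - dX @ beta
    dy_hat = dX[1:] @ beta + phi * u[:-1]  # predicted change at each step
    return y[1:-1] + dy_hat


def mape(actual, predicted):
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return 100.0 * np.mean(np.abs((actual - predicted) / actual))


def rmse(actual, predicted):
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


# Simulated example: random-walk regressor, true beta = 2 and phi = 0.5.
rng = np.random.default_rng(1)
n = 2000
x = np.cumsum(rng.standard_normal(n))
eps = rng.standard_normal(n)
d_eta = np.zeros(n)
for t in range(1, n):
    d_eta[t] = 0.5 * d_eta[t - 1] + eps[t]  # AR(1) differenced errors
y = 1000.0 + 2.0 * x + np.cumsum(d_eta)     # ARIMA(1,1,0) errors
beta, phi = fit_regression_arima110(y, x)
pred = one_step_predictions(y, x, beta, phi)
print(beta, phi, mape(y[2:], pred), rmse(y[2:], pred))
```

For real work, a state-space package that fits ARIMA errors by maximum likelihood is the more usual choice; the iteration here is only meant to make the model structure explicit.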

COMPARISON OF NEWTON INTERPOLATION AND EXTRAPOLATION FOR TIME SERIES DATA PREDICTION

High Education of Organization Archive Quality: Jurnal Teknologi Informasi ◽

10.52972/hoaq.vol10no2.p73-80 ◽

2018 ◽

Vol 10 (2) ◽

pp. 73-80

Author(s):

Marinus Ignasius Jawawuan Lamabelawa

Keyword(s):

Time Series ◽

Time Series Data ◽

Combination Method ◽

Series Data ◽

Percentage Error ◽

Poor People ◽

Linear Quadratic ◽

Data Set ◽

Average Percentage ◽

Data Point

Time series data are analyzed for numerous purposes: to understand the phenomena or behavior of variables and to estimate future values. Interpolation estimates time series data points within the range of the data set; extrapolation predicts data points beyond that range. In this study, Newton's extrapolation is compared with linear and quadratic extrapolation. Newton's extrapolation assumes that the observed trend continues for values of x outside the model range. Prediction robustness is measured with the Root Mean Square Error (RMSE) and the Mean Absolute Percentage Error (MAPE). For Newton's interpolation with bottom, middle, and top approaches, the middle approach gave the best values, namely RMSE 76.01 and MAPE 4.65%. For Newton's extrapolation, the error values are the same for the bottom, middle, and top approaches, namely RMSE 541.170 and MAPE 33.19%. Data from Statistics Indonesia on the percentage and number of poor people in East Nusa Tenggara Province from 2010 to 2018 show a declining trend. Comparing the errors of linear, quadratic, and Newton's extrapolation, linear (trend) extrapolation gives the most robust results, namely RMSE 157.450 and MAPE 7.93%. These results indicate that Newton's extrapolation works well on non-linear data and needs to be combined with soft computing methods such as fuzzy systems, genetic algorithms (AG), or ANN.
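
Newton's method builds the interpolating polynomial from divided differences, and the same polynomial is evaluated outside the data range to extrapolate. A minimal sketch, assuming distinct sample points; the function names are hypothetical and the cubic test function stands in for the study's data, which are not reproduced here.

```python
import numpy as np


def newton_coefficients(xs, ys):
    """Divided-difference coefficients c[k] = f[x_0, ..., x_k]."""
    xs = np.asarray(xs, dtype=float)
    c = np.array(ys, dtype=float)  # copy, updated in place one order at a time
    n = len(xs)
    for j in range(1, n):
        # After this pass c[i] (i >= j) holds f[x_{i-j}, ..., x_i].
        c[j:] = (c[j:] - c[j - 1:-1]) / (xs[j:] - xs[:n - j])
    return c


def newton_eval(xs, c, x):
    """Evaluate c0 + c1 (x - x0) + c2 (x - x0)(x - x1) + ... by Horner's rule."""
    xs = np.asarray(xs, dtype=float)
    result = np.full_like(np.asarray(x, dtype=float), c[-1])
    for k in range(len(c) - 2, -1, -1):
        result = result * (x - xs[k]) + c[k]
    return result


xs = [0.0, 1.0, 2.0, 3.0]
ys = [x ** 3 - x for x in xs]  # samples of f(x) = x^3 - x
c = newton_coefficients(xs, ys)
print(newton_eval(xs, c, 1.5))  # interpolation inside [0, 3]
print(newton_eval(xs, c, 4.0))  # extrapolation beyond the data
```

Because f is itself a cubic, the degree-3 Newton polynomial reproduces it exactly (1.875 at x = 1.5 and 60.0 at x = 4). For noisy real data, high-degree extrapolation can diverge quickly outside the sampled range, which is consistent with the larger extrapolation errors reported above.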

Time–frequency time–space LSTM for robust classification of physiological signals

Scientific Reports ◽

10.1038/s41598-021-86432-7 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Tuan D. Pham

Keyword(s):

Time Series ◽

Network Architecture ◽

Time Series Data ◽

Short Term Memory ◽

Physiological Signals ◽

Series Data ◽

Time Frequency ◽

Time Space ◽

Deep Recurrent Neural Network

Automated analysis of physiological time series is utilized for many clinical applications in medicine and life sciences. Long short-term memory (LSTM) is a deep recurrent neural network architecture used for classification of time-series data. Here time–frequency and time–space properties of time series are introduced as a robust tool for LSTM processing of long sequential data in physiology. Based on classification results obtained from two databases of sensor-induced physiological signals, the proposed approach has the potential for (1) achieving very high classification accuracy, (2) saving tremendous time for data learning, and (3) being cost-effective and user-comfortable for clinical trials by reducing multiple wearable sensors for data recording.

Particularities of data mining in medicine: lessons learned from patient medical time series data analysis

EURASIP Journal on Wireless Communications and Networking ◽

10.1186/s13638-019-1582-2 ◽

2019 ◽

Vol 2019 (1) ◽

Cited By ~ 2

Author(s):

Shadi Aljawarneh ◽

Aurea Anguera ◽

John William Atwood ◽

Juan A. Lara ◽

David Lizcano

Keyword(s):

Data Mining ◽

Time Series ◽

Knowledge Discovery ◽

Time Series Data ◽

Medical Patient ◽

Lessons Learned ◽

Physiological Signals ◽

Knowledge Discovery In Databases ◽

Series Data ◽

Data Mining Techniques

Large amounts of data are now generated in the medical domain. Various physiological signals generated by different organs can be recorded to extract interesting information about patients' health. Analyzing physiological signals is a hard task that requires specific approaches such as the Knowledge Discovery in Databases process. Applying this process in medicine has a series of implications and difficulties, especially when data mining techniques are applied to data, mainly time series, gathered from medical examinations of patients. The goal of this paper is to describe the lessons learned and the experience gathered by the authors in applying data mining techniques to real medical patient data, including time series. In this research, we carried out an exhaustive case study on data from two medical fields: stabilometry (15 professional basketball players, 18 elite ice skaters) and electroencephalography (100 healthy patients, 100 epileptic patients). We applied a previously proposed knowledge discovery framework for classification, obtaining good classification accuracy (greater than 99% in both fields). These results are the groundwork for the lessons learned and recommendations made in this position paper, which is intended as a guide for experts facing similar medical data mining projects.

Multiscale Sample Entropy of Two-Dimensional Decaying Turbulence

Entropy ◽

10.3390/e23020245 ◽

2021 ◽

Vol 23 (2) ◽

pp. 245

Author(s):

Ildoo Kim

Keyword(s):

Time Series ◽

Time Series Data ◽

Soap Film ◽

Sample Entropy ◽

Series Data ◽

Entropy Analysis ◽

Measured Time ◽

Energetic Analysis ◽

Turbulence Data ◽

Decaying Turbulence

Multiscale sample entropy analysis, originally developed for physiological time series, quantifies the complexity and predictability of a time series. In this study, the analysis was applied to turbulence data. We measured time series of the velocity fluctuation, in either the longitudinal or the transverse direction, of turbulent soap film flows at various locations. The aim was to assess the feasibility of using entropy analysis to characterize turbulence qualitatively, without any conventional energetic analysis of turbulence. The study showed that applying entropy analysis to turbulence data is promising. The analysis captured two important features of the turbulent soap films. First, the directional disparity indicates that the turbulence is anisotropic. Second, the most unpredictable time scale increases with downstream distance, which is an indication of decaying turbulence.
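
Multiscale sample entropy coarse-grains the series by averaging non-overlapping windows of length τ (the scale) and computes sample entropy at each scale. A minimal sketch, assuming the common conventions m = 2 and a tolerance r = 0.15 × SD fixed from the original series; the function names are hypothetical and the input is synthetic white noise rather than soap-film velocity data.

```python
import numpy as np


def sample_entropy(x, m=2, r=0.15):
    """SampEn(m, r) with the tolerance r given in absolute units."""
    x = np.asarray(x, dtype=float)
    n_templates = len(x) - m  # same starting points for lengths m and m + 1

    def count_matches(length):
        t = np.array([x[i:i + length] for i in range(n_templates)])
        count = 0
        for i in range(n_templates - 1):
            dist = np.max(np.abs(t[i + 1:] - t[i]), axis=1)  # Chebyshev distance
            count += int(np.sum(dist <= r))
        return count

    b, a = count_matches(m), count_matches(m + 1)
    return float(-np.log(a / b)) if a > 0 and b > 0 else float("inf")


def coarse_grain(x, scale):
    """Means of consecutive non-overlapping windows of length `scale`."""
    x = np.asarray(x, dtype=float)
    n = len(x) // scale
    return x[:n * scale].reshape(n, scale).mean(axis=1)


def multiscale_entropy(x, scales, m=2, r_factor=0.15):
    x = np.asarray(x, dtype=float)
    r = r_factor * np.std(x)  # tolerance fixed from the original series
    return [sample_entropy(coarse_grain(x, s), m, r) for s in scales]


rng = np.random.default_rng(0)
noise = rng.standard_normal(2000)
curve = multiscale_entropy(noise, scales=[1, 2, 3, 4, 5])
print(curve)
```

For uncorrelated noise the entropy typically falls as the scale grows, because averaging shrinks the fluctuations while r stays fixed; the "most unpredictable time scale" mentioned above corresponds to the peak of such a curve for a real signal.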

Distance variable improvement of time-series big data stream evaluation

Journal Of Big Data ◽

10.1186/s40537-020-00359-w ◽

2020 ◽

Vol 7 (1) ◽

Author(s):

Ari Wibisono ◽

Petrus Mursanto ◽

Jihan Adibah ◽

Wendy D. W. T. Bayu ◽

May Iffah Rizki ◽

...

Keyword(s):

Time Series ◽

Standard Deviation ◽

Standard Method ◽

Time Series Data ◽

Series Data ◽

Percentage Error ◽

Traffic Demand ◽

Incremental Model ◽

Chernoff Bound ◽

The Mean

Real-time information mining of a big dataset consisting of time series data is a very challenging task. For this purpose, we propose using the mean distance and the standard deviation to enhance the accuracy of the existing fast incremental model tree with drift detection (FIMT-DD) algorithm. The standard FIMT-DD algorithm uses the Hoeffding bound as its splitting criterion. We propose additionally using the mean distance and standard deviation to split the tree more accurately than the standard method does. We verify our proposed method using the large Traffic Demand Dataset, which consists of 4,000,000 instances; Tennet’s big wind power plant dataset, which consists of 435,268 instances; and a road weather dataset, which consists of 30,000,000 instances. The results show that our proposed FIMT-DD algorithm improves accuracy compared with the standard method and the Chernoff bound approach. The measured errors demonstrate that our approach yields a lower Mean Absolute Percentage Error (MAPE) at every stage of learning, by approximately 2.49% compared with the Chernoff bound method and 19.65% compared with the standard method.
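
For context, the standard splitting rule that the paper builds on can be sketched as follows, as we understand FIMT-DD: candidate splits are scored by standard deviation reduction (SDR), and the ratio of the second-best to the best SDR is tested against the Hoeffding bound ε = sqrt(R² ln(1/δ) / (2n)) with R = 1, splitting when ratio + ε < 1 or when ε falls below a tie threshold τ. The proposed mean-distance and standard-deviation criterion is not reproduced because the abstract gives no formula. The function names and the values of δ and τ are illustrative assumptions.

```python
import math

import numpy as np


def sdr(y, left_mask):
    """Standard deviation reduction of targets y for a binary split."""
    y = np.asarray(y, dtype=float)
    left_mask = np.asarray(left_mask, dtype=bool)
    left, right = y[left_mask], y[~left_mask]
    n = len(y)
    return float(np.std(y) - len(left) / n * np.std(left)
                 - len(right) / n * np.std(right))


def hoeffding_bound(value_range, delta, n):
    """epsilon = sqrt(R^2 ln(1/delta) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(y, candidate_masks, delta=1e-7, tau=0.05):
    """Hoeffding test on the ratio of the two best SDR scores.

    Each mask must leave both sides of the split non-empty.
    """
    scores = sorted((sdr(y, m) for m in candidate_masks), reverse=True)
    best, second = scores[0], scores[1]
    if best <= 0.0:
        return False
    ratio = second / best  # lies in [0, 1] because SDR is non-negative
    eps = hoeffding_bound(1.0, delta, len(y))  # R = 1 for a ratio in [0, 1]
    # Split when the best candidate is reliably better, or when the two are
    # so close that collecting more examples is not worth it (tie rule).
    return ratio + eps < 1.0 or eps < tau


x = np.linspace(0.0, 1.0, 1000)
rng = np.random.default_rng(0)
y = 10.0 * (x > 0.5) + 0.1 * rng.standard_normal(1000)
print(should_split(y, [x <= 0.5, x <= 0.2]))
```

Because ε shrinks as 1/sqrt(n), a leaf that has seen few examples rarely splits unless one candidate clearly dominates; this is the quantity the paper's modified criterion aims to improve.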

An Empirical Mode-Spatial Model for Environmental Data Imputation

Hydrology ◽

10.3390/hydrology5040063 ◽

2018 ◽

Vol 5 (4) ◽

pp. 63 ◽

Cited By ~ 1

Author(s):

Benjamin Nelsen ◽

D. Williams ◽

Gustavious Williams ◽

Candace Berrett

Keyword(s):

Time Series ◽

Spatial Data ◽

Missing Values ◽

Time Series Data ◽

Environmental Data ◽

Series Data ◽

Data Imputation ◽

Accurate Data ◽

Target Station ◽

Periodic Components

Complete and accurate data are necessary for analyzing and understanding trends in time-series datasets; however, many available time-series datasets have gaps that affect the analysis, especially in the earth sciences. Because most available data have missing values, researchers use various interpolation methods or ad hoc approaches to impute data. Since analysis based on inaccurate data can lead to inaccurate conclusions, more accurate imputation methods support more accurate analysis. We present a spatial-temporal data imputation method using Empirical Mode Decomposition (EMD) based on spatial correlations, which we call EMD-spatial data imputation (EMD-SDI). Although the method is applicable to other time-series datasets, here we demonstrate it using temperature data. The EMD algorithm decomposes data into periodic components called intrinsic mode functions (IMFs) and exactly reconstructs the original signal by summing these IMFs. EMD-SDI first decomposes the data from the target station and other stations in the region into IMFs. It then evaluates each IMF from the target station in turn and selects the IMF from the other stations in the region whose periodic behavior is most correlated with the target IMF. Finally, EMD-SDI replaces a section of missing data in the target station IMF with the corresponding section of the most closely correlated regional IMF. We found that EMD-SDI selects the IMFs used for reconstruction from different stations throughout the region, not necessarily the geographically closest station. In our tests, EMD-SDI accurately filled data gaps from 3 months to 5 years in length and compared favorably with a simple temporal method. EMD-SDI leverages regional correlation and the fact that different stations can be subject to different periodic behaviors. In addition to data imputation, the EMD-SDI method provides IMFs that can be used to better understand regional correlations and processes.
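
The selection-and-replacement step described above can be sketched once the IMFs exist. A minimal sketch, assuming the IMFs were computed beforehand with some EMD implementation, all stations share one time grid, the regional IMFs are complete over the gap, and "most correlated" means the highest Pearson correlation over the samples the target actually observed; the regional section is copied without rescaling, as the abstract describes. The function name `emd_sdi_fill` and the synthetic IMFs are hypothetical, not the authors' code or data.

```python
import numpy as np


def emd_sdi_fill(target_imfs, regional_imfs, gap):
    """Fill a gap in the target station from the best-correlated regional IMFs.

    target_imfs   : (k, n) array of target-station IMFs (gap values ignored)
    regional_imfs : list of (k_s, n) arrays, one per neighbouring station
    gap           : boolean mask of length n, True where the target is missing
    Returns the reconstructed series and, per target IMF, the
    (station, imf) index pair copied into the gap.
    """
    filled = np.array(target_imfs, dtype=float)
    gap = np.asarray(gap, dtype=bool)
    observed = ~gap
    choices = []
    for k in range(filled.shape[0]):
        best, best_corr = None, -np.inf
        for s, station in enumerate(regional_imfs):
            for j, candidate in enumerate(np.asarray(station, dtype=float)):
                # Correlate only over samples the target actually observed.
                corr = np.corrcoef(filled[k, observed], candidate[observed])[0, 1]
                if corr > best_corr:
                    best, best_corr = (s, j), corr
        s, j = best
        filled[k, gap] = np.asarray(regional_imfs[s], dtype=float)[j, gap]
        choices.append(best)
    # EMD is additive, so the series is the sum of its (now complete) IMFs.
    return filled.sum(axis=0), choices


t = np.arange(200.0)
gap = (t >= 50) & (t < 70)
target = np.vstack([np.sin(t / 5), np.sin(t / 20)])
target[:, gap] = np.nan  # unknown at the target station
rng = np.random.default_rng(0)
station_a = np.vstack([np.sin(t / 5), np.cos(t / 20)])
station_b = np.vstack([rng.standard_normal(200), 2 * np.sin(t / 20)])
series, choices = emd_sdi_fill(target, [station_a, station_b], gap)
print(choices)
```

In this toy case each target IMF is matched by a different station, mirroring the finding that the chosen IMFs need not come from a single (or the nearest) station; the returned `choices` list is one way to inspect those regional correlations.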

An improved framework to predict river flow time series data

PeerJ ◽

10.7717/peerj.7183 ◽

2019 ◽

Vol 7 ◽

pp. e7183 ◽

Cited By ~ 1

Author(s):

Hafiza Mamona Nazir ◽

Ijaz Hussain ◽

Ishfaq Ahmad ◽

Muhammad Faisal ◽

Ibrahim M. Almanjahie

Keyword(s):

Time Series ◽

Water Resource Management ◽

River Flow ◽

Time Series Data ◽

Flow Time ◽

Indus Basin ◽

Series Data ◽

Percentage Error ◽

Empirical Bayesian ◽

Noise Characteristics

Because river flow time series are non-stationary and noisy, pre-processing methods are adopted to address their multi-scale and noise complexity. In this paper, we propose an improved framework comprising Complete Ensemble Empirical Mode Decomposition with Adaptive Noise and an Empirical Bayesian Threshold (CEEMDAN-EBT). CEEMDAN-EBT is employed to decompose non-stationary river flow time series into Intrinsic Mode Functions (IMFs). The derived IMFs are divided into two groups: noise-dominant IMFs and noise-free IMFs. First, the noise-dominant IMFs are denoised using the empirical Bayesian threshold to account for the noise and sparsity of the IMFs. Second, the denoised IMFs and the noise-free IMFs are used as inputs to data-driven and simple stochastic models, respectively, to predict the river flow time series. Finally, the predicted IMFs are aggregated to obtain the final prediction. The proposed framework is illustrated using four rivers of the Indus Basin System. Prediction performance is compared using the Mean Square Error (MSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). Our proposed method, CEEMDAN-EBT-MM, produced the smallest MAPE in all four case studies compared with the other methods. This suggests that the proposed hybrid model can serve as an efficient tool for reliably predicting non-stationary and noisy time series for policymakers, for example in planning power generation and water resource management.

Time Series Imputation via L1 Norm-Based Singular Spectrum Analysis

Fluctuation and Noise Letters ◽

10.1142/s0219477518500177 ◽

2018 ◽

Vol 17 (02) ◽

pp. 1850017 ◽

Cited By ~ 3

Author(s):

Mahdi Kalantari ◽

Masoud Yarmohammadi ◽

Hossein Hassani ◽

Emmanuel Sirimal Silva

Keyword(s):

Time Series ◽

Spectrum Analysis ◽

Missing Values ◽

Time Series Data ◽

Singular Spectrum Analysis ◽

Series Data ◽

L1 Norm ◽

Nonparametric Approach ◽

Singular Spectrum ◽

Simulated Time

Missing values in time series data are a well-known and important problem that many researchers in various fields have studied extensively. In this paper, a new nonparametric approach for imputing missing values in time series is proposed. The main novelty of this research is the application of the L1-norm-based version of Singular Spectrum Analysis (SSA), namely L1-SSA, which is robust against outliers. The performance of the new imputation method is compared with many other established methods by applying them to various real and simulated time series. The results confirm that SSA-based methods, especially L1-SSA, can provide better imputation than other methods.
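
For orientation, the classical (L2) iterative SSA imputation that L1-SSA modifies can be sketched as: fill the gaps with an initial guess, embed the series in an L × K trajectory matrix, keep the leading singular components, map back to a series by averaging anti-diagonals, overwrite only the missing entries, and repeat until they stop changing. This sketch uses the ordinary SVD, not the paper's L1-norm decomposition; the function names, the mean initialization, and the window and rank values are illustrative assumptions.

```python
import numpy as np


def hankelize(matrix):
    """Average the anti-diagonals of an L x K matrix into a length L+K-1 series."""
    L, K = matrix.shape
    out = np.zeros(L + K - 1)
    counts = np.zeros(L + K - 1)
    for i in range(L):
        out[i:i + K] += matrix[i]  # entry (i, j) belongs to series index i + j
        counts[i:i + K] += 1
    return out / counts


def ssa_impute(x, window, rank, n_iter=100, tol=1e-8):
    """Iterative SSA imputation with a truncated (L2) SVD; NaN marks a gap."""
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    filled = x.copy()
    filled[missing] = np.nanmean(x)  # initial guess for the gaps
    K = len(x) - window + 1
    for _ in range(n_iter):
        traj = np.array([filled[i:i + K] for i in range(window)])  # L x K
        u, s, vt = np.linalg.svd(traj, full_matrices=False)
        approx = (u[:, :rank] * s[:rank]) @ vt[:rank]  # rank-r approximation
        recon = hankelize(approx)
        change = np.max(np.abs(recon[missing] - filled[missing])) if missing.any() else 0.0
        filled[missing] = recon[missing]  # observed values are never altered
        if change < tol:
            break
    return filled


t = np.arange(200)
truth = np.sin(2 * np.pi * t / 20)  # a sinusoid has an exactly rank-2 trajectory matrix
x = truth.copy()
x[::17] = np.nan
imputed = ssa_impute(x, window=40, rank=2)
print(np.max(np.abs(imputed - truth)))
```

The squared-error SVD step is what makes this version sensitive to outliers; replacing it with an L1-norm low-rank approximation is the change the paper proposes.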

Filling Missing Values on Wearable-Sensory Time Series Data

Proceedings of the 2020 SIAM International Conference on Data Mining ◽

10.1137/1.9781611976236.6 ◽

2020 ◽

pp. 46-54

Author(s):

Suwen Lin ◽

Xian Wu ◽

Gonzalo Martinez ◽

Nitesh V. Chawla

Keyword(s):

Time Series ◽

Missing Values ◽

Time Series Data ◽

Series Data