Evaluating the Performance of Multiple Imputation Methods for Handling Missing Values in Time Series Data: A Study Focused on East Africa, Soil-Carbonate-Stable Isotope Data

Hossein Hassani; Mahdi Kalantari; Zara Ghodsi

doi:10.3390/stats2040032

Stats ◽

10.3390/stats2040032 ◽

2019 ◽

Vol 2 (4) ◽

pp. 457-467 ◽

Cited By ~ 1

Author(s):

Hossein Hassani ◽

Mahdi Kalantari ◽

Zara Ghodsi

Keyword(s):

Time Series ◽

East Africa ◽

Quantitative Research ◽

Missing Values ◽

Time Series Data ◽

Extreme Values ◽

Singular Spectrum Analysis ◽

Series Data ◽

Data Set ◽

Imputation Methods

In all fields of quantitative research, analysing data with missing values is an excruciating challenge. It should be no surprise that, given the fragmentary nature of fossil records, the presence of missing values in geographical databases is unavoidable. Because ignoring missing values in such studies may result in biased estimations or invalid conclusions, adopting a reliable imputation method should be regarded as an essential consideration. In this study, the performance of singular spectrum analysis (SSA) based on the L1 norm was evaluated on the compiled δ13C data from East Africa soil carbonates, which is a world targeted historical geology data set. Results were compared with ten traditionally well-known imputation methods, showing that L1-SSA performs well in keeping the variability of the time series and in providing estimations that are less affected by extreme values, suggesting that the method introduced here deserves further consideration in practice.

Time Series Imputation via L1 Norm-Based Singular Spectrum Analysis

Fluctuation and Noise Letters ◽

10.1142/s0219477518500177 ◽

2018 ◽

Vol 17 (02) ◽

pp. 1850017 ◽

Cited By ~ 3

Author(s):

Mahdi Kalantari ◽

Masoud Yarmohammadi ◽

Hossein Hassani ◽

Emmanuel Sirimal Silva

Keyword(s):

Time Series ◽

Spectrum Analysis ◽

Missing Values ◽

Time Series Data ◽

Singular Spectrum Analysis ◽

Series Data ◽

L1 Norm ◽

Nonparametric Approach ◽

Singular Spectrum ◽

Simulated Time

Missing values in time series data are a well-known and important problem that many researchers have studied extensively in various fields. In this paper, a new nonparametric approach for missing value imputation in time series is proposed. The main novelty of this research is applying the L1 norm-based version of Singular Spectrum Analysis (SSA), namely L1-SSA, which is robust against outliers. The performance of the new imputation method has been compared with many other established methods by applying them to various real and simulated time series. The obtained results confirm that the SSA-based methods, especially L1-SSA, can provide better imputation than the other methods.
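
To make the idea of SSA-based imputation concrete, the following is a minimal sketch of an iterative gap-filling loop. It is not the authors' L1-SSA: it uses the ordinary SVD (least-squares) version of SSA, and the function name `ssa_impute`, the window length, the rank and the synthetic sine series are all assumptions chosen for illustration.

```python
import numpy as np

def ssa_impute(series, window, rank, n_iter=100):
    """Fill NaN gaps by iterating a rank-`rank` basic (SVD-based) SSA reconstruction."""
    x = np.asarray(series, dtype=float).copy()
    missing = np.isnan(x)
    x[missing] = np.nanmean(x)                 # crude starting guess for the gaps
    n = x.size
    k = n - window + 1
    # Entry (i, j) of the trajectory matrix holds x[i + j].
    idx = np.arange(window)[:, None] + np.arange(k)[None, :]
    # Number of matrix entries that map back to each time index.
    counts = np.bincount(idx.ravel(), minlength=n)
    for _ in range(n_iter):
        traj = x[idx]                          # embedding step (Hankel matrix)
        u, s, vt = np.linalg.svd(traj, full_matrices=False)
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
        # Diagonal averaging turns the low-rank matrix back into a series.
        recon = np.bincount(idx.ravel(), weights=low_rank.ravel(), minlength=n) / counts
        x[missing] = recon[missing]            # observed values are never changed
    return x

# Synthetic example (not data from the paper): a sine wave with two gaps.
t = np.arange(48)
truth = np.sin(2 * np.pi * t / 12)
gappy = truth.copy()
gappy[[10, 25]] = np.nan
filled = ssa_impute(gappy, window=12, rank=2)
```

A robust variant in the spirit of the paper would replace the SVD step with an L1-norm low-rank approximation; that step is not shown here.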

The Comparison of Imputation Methods in Space Time Series Data with Missing Values

Communications for Statistical Applications and Methods ◽

10.5351/ckss.2010.17.2.263 ◽

2010 ◽

Vol 17 (2) ◽

pp. 263-273 ◽

Cited By ~ 2

Author(s):

Sung-Duck Lee ◽

Duck-Ki Kim

Keyword(s):

Time Series ◽

Missing Values ◽

Time Series Data ◽

Space Time ◽

Series Data ◽

Imputation Methods

The Comparison of Imputation Methods in Time Series Data with Missing Values

Communications for Statistical Applications and Methods ◽

10.5351/ckss.2009.16.4.723 ◽

2009 ◽

Vol 16 (4) ◽

pp. 723-730

Author(s):

Sung-Duck Lee ◽

Jae-Hyuk Choi ◽

Duck-Ki Kim

Keyword(s):

Time Series ◽

Missing Values ◽

Time Series Data ◽

Series Data ◽

Imputation Methods

Time Series Components Separation Based on Singular Spectral Analysis Visualization: an HJ-biplot Method Application

Statistics Optimization & Information Computing ◽

10.19139/soic-2310-5070-897 ◽

2020 ◽

Vol 8 (2) ◽

pp. 346-358

Author(s):

Alberto Oliveira da Silva ◽

Adelaide Freitas

Keyword(s):

Time Series ◽

Time Series Data ◽

Singular Spectrum Analysis ◽

Principal Component ◽

Series Data ◽

Real World Data ◽

Components Separation ◽

Data Set ◽

The Matrix ◽

Simultaneous Representation

The extraction of essential features of any real-valued time series is crucial for exploring, modeling and producing, for example, forecasts. Taking advantage of the representation of a time series by its Hankel trajectory matrix, constructed using Singular Spectrum Analysis, as well as of its decomposition through Principal Component Analysis via Partial Least Squares, we implement a graphical display employing the biplot methodology. Different types of biplots can be constructed depending on the two matrices considered in the factorization of the trajectory matrix. In this work, we discuss the so-called HJ-biplot, which yields a simultaneous representation of both rows and columns of the matrix with maximum quality. The interpretation of this type of biplot on Hankel-related trajectory matrices is discussed using a real-world data set.
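
As a small illustration of the trajectory matrix mentioned in this abstract, the sketch below builds the Hankel matrix of a short series. The function name `trajectory_matrix` and the toy series are assumptions for illustration; the biplot construction itself is not shown.

```python
def trajectory_matrix(series, window):
    """Return the window x (n - window + 1) Hankel trajectory matrix as nested lists."""
    n = len(series)
    if not 1 <= window <= n:
        raise ValueError("window must be between 1 and len(series)")
    k = n - window + 1
    # Row i is the lagged segment series[i : i + k], so anti-diagonals are constant.
    return [[series[i + j] for j in range(k)] for i in range(window)]

rows = trajectory_matrix([1, 2, 3, 4, 5], window=3)
# rows == [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
```

Each anti-diagonal repeats one value of the original series, which is what makes the matrix Hankel.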

Remaining Useful Life Prediction Using Temporal Convolution with Attention

AI ◽

10.3390/ai2010005 ◽

2021 ◽

Vol 2 (1) ◽

pp. 48-70

Author(s):

Wei Ming Tan ◽

T. Hui Teo

Keyword(s):

Neural Network ◽

Time Series ◽

Time Series Data ◽

Remaining Useful Life ◽

Sensor Data ◽

Series Data ◽

Multiple Time ◽

Data Set ◽

Form Complex ◽

Useful Life

Prognostic techniques attempt to predict the Remaining Useful Life (RUL) of a subsystem or a component. Such techniques often use sensor data that are periodically measured and recorded into a time series data set. Such multivariate data sets form complex and non-linear inter-dependencies through recorded time steps and between sensors. Many existing algorithms for prognostic purposes have started to explore Deep Neural Networks (DNNs) and their effectiveness in the field. Although Deep Learning (DL) techniques outperform the traditional prognostic algorithms, the networks are generally complex to deploy or train. This paper proposes a Multi-variable Time Series (MTS) focused approach to prognostics that implements a lightweight Convolutional Neural Network (CNN) with an attention mechanism. The convolution filters extract the abstract temporal patterns from the multiple time series, while the attention mechanisms review the information across the time axis and select the relevant information. The results suggest that the proposed method not only produces superior accuracy of RUL estimation but also trains many times faster than the reported works. The advantage of deploying the network is also demonstrated on a lightweight hardware platform, where it is not only much more compact but also more efficient in a resource-restricted environment.
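
The notion of attention over the time axis can be sketched generically as a softmax-weighted sum of per-time-step features. This is only an illustration of that general idea, not the network described in the paper: the function name `attention_pool` is hypothetical, and the scores are passed in directly, whereas in a trained model they would be produced by learned parameters.

```python
import math

def attention_pool(features, scores):
    """Softmax the per-time-step scores and return (weights, weighted feature sum)."""
    peak = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    width = len(features[0])
    # Weighted sum over time steps, computed separately for each feature channel.
    pooled = [sum(w * step[c] for w, step in zip(weights, features)) for c in range(width)]
    return weights, pooled

# Three time steps with two feature channels each; equal scores give equal weights.
weights, pooled = attention_pool(
    features=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    scores=[0.0, 0.0, 0.0],
)
```

Raising one score relative to the others shifts the pooled vector toward that time step, which is the sense in which attention "selects" information.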

An Empirical Mode-Spatial Model for Environmental Data Imputation

Hydrology ◽

10.3390/hydrology5040063 ◽

2018 ◽

Vol 5 (4) ◽

pp. 63 ◽

Cited By ~ 1

Author(s):

Benjamin Nelsen ◽

D. Williams ◽

Gustavious Williams ◽

Candace Berrett

Keyword(s):

Time Series ◽

Spatial Data ◽

Missing Values ◽

Time Series Data ◽

Environmental Data ◽

Series Data ◽

Data Imputation ◽

Accurate Data ◽

Target Station ◽

Periodic Components

Complete and accurate data are necessary for analyzing and understanding trends in time-series datasets; however, many of the available time-series datasets have gaps that affect the analysis, especially in the earth sciences. As most available data have missing values, researchers use various interpolation methods or ad hoc approaches to data imputation. Since analysis based on inaccurate data can lead to inaccurate conclusions, more accurate data imputation methods can provide more accurate analysis. We present a spatial-temporal data imputation method using Empirical Mode Decomposition (EMD) based on spatial correlations. We call this method EMD-spatial data imputation or EMD-SDI. Though this method is applicable to other time-series data sets, here we demonstrate the method using temperature data. The EMD algorithm decomposes data into periodic components called intrinsic mode functions (IMFs) and exactly reconstructs the original signal by summing these IMFs. EMD-SDI initially decomposes the data from the target station and other stations in the region into IMFs. EMD-SDI evaluates each IMF from the target station in turn and selects the IMF from other stations in the region with periodic behavior most correlated with the target IMF. EMD-SDI then replaces a section of missing data in the target station IMF with the section from the most closely correlated IMF from the regional stations. We found that EMD-SDI selects the IMFs used for reconstruction from different stations throughout the region, not necessarily the station closest in the geographic sense. EMD-SDI accurately filled data gaps from 3 months to 5 years in length in our tests and compares favorably with a simple temporal method. EMD-SDI leverages regional correlation and the fact that different stations can be subject to different periodic behaviors. In addition to data imputation, the EMD-SDI method provides IMFs that can be used to better understand regional correlations and processes.
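
The selection-and-replacement step described in this abstract can be sketched as follows. This is a simplified illustration, not the authors' implementation: the IMFs are assumed to be already computed, the gap is marked with `None`, plain Pearson correlation over the observed part is an assumed similarity measure, and the function and station names are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def fill_gap_from_best_imf(target_imf, candidate_imfs, gap):
    """Copy the gap section from the candidate most correlated outside the gap."""
    start, stop = gap
    observed = [i for i in range(len(target_imf)) if not start <= i < stop]
    target_obs = [target_imf[i] for i in observed]
    best = max(
        candidate_imfs,
        key=lambda name: pearson(target_obs, [candidate_imfs[name][i] for i in observed]),
    )
    filled = list(target_imf)
    filled[start:stop] = candidate_imfs[best][start:stop]
    return best, filled

# Synthetic stand-ins for IMFs (not temperature data from the paper).
wave = [math.sin(2 * math.pi * t / 8) for t in range(32)]
target = list(wave)
target[12:16] = [None] * 4
candidates = {
    "station_a": [math.cos(2 * math.pi * t / 5) for t in range(32)],
    "station_b": [1.1 * w for w in wave],
}
best, filled = fill_gap_from_best_imf(target, candidates, gap=(12, 16))
```

Repeating this for every IMF of the target station and summing the filled IMFs would give the reconstructed series, since EMD reconstructs a signal as the sum of its IMFs.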

Studying monthly rainfall over Dibrugarh, Assam: Use of SARIMA approach

MAUSAM ◽

10.54302/mausam.v68i2.637 ◽

2021 ◽

Vol 68 (2) ◽

pp. 349-356

Author(s):

J. HAZARIKA ◽

B. PATHAK ◽

A. N. PATOWARY

Keyword(s):

Time Series ◽

Time Series Data ◽

Moving Average ◽

Demand Management ◽

Arima Model ◽

Monthly Rainfall ◽

Series Data ◽

Data Set ◽

Modeling And Forecasting ◽

Moving Average Model

Understanding the rainfall pattern is a difficult part of solving several regional environmental issues of water resources management, with implications for agriculture, climate change, and natural calamities such as floods and droughts. Statistical computing, modeling and forecasting are key instruments for studying these patterns. Time series analysis and forecasting has become a major tool in different applications in hydrology and environmental fields. Among the most effective approaches for analyzing time series data is the ARIMA (Autoregressive Integrated Moving Average) model introduced by Box and Jenkins. In this study, an attempt has been made to use the Box-Jenkins methodology to build an ARIMA model for monthly rainfall data taken from Dibrugarh for the period 1980-2014, with a total of 420 points. We investigated and found that the seasonal ARIMA(0, 0, 0)(0, 1, 1) model with period 12 is suitable for the given data set. As such, this model can be used to forecast the pattern of monthly rainfall for the upcoming years, which can help decision makers establish priorities in terms of agricultural, flood, and water demand management.
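
The seasonal order (0, 1, 1) with period 12 means the series is differenced once at lag 12 and the differenced series is modeled with a moving-average term at lag 12. The sketch below shows only the differencing step, on made-up numbers that are not the Dibrugarh data; the function name `seasonal_difference` and the monthly profile are assumptions for illustration.

```python
def seasonal_difference(series, period=12):
    """Return w[t] = y[t] - y[t - period] for every t >= period."""
    return [series[t] - series[t - period] for t in range(period, len(series))]

# Hypothetical monthly values: a repeating 12-month profile plus a drift of 1 per year.
profile = [10, 12, 30, 80, 150, 300, 420, 380, 250, 90, 20, 8]
rain = [profile[t % 12] + t // 12 for t in range(420)]   # 35 years x 12 months
diffed = seasonal_difference(rain)
```

With 420 monthly points the differenced series has 408 values, and here the repeating profile cancels, leaving only the yearly drift.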

Mining the Relationships in the form of the Predisposing Factors and Co-Incident Factors among Numerical Dynamic Attributes in Time Series Data Set by Using the Combination of Some Existing Techniques

Enterprise Information Systems VI ◽

10.1007/1-4020-3675-2_16 ◽

2006 ◽

pp. 135-142

Author(s):

Suwimon Kooptiwoot ◽

M. Abdus Salam

Keyword(s):

Time Series ◽

Time Series Data ◽

Predisposing Factors ◽

Series Data ◽

Data Set ◽

Dynamic Attributes

Exploratory Time Series Data Mining by Genetic Clustering

Mathematical Methods for Knowledge Discovery and Data Mining ◽

10.4018/978-1-59904-528-3.ch010 ◽

2011 ◽

pp. 157-178

Author(s):

T. Warren Liao

Keyword(s):

Data Mining ◽

Time Series ◽

Time Series Data ◽

Distance Measures ◽

Series Data ◽

Synthetic Control ◽

Data Set ◽

Univariate Time Series ◽

Genetic Clustering ◽

Data Objects

In this chapter, we present genetic algorithm (GA) based methods developed for clustering univariate time series of equal or unequal length as an exploratory step of data mining. These methods basically implement the k-medoids algorithm. Each chromosome encodes in binary the data objects serving as the k-medoids. To compare their performance, both fixed-parameter and adaptive GAs were used. We first employed the synthetic control chart data set to investigate the performance of three fitness functions, two distance measures, and other GA parameters such as population size, crossover rate, and mutation rate. Two more sets of time series, with or without a known number of clusters, were also used in experiments: one is the cylinder-bell-funnel data and the other is the novel battle simulation data. The clustering results are presented and discussed.
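
One way to read the binary encoding is as a bit mask over the data objects, with a 1 marking an object that serves as a medoid. The sketch below decodes such a chromosome and scores it by the total distance of every series to its nearest medoid. Both the bit-mask reading and this particular fitness are assumptions for illustration, as the abstract does not specify them; the Euclidean distance used here also requires equal-length series.

```python
def decode(chromosome):
    """Indices whose bit is 1 are the data objects acting as medoids."""
    return [i for i, bit in enumerate(chromosome) if bit == 1]

def total_distance(chromosome, series_list, distance):
    """Sum of each series' distance to its nearest medoid (lower is better)."""
    medoids = decode(chromosome)
    if not medoids:
        raise ValueError("chromosome selects no medoids")
    return sum(min(distance(s, series_list[m]) for m in medoids) for s in series_list)

def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# Toy data with two obvious groups; each chromosome picks k = 2 medoids.
data = [[0, 0, 0], [0, 1, 0], [5, 5, 5], [5, 6, 5]]
good = total_distance([1, 0, 1, 0], data, euclidean)   # one medoid per group
bad = total_distance([1, 1, 0, 0], data, euclidean)    # both medoids in one group
```

A GA would evolve a population of such chromosomes through crossover and mutation, keeping those with lower total distance.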
