Small Order Patterns in Big Time Series: A Practical Guide

Entropy ◽  
2019 ◽  
Vol 21 (6) ◽  
pp. 613 ◽  
Author(s):  
Christoph Bandt

The study of order patterns of three equally spaced values x(t), x(t+d), x(t+2d) in a time series is a powerful tool. The lag d is varied over a wide range, so that the differences of the frequencies of order patterns become autocorrelation functions. Similar to a spectrogram in speech analysis, four ordinal autocorrelation functions are used to visualize big data series, such as heart and brain activity over many hours. The method applies to real data without preprocessing; outliers and missing data do not matter. On the theoretical side, we study the properties of order correlation functions and show that the four autocorrelation functions are orthogonal in a certain sense. An analysis of variance of a modified permutation entropy can be performed with four variance components associated with these functions.
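The pattern-counting step behind these ordinal autocorrelation functions can be sketched in a few lines. This is an illustrative reconstruction, not the author's code; the function names and the ascending-index encoding of patterns are our own choices, and ties are broken by time index via the stable sort.

```python
from itertools import permutations

def order_pattern(a, b, c):
    """Rank pattern of three values: (0, 1, 2) means a <= b <= c."""
    vals = [a, b, c]
    return tuple(sorted(range(3), key=lambda i: vals[i]))

def pattern_frequencies(x, d):
    """Relative frequencies of the six order patterns of (x[t], x[t+d], x[t+2d])."""
    counts = {p: 0 for p in permutations(range(3))}
    n = len(x) - 2 * d
    for t in range(n):
        counts[order_pattern(x[t], x[t + d], x[t + 2 * d])] += 1
    return {p: c / n for p, c in counts.items()}

# A strictly increasing series realizes only the ascending pattern (0, 1, 2).
freqs = pattern_frequencies(list(range(100)), d=3)
```

Bandt's ordinal autocorrelation functions are then built from differences of such frequencies as the lag d varies.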

Sensors ◽  
2021 ◽  
Vol 21 (11) ◽  
pp. 3632
Author(s):  
Alessandra Anzolin ◽  
Jlenia Toppi ◽  
Manuela Petti ◽  
Febo Cincotti ◽  
Laura Astolfi

EEG signals are widely used to estimate brain circuits associated with specific tasks and cognitive processes. The testing of connectivity estimators is still an open issue because of the lack of a ground truth in real data. Existing solutions, such as the generation of simulated data from a manually imposed connectivity pattern or from mass oscillators, can model only a few real cases, with a limited number of signals and with spectral properties that do not reflect those of real brain activity. Furthermore, the generation of time series reproducing non-ideal and non-stationary ground-truth models is still missing. In this work, we present the SEED-G toolbox for the generation of pseudo-EEG data with imposed connectivity patterns, overcoming these limitations and enabling control of several simulation parameters according to the user's needs. We first describe the toolbox, including guidelines for its correct use, and then test its performance, showing how, in a wide range of conditions, datasets composed of up to 60 time series were successfully generated in less than 5 s and with spectral features similar to real data. Finally, SEED-G is employed to study the effect of inter-trial variability on Partial Directed Coherence (PDC) estimates, confirming its robustness.
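SEED-G itself is a MATLAB toolbox; as a language-agnostic illustration of its core idea — a ground-truth generative model with an imposed directed connection — the sketch below draws two pseudo-signals from a toy VAR(1) model in which channel 1 drives channel 2 with a one-sample delay. All coefficients and names are illustrative and are not part of the toolbox's API.

```python
import random

def generate_var2(n, coupling=0.8, noise=0.1, seed=0):
    """Two-channel VAR(1) ground truth: channel 1 is autonomous, channel 2 is
    driven by channel 1 with a one-sample delay (an imposed 1 -> 2 link)."""
    rng = random.Random(seed)
    x1, x2 = [0.0], [0.0]
    for _ in range(n - 1):
        x1.append(0.5 * x1[-1] + rng.gauss(0, noise))
        # x1[-2] is the previous sample of channel 1: the imposed connection.
        x2.append(0.3 * x2[-1] + coupling * x1[-2] + rng.gauss(0, noise))
    return x1, x2

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5

x1, x2 = generate_var2(2000)
# The directed coupling shows up as a stronger lagged correlation 1 -> 2 than 2 -> 1.
c12 = corr(x1[:-1], x2[1:])
c21 = corr(x2[:-1], x1[1:])
```

A connectivity estimator such as PDC, applied to data generated this way, can then be scored against the known 1 → 2 ground truth.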


2021 ◽  
Author(s):  
Mikhail Kanevski

Nowadays, a wide range of methods and tools to study and forecast time series is available. An important problem in forecasting concerns the embedding of a time series, i.e. the construction of a high-dimensional space in which the forecasting problem is treated as a regression task. There are several basic linear and nonlinear approaches to constructing such a space by defining an optimal delay vector using different theoretical concepts. Another way is to consider this space as an input feature space (IFS) and to apply machine learning feature selection (FS) algorithms to optimize the IFS according to the problem under study (analysis, modelling or forecasting). Such an approach is empirical: it is based on the data and depends on the FS algorithms applied. In machine learning, features are generally classified as relevant, redundant or irrelevant. This opens a rich opportunity to perform advanced multivariate time series exploration and to develop interpretable predictive models.

Therefore, in the present research, different FS algorithms are used to analyze fundamental properties of time series from an empirical point of view. Linear and nonlinear simulated time series are studied in detail to understand the advantages and drawbacks of the proposed approach. Real data case studies deal with air pollution and wind speed time series. Preliminary results are quite promising and more research is in progress.
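The embedding step described above — turning a series into an input feature space for a regression task — can be sketched as follows. The function name and parameter choices are ours; in the approach described, the resulting X would be handed to an FS algorithm rather than used directly.

```python
def delay_embed(x, dim, tau=1):
    """Build a regression dataset from a series: each row of X holds `dim`
    lagged values (the delay vector), y holds the next value to forecast."""
    X, y = [], []
    for t in range((dim - 1) * tau, len(x) - 1):
        X.append([x[t - k * tau] for k in range(dim)])
        y.append(x[t + 1])
    return X, y

# Toy ramp series: dim=3 lags, spacing tau=2.
series = [0.1 * t for t in range(20)]
X, y = delay_embed(series, dim=3, tau=2)
```

Each row of X is a candidate feature vector; an FS algorithm can then label individual lags as relevant, redundant or irrelevant.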


2002 ◽  
Vol 9 (3/4) ◽  
pp. 325-331 ◽  
Author(s):  
N. Marwan ◽  
M. Thiel ◽  
N. R. Nowaczyk

Abstract. The method of recurrence plots is extended to cross recurrence plots (CRP), which, among other things, enable the study of synchronization or time differences between two time series. These show up as a distorted main diagonal in the cross recurrence plot, the line of synchronization (LOS). A non-parametric fit of this LOS can be used to rescale the time axis of the two data series (whereby one of them is compressed or stretched) so that they are synchronized. An application of this method to geophysical sediment core data illustrates its suitability for real data: the rock magnetic data of two different sediment cores from the Makarov Basin can be adjusted to each other using this method, so that they become comparable.
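A minimal sketch of a cross recurrence matrix for scalar series, assuming a simple absolute-difference metric and a threshold eps (both choices are ours; CRPs for real data are usually built on embedded vectors):

```python
def cross_recurrence(x, y, eps):
    """CRP matrix: CR[i][j] = 1 when x[i] and y[j] are closer than eps."""
    return [[1 if abs(xi - yj) < eps else 0 for yj in y] for xi in x]

# Two identical ramps: the line of synchronization (LOS) is the main diagonal.
x = [t / 10 for t in range(10)]
crp = cross_recurrence(x, x, eps=0.05)
```

If one series is stretched or compressed relative to the other, the LOS bends away from the diagonal, and fitting it yields the time rescaling described above.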


Author(s):  
Winita Sulandari ◽  
Subanar Subanar ◽  
Suhartono Suhartono ◽  
Herni Utami ◽  
Muhammad Hisyam Lee

The study of SSA-based forecasting models is always interesting due to their capability of modeling trend and multiple seasonal time series. The aim of this study is to propose an iterative ordinary least squares (OLS) method for estimating the oscillatory components with time-varying amplitude that are usually found in an SSA decomposition. We compare the results with those obtained by nonlinear least squares based on the Levenberg–Marquardt (NLM) method. A simulation study based on time series data with a linearly amplitude-modulated sinusoid component is conducted to investigate the error of the model parameters estimated by the proposed method. A real data series is also considered as an application example. The results show that, in terms of forecasting accuracy, the SSA-based model whose oscillatory components are obtained by iterative OLS is nearly the same as the one obtained by the NLM method.
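To make the OLS step concrete, the sketch below fits a linearly amplitude-modulated sinusoid y_t = (a + b·t)·sin(ωt) with ω assumed known, so the model is linear in a and b and one least-squares solve suffices. This simplifies the iterative procedure described above, which also has to handle unknown frequency and phase; all names are illustrative.

```python
import math

def fit_am_sinusoid(y, omega):
    """OLS fit of y_t = (a + b*t) * sin(omega*t) with omega known.
    Basis: u_t = sin(omega*t), v_t = t*sin(omega*t); solve 2x2 normal equations."""
    u = [math.sin(omega * t) for t in range(len(y))]
    v = [t * ut for t, ut in enumerate(u)]
    suu = sum(ui * ui for ui in u)
    svv = sum(vi * vi for vi in v)
    suv = sum(ui * vi for ui, vi in zip(u, v))
    suy = sum(ui * yi for ui, yi in zip(u, y))
    svy = sum(vi * yi for vi, yi in zip(v, y))
    det = suu * svv - suv * suv
    a = (svv * suy - suv * svy) / det
    b = (suu * svy - suv * suy) / det
    return a, b

# Recover a = 2, b = 0.05 from a noiseless modulated sinusoid.
omega = 0.3
y = [(2 + 0.05 * t) * math.sin(omega * t) for t in range(200)]
a, b = fit_am_sinusoid(y, omega)
```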


Entropy ◽  
2020 ◽  
Vol 22 (5) ◽  
pp. 494
Author(s):  
David Cuesta-Frau

Despite its widely tested and proven usefulness, there is still room for improvement in the basic permutation entropy (PE) algorithm, as several subsequent studies have demonstrated in recent years. Some of these new methods try to address the well-known PE weaknesses, such as its focus on ordinal but not amplitude information, and the possible detrimental impact of equal values found in subsequences. Other new methods address less specific weaknesses, such as the dependence of PE results on input parameter values, a common problem in many entropy calculation methods. The lack of discriminating power among classes in some cases is also a generic problem when entropy measures are used for data series classification. This last problem is the one specifically addressed in the present study. Toward that purpose, the classification performance of the standard PE method was first assessed by conducting several time series classification tests over a varied and diverse set of data. This performance was then reassessed using a new Shannon entropy normalisation scheme proposed in this paper: divide the relative frequencies in PE by the number of different ordinal patterns actually found in the time series, instead of by the theoretically expected number. According to the classification accuracy obtained, this last approach exhibited a higher class discriminating power. It was capable of finding significant differences in six out of seven experimental datasets, whereas the standard PE method did so in only four, and it also achieved better classification accuracy. It can be concluded that, using the additional information provided by the number of forbidden/found patterns, it is possible to achieve a higher discriminating power than with the classical PE normalisation method. The resulting algorithm is also very similar to that of PE and very easy to implement.
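As a hedged illustration of the idea, the sketch below computes PE over ordinal patterns and, under one natural reading of the scheme, normalises the entropy by the logarithm of the number of patterns actually found rather than by log(m!). The exact normalisation in the paper may differ in detail; names and tie-breaking (stable sort by time index) are our choices.

```python
import math

def permutation_entropy(x, m, found_norm=False):
    """Shannon entropy of ordinal patterns of length m. With found_norm=True,
    normalise by the number of patterns actually found instead of the
    theoretical maximum log(m!)."""
    counts = {}
    n = len(x) - m + 1
    for t in range(n):
        w = tuple(sorted(range(m), key=lambda i: x[t + i]))
        counts[w] = counts.get(w, 0) + 1
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    norm = math.log(len(counts)) if found_norm else math.log(math.factorial(m))
    return h / norm if norm > 0 else 0.0

# A monotone series has a single pattern; an alternating one uses 2 of the 6.
h_mono = permutation_entropy(list(range(50)), 3, found_norm=True)
h_alt_found = permutation_entropy([0, 1] * 25, 3, found_norm=True)
h_alt_std = permutation_entropy([0, 1] * 25, 3)
```

The found-patterns normalisation saturates at 1 for series that use their admitted patterns uniformly, which is where the extra discriminating power comes from.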


Entropy ◽  
2019 ◽  
Vol 21 (12) ◽  
pp. 1167 ◽  
Author(s):  
David Cuesta-Frau

The development of new measures and algorithms to quantify the entropy or related concepts of a data series is a continuous effort that has brought many innovations in recent years. The ultimate goal is usually to find new methods with higher discriminating power: more efficient, more robust to noise and artifacts, less dependent on parameters or configurations, or with any other possibly desirable feature. Among all these methods, Permutation Entropy (PE) is a complexity estimator for a time series that stands out due to its many strengths and very few weaknesses. One of these weaknesses is PE's disregard of time series amplitude information. Some PE algorithm modifications have been proposed in order to introduce such information into the calculations. We propose in this paper a new method, Slope Entropy (SlopEn), that also addresses this flaw but in a different way, keeping a symbolic representation of subsequences through a novel encoding method based on the slope generated by two consecutive data samples. By means of a thorough and extensive set of comparative experiments with PE and Sample Entropy (SampEn), we demonstrate that SlopEn is a very promising method with clearly better time series classification performance than those previous methods.
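A simplified sketch of the slope-based encoding follows; the two thresholds and the exact symbol set are illustrative parameters patterned on the description above, not a verified reimplementation of SlopEn.

```python
import math

def slope_entropy(x, m=3, gamma=1.0, delta=0.001):
    """Simplified Slope Entropy sketch: each consecutive difference is mapped
    to one of five symbols by two thresholds (|d| <= delta -> 0,
    delta < |d| < gamma -> +/-1, |d| >= gamma -> +/-2), then the Shannon
    entropy of symbol words of length m-1 is computed."""
    def symbol(d):
        if abs(d) <= delta:
            return 0
        s = 1 if abs(d) < gamma else 2
        return s if d > 0 else -s
    syms = [symbol(x[i + 1] - x[i]) for i in range(len(x) - 1)]
    counts = {}
    n = len(syms) - (m - 1) + 1
    for t in range(n):
        w = tuple(syms[t:t + m - 1])
        counts[w] = counts.get(w, 0) + 1
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# A flat series yields a single symbol word; varied slopes raise the entropy.
h_flat = slope_entropy([1.0] * 20)
h_mixed = slope_entropy([0, 2, 0, 0.5, 0] * 5)
```

Unlike plain PE, two subsequences with the same ordering but very different slopes receive different symbols, which is how amplitude information enters the estimate.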


Entropy ◽  
2019 ◽  
Vol 21 (3) ◽  
pp. 306 ◽  
Author(s):  
Jose Cánovas ◽  
Antonio Guillamón ◽  
María Ruiz-Abellón

Two distances based on permutations are considered to measure the similarity of two time series according to their strength of dependency. The distance measures are used together with different linkages to obtain hierarchical clustering methods for time series by dependency. We apply these distances to both simulated and real data series. For simulated time series the distances show good clustering results, in the case of both linear and non-linear dependencies. The effects of the embedding dimension and the linkage method are also analyzed. Finally, several real data series are properly clustered using the proposed method.
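The paper defines its two distances on permutations of the series; as a self-contained illustration of measuring dependency through ordinal patterns, the sketch below scores how often two series display the same ordinal pattern at the same time. This is our own simplified stand-in, not one of the paper's distances.

```python
import math

def codependence_distance(x, y, m=3):
    """Illustrative dependency-based distance: the fraction of aligned windows
    in which x and y do NOT show the same ordinal pattern (0 for identical
    ordinal structure, close to 1 - 1/m! for independent series)."""
    n = len(x) - m + 1
    same = sum(
        tuple(sorted(range(m), key=lambda i: x[t + i]))
        == tuple(sorted(range(m), key=lambda i: y[t + i]))
        for t in range(n)
    )
    return 1 - same / n

# A series is at distance 0 from itself; negating it reverses every pattern.
x = [math.sin(t) for t in range(30)]
d_same = codependence_distance(x, x)
d_opposite = codependence_distance(x, [-v for v in x])
```

A matrix of such pairwise distances can then be fed to any standard hierarchical linkage (single, complete, average) to cluster series by dependency.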


2011 ◽  
Vol 10 (01) ◽  
pp. 13-30 ◽  
Author(s):  
J. S. CÁNOVAS ◽  
A. GUILLAMÓN ◽  
M. C. RUIZ

The number of permutations appearing in a data series is used for detecting changes in the structure of such series. We show the influence of the permutation length (embedding dimension) on obtaining good results. We use permutations to analyze real data of medical and biological origin. Some problems that appear in applying these techniques are pointed out.
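Counting the distinct permutations in a series — and seeing how the count separates noise from deterministic dynamics, which exhibit forbidden patterns — can be sketched as follows. The logistic-map example is our own illustration: the fully chaotic logistic map is known to forbid the strictly decreasing pattern of length 3, while noise realizes all six patterns.

```python
import random

def distinct_patterns(x, m):
    """Number of different ordinal patterns of length m appearing in x."""
    seen = set()
    for t in range(len(x) - m + 1):
        seen.add(tuple(sorted(range(m), key=lambda i: x[t + i])))
    return len(seen)

# White noise admits all 6 patterns of length 3.
rng = random.Random(1)
noise = [rng.random() for _ in range(1000)]

# The logistic map x -> 4x(1-x) has a forbidden length-3 pattern.
x = 0.2
logistic = []
for _ in range(1000):
    logistic.append(x)
    x = 4 * x * (1 - x)

n_noise = distinct_patterns(noise, 3)
n_logistic = distinct_patterns(logistic, 3)
```

Tracking this count along a sliding window is one simple way to flag structural changes in a series.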


Author(s):  
Michel H. Montoril ◽  
Pedro A. Morettin ◽  
Chang Chiann

The area of nonlinear time series models has experienced great development since the 1980s. Although there is a wide range of parametric nonlinear time series models, in general we do not know whether the postulated model is the most appropriate one for a specific data set. This situation highlights the importance of nonparametric models. An interesting nonparametric model for fitting nonlinear time series is the well-known functional coefficient regression model. Nonparametric estimators based on, e.g., local linear regression and splines have been developed in the literature. In this work, we study the estimation of such a model using wavelets. The proposal takes into account both classical and warped wavelets. We present the rates of convergence of the proposed estimators and carry out simulation studies to evaluate automatic procedures (among AIC, AICc and BIC) for selecting the coarsest and finest levels to be used during the estimation process. Moreover, we illustrate the methodology with an application to a real data set, where we also calculate multi-step-ahead forecasts and compare the results with other methods known in the literature.


2021 ◽  
Vol 13 (14) ◽  
pp. 2727
Author(s):  
Yueqi Wang ◽  
Zhiqiang Gao ◽  
Jicai Ning

High-quality remotely sensed satellite data series are important for many ecological and environmental applications. Unfortunately, irregular spatiotemporal sampling, frequent image gaps and inevitable observational biases can greatly hinder their application. As one of the most effective gap-filling and noise-reduction approaches, the harmonic analysis of time series (HANTS) method has been widely used to reconstruct geographical variables; however, when applied to multi-year time series over large spatial areas, the optimal harmonic formula generally varies across locations or changes from year to year. The question of how to choose the optimal harmonic formula remains unanswered due to the lack of appropriate criteria. In this study, an adaptive piecewise harmonic analysis method (AP-HA) is proposed to reconstruct multi-year seasonal data series. The method introduces a cross-validation scheme to adaptively determine the optimal harmonic model and employs an iterative piecewise scheme to better track local traits. When applied to the satellite-derived sea surface chlorophyll-a time series over the Bohai and Yellow Seas of China, AP-HA obtains reliable reconstruction results and outperforms the conventional HANTS method, achieving improved accuracy. Due to its generic approach to filling missing observations and tracking detailed traits, the AP-HA method has a wide range of applications to other seasonal geographical variables.
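A minimal single-frequency sketch of the harmonic-fitting idea behind HANTS: fit a mean plus one harmonic by least squares over the observed samples, then reconstruct the gaps. It omits the outlier rejection and adaptive term selection that are the actual contributions above; all names are ours.

```python
import math

def harmonic_fill(y, period):
    """Least-squares fit of mean + one harmonic to the observed samples of y
    (None marks a gap), then reconstruction of the full series."""
    obs = [(t, v) for t, v in enumerate(y) if v is not None]
    def row(t):  # design-matrix row: [1, cos, sin]
        w = 2 * math.pi * t / period
        return [1.0, math.cos(w), math.sin(w)]
    # Normal equations A coef = b over the observed samples only.
    A = [[sum(row(t)[i] * row(t)[j] for t, _ in obs) for j in range(3)]
         for i in range(3)]
    b = [sum(row(t)[i] * v for t, v in obs) for i in range(3)]
    # Gaussian elimination with partial pivoting, then back-substitution.
    for k in range(3):
        p = max(range(k, 3), key=lambda r: abs(A[r][k]))
        A[k], A[p] = A[p], A[k]
        b[k], b[p] = b[p], b[k]
        for r in range(k + 1, 3):
            f = A[r][k] / A[k][k]
            for j in range(k, 3):
                A[r][j] -= f * A[k][j]
            b[r] -= f * b[k]
    coef = [0.0, 0.0, 0.0]
    for k in range(2, -1, -1):
        coef[k] = (b[k] - sum(A[k][j] * coef[j] for j in range(k + 1, 3))) / A[k][k]
    return [sum(c * r for c, r in zip(coef, row(t))) for t in range(len(y))]

# A seasonal signal with gaps is recovered at the missing positions.
truth = [10 + 3 * math.sin(2 * math.pi * t / 12) for t in range(48)]
gappy = [v if t % 7 else None for t, v in enumerate(truth)]
filled = harmonic_fill(gappy, period=12)
err = max(abs(f - v) for f, v in zip(filled, truth))
```

HANTS proper iterates such fits, discarding observations that fall too far below the curve; AP-HA additionally cross-validates the number of harmonic terms per piece.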

