Outlier Detection for Multivariate Time Series Using Dynamic Bayesian Networks

2021 ◽  
Vol 11 (4) ◽  
pp. 1955
Author(s):  
Jorge L. Serras ◽  
Susana Vinga ◽  
Alexandra M. Carvalho

Outliers are observations suspected of not having been generated by the underlying process of the remaining data. Many applications require a way of identifying interesting or unusual patterns in multivariate time series (MTS), which are now ubiquitous; however, most outlier detection methods focus solely on univariate series. We propose a complete and automatic outlier detection system, covering the pre-processing of MTS data, that adopts a dynamic Bayesian network (DBN) modeling algorithm. The latter encodes optimal inter- and intra-time-slice connectivity of transition networks capable of capturing conditional dependencies in MTS datasets. A sliding-window mechanism is employed to gradually score each MTS transition given the DBN model. Two score-analysis strategies are studied to ensure automatic classification of anomalous data. The proposed approach is first validated on simulated data, demonstrating the performance of the system. Further experiments on real data uncover anomalies in distinct scenarios such as electrocardiogram series, mortality-rate data, and handwritten pen digits. The developed system proved beneficial in capturing unusual data arising from temporal context and is suitable for any MTS scenario. A widely accessible web application employing the complete system is publicly available, together with a tutorial.
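The sliding-window scoring step can be illustrated with a minimal sketch. It assumes the DBN has already been learned and yields one log-likelihood per transition (here simulated with Gaussian noise plus an injected anomalous stretch); the sketch shows only the window-averaging and thresholding logic, not the DBN itself, and the two-standard-deviation cutoff is a hypothetical choice, not one of the paper's score-analysis strategies.

```python
import math
import random

def window_scores(log_likelihoods, width):
    """Average model log-likelihood over a sliding window of transitions."""
    return [sum(log_likelihoods[i:i + width]) / width
            for i in range(len(log_likelihoods) - width + 1)]

def flag_outliers(scores, n_std=2.0):
    """Flag windows whose score falls more than n_std below the mean score."""
    mean = sum(scores) / len(scores)
    var = sum((s - mean) ** 2 for s in scores) / len(scores)
    cutoff = mean - n_std * math.sqrt(var)
    return [i for i, s in enumerate(scores) if s < cutoff]

random.seed(0)
# Hypothetical per-transition log-likelihoods under a learned model.
ll = [random.gauss(-1.0, 0.1) for _ in range(200)]
ll[100:105] = [-3.0] * 5            # an anomalous stretch of transitions
flags = flag_outliers(window_scores(ll, 5))
```

Windows overlapping the injected stretch score far below the rest and are the ones flagged.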

2021 ◽  
Vol 15 (4) ◽  
pp. 1-20
Author(s):  
Georg Steinbuss ◽  
Klemens Böhm

Benchmarking unsupervised outlier detection is difficult. Outliers are rare, and existing benchmark data contains outliers with varied and unknown characteristics. Fully synthetic data usually consists of outliers and regular instances with clear characteristics and thus, in principle, allows for a more meaningful evaluation of detection methods. Nonetheless, there have been only a few attempts to include synthetic data in benchmarks for outlier detection. This might be due to the imprecise notion of outliers or to the difficulty of arriving at a good coverage of different domains with synthetic data. In this work, we propose a generic process for the generation of datasets for such benchmarking. The core idea is to reconstruct regular instances from existing real-world benchmark data while generating outliers so that they exhibit insightful characteristics. We describe three instantiations of this generic process that generate outliers with specific characteristics, such as local outliers. To validate the process, we perform a benchmark with state-of-the-art detection methods and carry out experiments to study the quality of the data reconstructed in this way. Next to showcasing the workflow, this confirms the usefulness of the proposed process. In particular, the process yields regular instances close to the ones from real data. Summing up, we propose and validate a new and practical process for the benchmarking of unsupervised outlier detection.
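The core idea can be sketched in a deliberately simplified form: fit a generative model to the regular instances of a real dataset, resample regular instances from it, and place synthetic outliers with a known, controllable characteristic. The per-feature Gaussian below is a crude stand-in for the paper's learned reconstruction model, and the "z standard deviations away" rule is one hypothetical outlier characteristic, not one of the paper's three instantiations.

```python
import random
import statistics

def fit_gaussian(data):
    """Per-feature mean/std of the regular instances (a crude stand-in
    for a learned generative model of the real benchmark data)."""
    cols = list(zip(*data))
    return [(statistics.fmean(c), statistics.stdev(c)) for c in cols]

def sample_regular(params, n, rng):
    """Reconstructed regular instances, drawn from the fitted model."""
    return [[rng.gauss(m, s) for m, s in params] for _ in range(n)]

def sample_outliers(params, n, rng, z=4.0):
    """Synthetic outliers with a clear, known characteristic: each lies
    z standard deviations from the regular mean in every feature."""
    out = []
    for _ in range(n):
        sign = rng.choice([-1.0, 1.0])
        out.append([m + sign * z * s for m, s in params])
    return out

rng = random.Random(1)
# Hypothetical "real" benchmark data: 500 two-dimensional regular instances.
real = [[rng.gauss(5, 1), rng.gauss(-2, 0.5)] for _ in range(500)]
params = fit_gaussian(real)
regulars = sample_regular(params, 500, rng)
outliers = sample_outliers(params, 10, rng)
```

Because the outliers' characteristics are known by construction, a detector's hits and misses can be interpreted, which is the point of synthetic benchmarks.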


Author(s):  
Maysoon M. Aziz et al.

In this paper, we use the differential equations of the SIR model as a non-linear system, applying the Runge-Kutta numerical method to calculate simulated values for known epidemiological time series, including the epidemic disease COVID-19. The aim is to obtain hypothetical results, compare them with the daily real statistics of the disease for countries of the world, and understand the behaviour of this disease through mathematical applications, in terms of stability as well as chaos, using several applied methods. The simulated data were obtained using MATLAB programs; the comparison between real and simulated data showed good compatibility and a high degree of closeness. We took the data for Italy as an application. The results show that this disease is unstable, dissipative, and chaotic, with Kcorr equal to 0.9621; the power spectrum was also used as an indicator to clarify the chaotic nature of the disease. These results indicate that it is a spreading, outbreak-prone, chaotic epidemic disease.
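The SIR integration described above can be sketched with the classical fourth-order Runge-Kutta scheme. This is a generic illustration in Python rather than the authors' MATLAB program, and the parameter values below are illustrative only, not fitted to the Italian data.

```python
def sir_rhs(state, beta, gamma):
    """Right-hand side of the SIR system, with S, I, R as population
    fractions: dS/dt = -beta*S*I, dI/dt = beta*S*I - gamma*I, dR/dt = gamma*I."""
    s, i, r = state
    return (-beta * s * i, beta * s * i - gamma * i, gamma * i)

def rk4_step(state, h, beta, gamma):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = sir_rhs(state, beta, gamma)
    k2 = sir_rhs(tuple(x + 0.5 * h * k for x, k in zip(state, k1)), beta, gamma)
    k3 = sir_rhs(tuple(x + 0.5 * h * k for x, k in zip(state, k2)), beta, gamma)
    k4 = sir_rhs(tuple(x + h * k for x, k in zip(state, k3)), beta, gamma)
    return tuple(x + h / 6 * (a + 2 * b + 2 * c + d)
                 for x, a, b, c, d in zip(state, k1, k2, k3, k4))

# Illustrative parameters (basic reproduction number beta/gamma = 3).
beta, gamma, h = 0.30, 0.10, 0.1
state = (0.999, 0.001, 0.0)          # initial S, I, R fractions
series = [state]
for _ in range(2000):                # integrate over 200 time units
    state = rk4_step(state, h, beta, gamma)
    series.append(state)
```

Since dS + dI + dR = 0, the total population fraction is conserved along the trajectory, which is a convenient sanity check on the integrator.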


2021 ◽  
Author(s):  
Mikhail Kanevski

Nowadays a wide range of methods and tools to study and forecast time series is available. An important problem in forecasting concerns the embedding of time series, i.e. the construction of a high-dimensional space in which the forecasting problem is considered as a regression task. There are several basic linear and nonlinear approaches to constructing such a space by defining an optimal delay vector using different theoretical concepts. Another way is to consider this space as an input feature space (IFS) and to apply machine-learning feature selection (FS) algorithms to optimize the IFS according to the problem under study (analysis, modelling, or forecasting). Such an approach is an empirical one: it is based on data and depends on the FS algorithms applied. In machine learning, features are generally classified as relevant, redundant, or irrelevant. This gives a rich possibility to perform advanced multivariate time series exploration and to develop interpretable predictive models.

Therefore, in the present research, different FS algorithms are used to analyze fundamental properties of time series from an empirical point of view. Linear and nonlinear simulated time series are studied in detail to understand the advantages and drawbacks of the proposed approach. Real-data case studies deal with air pollution and wind speed time series. Preliminary results are quite promising, and more research is in progress.


2021 ◽  
Vol 10 (2) ◽  
pp. 265-285
Author(s):  
Wedad Alahamade ◽  
Iain Lake ◽  
Claire E. Reeves ◽  
Beatriz De La Iglesia

Abstract. Air pollution is one of the world's leading risk factors for death, with 6.5 million deaths per year worldwide attributed to air-pollution-related diseases. Understanding the behaviour of certain pollutants through air quality assessment can produce improvements in air quality management that will translate to health and economic benefits. However, problems with missing data and uncertainty hinder that assessment. We are motivated by the need to enhance the air pollution data available. We focus on the problem of missing air pollutant concentration data either because a limited set of pollutants is measured at a monitoring site or because an instrument is not operating, so a particular pollutant is not measured for a period of time. In our previous work, we have proposed models which can impute a whole missing time series to enhance air quality monitoring. Some of these models are based on a multivariate time series (MVTS) clustering method. Here, we apply our method to real data and show how different graphical and statistical model evaluation functions enable us to select the imputation model that produces the most plausible imputations. We then compare the Daily Air Quality Index (DAQI) values obtained after imputation with observed values incorporating missing data. Our results show that using an ensemble model that aggregates the spatial similarity obtained by the geographical correlation between monitoring stations and the fused temporal similarity between pollutant concentrations produces very good imputation results. Furthermore, the analysis enhances understanding of the different pollutant behaviours and of the characteristics of different stations according to their environmental type.
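The ensemble imputation idea can be reduced to a minimal sketch: a wholly missing pollutant series at one station is imputed as a similarity-weighted average of the same pollutant at other stations. The weights here are hypothetical fixed numbers; in the abstract they would come from aggregating the geographical correlation between stations with the fused temporal similarity between pollutant concentrations.

```python
def impute_series(neighbor_series, weights):
    """Impute a missing time series as the weighted average of the same
    pollutant measured at similar stations; weights are assumed to be a
    precomputed blend of spatial and temporal similarity."""
    total = sum(weights)
    length = len(neighbor_series[0])
    return [sum(w * s[t] for w, s in zip(weights, neighbor_series)) / total
            for t in range(length)]

# Two hypothetical neighbouring stations; the more similar one gets
# the larger weight.
near = [10.0, 12.0, 11.0, 13.0]
far = [20.0, 22.0, 21.0, 23.0]
imputed = impute_series([near, far], weights=[0.75, 0.25])
```

The imputed values track the high-weight station while still being pulled toward the other, which is the qualitative behaviour one wants before feeding the series into a DAQI calculation.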


2020 ◽  
Vol 12 (13) ◽  
pp. 2089 ◽  
Author(s):  
Elise Colin Koeniguer ◽  
Jean-Marie Nicolas

This paper discusses change detection in SAR time series. First, several statistical properties of the coefficient of variation highlight its pertinence for change detection. Subsequently, several criteria are proposed. The coefficient of variation itself is suggested to detect any kind of change. Furthermore, several criteria based on ratios of coefficients of variation are proposed to detect long events, such as construction on test sites, or point events, such as vehicles. These detection methods are first evaluated on theoretical statistical simulations to determine the scenarios in which they deliver the best results. The simulations demonstrate the greater sensitivity of the coefficient of variation to speckle mixtures, as in the case of agricultural plots. Conversely, they also demonstrate the greater specificity of the other criteria for the cases they address: very short events or longer-term changes. Subsequently, detection performance is assessed on real data for different types of scenes and sensors (Sentinel-1, UAVSAR). In particular, a quantitative evaluation compares our solutions with baseline methods. The proposed criteria achieve the best performance, with reduced computational complexity. On Sentinel-1 images containing mainly construction test sites, our best criterion reaches a probability of change detection of 90% for a false-alarm rate of 5%. On UAVSAR images containing boats, the criteria proposed for short events achieve a detection probability of 90% of all pixels belonging to the boats, for a false-alarm rate of 2%.
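The basic criterion is simple enough to sketch directly: for each pixel, compute the coefficient of variation (standard deviation over mean) of its amplitude across the time stack and compare it to a threshold. The two pixel stacks and the 0.5 threshold below are hypothetical illustrations, not values from the paper.

```python
import math

def coeff_of_variation(stack):
    """Temporal coefficient of variation of one pixel: std/mean of its
    amplitude over the SAR time series."""
    n = len(stack)
    mean = sum(stack) / n
    var = sum((a - mean) ** 2 for a in stack) / n
    return math.sqrt(var) / mean

# A stable pixel versus a pixel where a short event (e.g. a vehicle)
# briefly raises the backscatter amplitude.
stable = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95]
event = [1.0, 1.1, 0.9, 5.0, 1.05, 0.95]
cv_stable = coeff_of_variation(stable)
cv_event = coeff_of_variation(event)
changed = cv_event > 0.5        # hypothetical detection threshold
```

A single bright date is enough to inflate the coefficient of variation, which is why the paper complements it with ratio-based criteria that are more specific to short versus long events.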


2001 ◽  
Vol 11 (07) ◽  
pp. 1881-1896 ◽  
Author(s):  
D. KUGIUMTZIS

In the analysis of real-world data, the surrogate data test is often performed in order to investigate nonlinearity in the data. The null hypothesis of the test is that the original time series is generated by a linear stochastic process, possibly undergoing a nonlinear static transform. We argue against reported rejections of the null hypothesis, and claims of evidence of nonlinearity, based on a single nonlinear statistic. In particular, two schemes for the generation of surrogate data are examined, the amplitude adjusted Fourier transform (AAFT) and the iterated AAFT (IAAFT), and many nonlinear discriminating statistics are used for testing, i.e. the fit with the Volterra series of polynomials, the fit with local average mappings, the mutual information, the correlation dimension, the false nearest neighbors, the largest Lyapunov exponent, and simple nonlinear averages (the three-point autocorrelation and the time reversal asymmetry). The results on simulated data and real data (EEG and exchange rates) suggest that the outcome of the test depends on the discriminating statistic and its parameters, the algorithm generating the surrogate data, and the observational data of the examined process.
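The logic of the test can be sketched with one of the simple nonlinear averages mentioned above, the time reversal asymmetry statistic. Note the deliberate simplification: the surrogates below are random shuffles, which test the stronger i.i.d. null hypothesis, not the AAFT/IAAFT null (those preserve the amplitude distribution and, approximately, the power spectrum); the sketch only illustrates how a statistic is ranked against its surrogate distribution.

```python
import random

def time_reversal_asymmetry(x, tau=1):
    """E[(x[t+tau]-x[t])^3] / E[(x[t+tau]-x[t])^2]; close to zero for
    time-reversible series (e.g. linear Gaussian processes)."""
    diffs = [x[t + tau] - x[t] for t in range(len(x) - tau)]
    num = sum(d ** 3 for d in diffs) / len(diffs)
    den = sum(d ** 2 for d in diffs) / len(diffs)
    return num / den

def surrogate_test(x, n_surrogates=99, rng=None):
    """Estimated p-value: rank of |statistic| among shuffle surrogates.
    Shuffling tests an i.i.d. null, a simplification of AAFT/IAAFT."""
    rng = rng or random.Random(0)
    t0 = abs(time_reversal_asymmetry(x))
    count = 0
    for _ in range(n_surrogates):
        s = x[:]
        rng.shuffle(s)
        if abs(time_reversal_asymmetry(s)) >= t0:
            count += 1
    return (count + 1) / (n_surrogates + 1)

rng = random.Random(3)
# A time-irreversible toy series: slow rises, abrupt drops (noisy sawtooth).
x = [(t % 20) / 20 + rng.gauss(0, 0.01) for t in range(400)]
p = surrogate_test(x)
```

The sawtooth's asymmetric rises and drops give a strongly negative statistic that no shuffled surrogate matches, so the null is rejected; on a reversible series the same procedure would typically return a large p-value.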

