Outlier Detection for Multivariate Time Series Using Dynamic Bayesian Networks

2021 ◽  
Vol 11 (4) ◽  
pp. 1955
Author(s):  
Jorge L. Serras ◽  
Susana Vinga ◽  
Alexandra M. Carvalho

Outliers are observations suspected of not having been generated by the underlying process of the remaining data. Many applications require a way of identifying interesting or unusual patterns in multivariate time series (MTS), which are now ubiquitous; however, most outlier detection methods focus solely on univariate series. We propose a complete and automatic outlier detection system, covering the pre-processing of MTS data, that adopts a dynamic Bayesian network (DBN) modeling algorithm. The latter encodes optimal inter- and intra-time-slice connectivity of transition networks capable of capturing conditional dependencies in MTS datasets. A sliding-window mechanism is employed to gradually score each MTS transition given the DBN model. Two score-analysis strategies are studied to ensure automatic classification of anomalous data. The proposed approach is first validated on simulated data, demonstrating the performance of the system. Further experiments on real data uncover anomalies in distinct scenarios such as electrocardiogram series, mortality-rate data, and handwritten pen digits. The developed system proved beneficial in capturing unusual data arising from temporal context and is suitable for any MTS scenario. A widely accessible web application employing the complete system is publicly available, together with a tutorial.
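The sliding-window scoring step can be illustrated with a minimal sketch. It assumes the DBN has already been learned and yields one log-likelihood per transition (here simulated with Gaussian noise plus an injected anomalous stretch); the sketch shows only the window-averaging and thresholding logic, not the DBN itself, and the two-standard-deviation cutoff is a hypothetical choice, not one of the paper's score-analysis strategies.

```python
import math
import random

def window_scores(log_likelihoods, width):
    """Average model log-likelihood over a sliding window of transitions."""
    return [sum(log_likelihoods[i:i + width]) / width
            for i in range(len(log_likelihoods) - width + 1)]

def flag_outliers(scores, n_std=2.0):
    """Flag windows whose score falls more than n_std below the mean score."""
    mean = sum(scores) / len(scores)
    var = sum((s - mean) ** 2 for s in scores) / len(scores)
    cutoff = mean - n_std * math.sqrt(var)
    return [i for i, s in enumerate(scores) if s < cutoff]

random.seed(0)
# Hypothetical per-transition log-likelihoods under a learned model.
ll = [random.gauss(-1.0, 0.1) for _ in range(200)]
ll[100:105] = [-3.0] * 5            # an anomalous stretch of transitions
flags = flag_outliers(window_scores(ll, 5))
```

Windows overlapping the injected stretch score far below the rest and are the ones flagged.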

2021 ◽  
Vol 15 (4) ◽  
pp. 1-20
Author(s):  
Georg Steinbuss ◽  
Klemens Böhm

Benchmarking unsupervised outlier detection is difficult. Outliers are rare, and existing benchmark data contains outliers with varied and unknown characteristics. Fully synthetic data usually consists of outliers and regular instances with clear characteristics and thus, in principle, allows for a more meaningful evaluation of detection methods. Nonetheless, there have been only a few attempts to include synthetic data in benchmarks for outlier detection. This might be due to the imprecise notion of outliers or to the difficulty of arriving at a good coverage of different domains with synthetic data. In this work, we propose a generic process for the generation of datasets for such benchmarking. The core idea is to reconstruct regular instances from existing real-world benchmark data while generating outliers so that they exhibit insightful characteristics. We describe three instantiations of this generic process that generate outliers with specific characteristics, such as local outliers. To validate the process, we perform a benchmark with state-of-the-art detection methods and carry out experiments to study the quality of the data reconstructed in this way. Next to showcasing the workflow, this confirms the usefulness of the proposed process. In particular, the process yields regular instances close to the ones from real data. Summing up, we propose and validate a new and practical process for the benchmarking of unsupervised outlier detection.
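The core idea can be sketched in a deliberately simplified form: fit a generative model to the regular instances of a real dataset, resample regular instances from it, and place synthetic outliers with a known, controllable characteristic. The per-feature Gaussian below is a crude stand-in for the paper's learned reconstruction model, and the "z standard deviations away" rule is one hypothetical outlier characteristic, not one of the paper's three instantiations.

```python
import random
import statistics

def fit_gaussian(data):
    """Per-feature mean/std of the regular instances (a crude stand-in
    for a learned generative model of the real benchmark data)."""
    cols = list(zip(*data))
    return [(statistics.fmean(c), statistics.stdev(c)) for c in cols]

def sample_regular(params, n, rng):
    """Reconstructed regular instances, drawn from the fitted model."""
    return [[rng.gauss(m, s) for m, s in params] for _ in range(n)]

def sample_outliers(params, n, rng, z=4.0):
    """Synthetic outliers with a clear, known characteristic: each lies
    z standard deviations from the regular mean in every feature."""
    out = []
    for _ in range(n):
        sign = rng.choice([-1.0, 1.0])
        out.append([m + sign * z * s for m, s in params])
    return out

rng = random.Random(1)
# Hypothetical "real" benchmark data: 500 two-dimensional regular instances.
real = [[rng.gauss(5, 1), rng.gauss(-2, 0.5)] for _ in range(500)]
params = fit_gaussian(real)
regulars = sample_regular(params, 500, rng)
outliers = sample_outliers(params, 10, rng)
```

Because the outliers' characteristics are known by construction, a detector's hits and misses can be interpreted, which is the point of synthetic benchmarks.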


Author(s):  
Maysoon M. Aziz et al.

In this paper, we use the differential equations of the SIR model as a non-linear system, applying the Runge-Kutta numerical method to calculate simulated values for known epidemiological time series, including the epidemic disease COVID-19. The aim is to obtain hypothetical results, compare them with the daily real statistics of the disease for countries of the world, and understand the behaviour of this disease through mathematical applications, in terms of stability as well as chaos, using several applied methods. The simulated data were obtained using MATLAB programs; the comparison between real and simulated data showed good compatibility and a high degree of closeness. We took the data for Italy as an application. The results show that this disease is unstable, dissipative, and chaotic, with Kcorr equal to 0.9621; the power spectrum was also used as an indicator to clarify the chaotic nature of the disease. These results indicate that it is a spreading, outbreak-prone, chaotic epidemic disease.
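The SIR integration described above can be sketched with the classical fourth-order Runge-Kutta scheme. This is a generic illustration in Python rather than the authors' MATLAB program, and the parameter values below are illustrative only, not fitted to the Italian data.

```python
def sir_rhs(state, beta, gamma):
    """Right-hand side of the SIR system, with S, I, R as population
    fractions: dS/dt = -beta*S*I, dI/dt = beta*S*I - gamma*I, dR/dt = gamma*I."""
    s, i, r = state
    return (-beta * s * i, beta * s * i - gamma * i, gamma * i)

def rk4_step(state, h, beta, gamma):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = sir_rhs(state, beta, gamma)
    k2 = sir_rhs(tuple(x + 0.5 * h * k for x, k in zip(state, k1)), beta, gamma)
    k3 = sir_rhs(tuple(x + 0.5 * h * k for x, k in zip(state, k2)), beta, gamma)
    k4 = sir_rhs(tuple(x + h * k for x, k in zip(state, k3)), beta, gamma)
    return tuple(x + h / 6 * (a + 2 * b + 2 * c + d)
                 for x, a, b, c, d in zip(state, k1, k2, k3, k4))

# Illustrative parameters (basic reproduction number beta/gamma = 3).
beta, gamma, h = 0.30, 0.10, 0.1
state = (0.999, 0.001, 0.0)          # initial S, I, R fractions
series = [state]
for _ in range(2000):                # integrate over 200 time units
    state = rk4_step(state, h, beta, gamma)
    series.append(state)
```

Since dS + dI + dR = 0, the total population fraction is conserved along the trajectory, which is a convenient sanity check on the integrator.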


2021 ◽  
Author(s):  
Mikhail Kanevski

Nowadays a wide range of methods and tools to study and forecast time series is available. An important problem in forecasting concerns the embedding of time series, i.e. the construction of a high-dimensional space in which the forecasting problem is considered as a regression task. There are several basic linear and nonlinear approaches to constructing such a space by defining an optimal delay vector using different theoretical concepts. Another way is to consider this space as an input feature space (IFS) and to apply machine-learning feature selection (FS) algorithms to optimize the IFS according to the problem under study (analysis, modelling, or forecasting). Such an approach is an empirical one: it is based on data and depends on the FS algorithms applied. In machine learning, features are generally classified as relevant, redundant, or irrelevant. This gives a rich possibility to perform advanced multivariate time series exploration and to develop interpretable predictive models.

Therefore, in the present research, different FS algorithms are used to analyze fundamental properties of time series from an empirical point of view. Linear and nonlinear simulated time series are studied in detail to understand the advantages and drawbacks of the proposed approach. Real-data case studies deal with air pollution and wind speed time series. Preliminary results are quite promising, and more research is in progress.


2021 ◽  
Vol 10 (2) ◽  
pp. 265-285
Author(s):  
Wedad Alahamade ◽  
Iain Lake ◽  
Claire E. Reeves ◽  
Beatriz De La Iglesia

Abstract. Air pollution is one of the world's leading risk factors for death, with 6.5 million deaths per year worldwide attributed to air-pollution-related diseases. Understanding the behaviour of certain pollutants through air quality assessment can produce improvements in air quality management that will translate to health and economic benefits. However, problems with missing data and uncertainty hinder that assessment. We are motivated by the need to enhance the air pollution data available. We focus on the problem of missing air pollutant concentration data either because a limited set of pollutants is measured at a monitoring site or because an instrument is not operating, so a particular pollutant is not measured for a period of time. In our previous work, we have proposed models which can impute a whole missing time series to enhance air quality monitoring. Some of these models are based on a multivariate time series (MVTS) clustering method. Here, we apply our method to real data and show how different graphical and statistical model evaluation functions enable us to select the imputation model that produces the most plausible imputations. We then compare the Daily Air Quality Index (DAQI) values obtained after imputation with observed values incorporating missing data. Our results show that using an ensemble model that aggregates the spatial similarity obtained by the geographical correlation between monitoring stations and the fused temporal similarity between pollutant concentrations produces very good imputation results. Furthermore, the analysis enhances understanding of the different pollutant behaviours and of the characteristics of different stations according to their environmental type.
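The ensemble imputation idea can be reduced to a minimal sketch: a wholly missing pollutant series at one station is imputed as a similarity-weighted average of the same pollutant at other stations. The weights here are hypothetical fixed numbers; in the abstract they would come from aggregating the geographical correlation between stations with the fused temporal similarity between pollutant concentrations.

```python
def impute_series(neighbor_series, weights):
    """Impute a missing time series as the weighted average of the same
    pollutant measured at similar stations; weights are assumed to be a
    precomputed blend of spatial and temporal similarity."""
    total = sum(weights)
    length = len(neighbor_series[0])
    return [sum(w * s[t] for w, s in zip(weights, neighbor_series)) / total
            for t in range(length)]

# Two hypothetical neighbouring stations; the more similar one gets
# the larger weight.
near = [10.0, 12.0, 11.0, 13.0]
far = [20.0, 22.0, 21.0, 23.0]
imputed = impute_series([near, far], weights=[0.75, 0.25])
```

The imputed values track the high-weight station while still being pulled toward the other, which is the qualitative behaviour one wants before feeding the series into a DAQI calculation.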


2020 ◽  
Vol 12 (13) ◽  
pp. 2089 ◽  
Author(s):  
Elise Colin Koeniguer ◽  
Jean-Marie Nicolas

This paper discusses change detection in SAR time series. First, several statistical properties of the coefficient of variation highlight its pertinence for change detection. Subsequently, several criteria are proposed. The coefficient of variation itself is suggested to detect any kind of change. Furthermore, several criteria based on ratios of coefficients of variation are proposed to detect long events, such as construction on test sites, or point events, such as vehicles. These detection methods are first evaluated on theoretical statistical simulations to determine the scenarios in which they deliver the best results. The simulations demonstrate the greater sensitivity of the coefficient of variation to speckle mixtures, as in the case of agricultural plots. Conversely, they also demonstrate the greater specificity of the other criteria for the cases they address: very short events or longer-term changes. Subsequently, detection performance is assessed on real data for different types of scenes and sensors (Sentinel-1, UAVSAR). In particular, a quantitative evaluation compares our solutions with baseline methods. The proposed criteria achieve the best performance, with reduced computational complexity. On Sentinel-1 images containing mainly construction test sites, our best criterion reaches a probability of change detection of 90% for a false-alarm rate of 5%. On UAVSAR images containing boats, the criteria proposed for short events achieve a detection probability of 90% of all pixels belonging to the boats, for a false-alarm rate of 2%.
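The basic criterion is simple enough to sketch directly: for each pixel, compute the coefficient of variation (standard deviation over mean) of its amplitude across the time stack and compare it to a threshold. The two pixel stacks and the 0.5 threshold below are hypothetical illustrations, not values from the paper.

```python
import math

def coeff_of_variation(stack):
    """Temporal coefficient of variation of one pixel: std/mean of its
    amplitude over the SAR time series."""
    n = len(stack)
    mean = sum(stack) / n
    var = sum((a - mean) ** 2 for a in stack) / n
    return math.sqrt(var) / mean

# A stable pixel versus a pixel where a short event (e.g. a vehicle)
# briefly raises the backscatter amplitude.
stable = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95]
event = [1.0, 1.1, 0.9, 5.0, 1.05, 0.95]
cv_stable = coeff_of_variation(stable)
cv_event = coeff_of_variation(event)
changed = cv_event > 0.5        # hypothetical detection threshold
```

A single bright date is enough to inflate the coefficient of variation, which is why the paper complements it with ratio-based criteria that are more specific to short versus long events.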


2001 ◽  
Vol 11 (07) ◽  
pp. 1881-1896 ◽  
Author(s):  
D. KUGIUMTZIS

In the analysis of real-world data, the surrogate data test is often performed in order to investigate nonlinearity in the data. The null hypothesis of the test is that the original time series is generated by a linear stochastic process, possibly undergoing a nonlinear static transform. We argue against reported rejections of the null hypothesis, and claims of evidence of nonlinearity, based on a single nonlinear statistic. In particular, two schemes for the generation of surrogate data are examined, the amplitude adjusted Fourier transform (AAFT) and the iterated AAFT (IAAFT), and many nonlinear discriminating statistics are used for testing, i.e. the fit with the Volterra series of polynomials, the fit with local average mappings, the mutual information, the correlation dimension, the false nearest neighbors, the largest Lyapunov exponent, and simple nonlinear averages (the three-point autocorrelation and the time reversal asymmetry). The results on simulated data and real data (EEG and exchange rates) suggest that the outcome of the test depends on the discriminating statistic and its parameters, the algorithm generating the surrogate data, and the observational data of the examined process.
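The logic of the test can be sketched with one of the simple nonlinear averages mentioned above, the time reversal asymmetry statistic. Note the deliberate simplification: the surrogates below are random shuffles, which test the stronger i.i.d. null hypothesis, not the AAFT/IAAFT null (those preserve the amplitude distribution and, approximately, the power spectrum); the sketch only illustrates how a statistic is ranked against its surrogate distribution.

```python
import random

def time_reversal_asymmetry(x, tau=1):
    """E[(x[t+tau]-x[t])^3] / E[(x[t+tau]-x[t])^2]; close to zero for
    time-reversible series (e.g. linear Gaussian processes)."""
    diffs = [x[t + tau] - x[t] for t in range(len(x) - tau)]
    num = sum(d ** 3 for d in diffs) / len(diffs)
    den = sum(d ** 2 for d in diffs) / len(diffs)
    return num / den

def surrogate_test(x, n_surrogates=99, rng=None):
    """Estimated p-value: rank of |statistic| among shuffle surrogates.
    Shuffling tests an i.i.d. null, a simplification of AAFT/IAAFT."""
    rng = rng or random.Random(0)
    t0 = abs(time_reversal_asymmetry(x))
    count = 0
    for _ in range(n_surrogates):
        s = x[:]
        rng.shuffle(s)
        if abs(time_reversal_asymmetry(s)) >= t0:
            count += 1
    return (count + 1) / (n_surrogates + 1)

rng = random.Random(3)
# A time-irreversible toy series: slow rises, abrupt drops (noisy sawtooth).
x = [(t % 20) / 20 + rng.gauss(0, 0.01) for t in range(400)]
p = surrogate_test(x)
```

The sawtooth's asymmetric rises and drops give a strongly negative statistic that no shuffled surrogate matches, so the null is rejected; on a reversible series the same procedure would typically return a large p-value.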

