Using Bootstrap to Increase Data in Predictive Analytics with Extreme Value Distribution

Author(s):  
Dang Kien Cuong ◽  
Duong Ton Dam ◽  
Duong Ton Thai Duong

The bootstrap, a major tool for studying and evaluating the values of parameters of probability distributions, is the statistical method this article uses. We begin with an overview of the theory of infinite distribution functions. The tool for dealing with the problems raised in the paper is the mathematical machinery of stochastic analysis, namely the theory of random processes and multivariate statistics. Observations that are realisations of a stationary process are not independent, yet a time series is a relatively simple example of dependent data. Through a simulation study we found that the pseudo-data generated by the bootstrap method always show weaker dependence among the observations than the time series they were sampled from; hence we conclude that even by resampling blocks instead of single observations we lose some of the structure of the original sample. A potential difficulty with likelihood methods for the generalized extreme value (GEV) distribution concerns the regularity conditions required for the usual asymptotic properties of the maximum likelihood estimator to be valid. To estimate a GEV parameter we can use classical methods of mathematical statistics, such as the maximum likelihood method or the least squares method, but they all require a certain number of samples. The bootstrap method does not: here we use the limit theorems of probability theory and multivariate statistics to solve the problem even when only one sample of data is available. That is the practical significance our paper wants to convey. In predictive analysis problems where the actual data are incomplete or not long enough, we can use the bootstrap to add data.
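As a rough sketch of this idea (not the authors' exact procedure), the snippet below generates pseudo-series from a single observed series with a moving-block bootstrap and fits a GEV to the block maxima of each replicate; the block length, replicate count, and synthetic Gumbel input are illustrative assumptions.

```python
import numpy as np
from scipy.stats import genextreme

def moving_block_bootstrap(x, block_len, rng):
    """Resample a series by concatenating randomly chosen overlapping blocks."""
    n = len(x)
    n_blocks = int(np.ceil(n / block_len))
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    return np.concatenate([x[s:s + block_len] for s in starts])[:n]

rng = np.random.default_rng(0)
x = rng.gumbel(size=500)                      # stand-in for the one observed series

shape_estimates = []
for _ in range(200):                          # bootstrap replicates
    xb = moving_block_bootstrap(x, block_len=25, rng=rng)
    maxima = xb.reshape(-1, 25).max(axis=1)   # block maxima of the pseudo-series
    c, loc, scale = genextreme.fit(maxima)    # ML fit of the GEV to the maxima
    shape_estimates.append(c)

print("GEV shape:", np.mean(shape_estimates), "+/-", np.std(shape_estimates))
```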

2011 ◽  
Vol 52-54 ◽  
pp. 546-549
Author(s):  
Shi Bo Xin

Using the fact that the sample mean of observations drawn from a normal distribution is itself normally distributed, we give the equations for estimating the parameters of a normal distribution by the bootstrap method. We then carry out a simulation analysis and compare the parameter estimates obtained with the traditional maximum likelihood method and with the bootstrap method.
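A minimal sketch of the comparison (sample size, true parameters, and replicate count are illustrative assumptions, not the paper's equations):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=100)   # hypothetical normal sample

# Closed-form maximum likelihood estimates for a normal sample
mu_ml, sigma_ml = x.mean(), x.std(ddof=0)

# Bootstrap: resample with replacement, average the statistics over resamples
B = 2000
mu_b = np.empty(B)
sigma_b = np.empty(B)
for b in range(B):
    xb = rng.choice(x, size=len(x), replace=True)
    mu_b[b], sigma_b[b] = xb.mean(), xb.std(ddof=0)

print(f"ML:        mu={mu_ml:.3f}, sigma={sigma_ml:.3f}")
print(f"Bootstrap: mu={mu_b.mean():.3f} (se {mu_b.std():.3f}), "
      f"sigma={sigma_b.mean():.3f} (se {sigma_b.std():.3f})")
```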


2021 ◽  
Vol 7 ◽  
pp. e726
Author(s):  
Tianming Yu ◽  
Qunfeng Gan ◽  
Guoliang Feng

Background: A real time series is affected by various combinations of influences and consequently exhibits a variety of variation modalities. It is hard to reflect the variation characteristics of a time series accurately when simulating it with only a single model. Most existing methods focus on the numerical prediction of time series, and the forecast uncertainty of a time series is addressed by interval prediction. However, little research has focused on making the model interpretable and easily comprehended by humans.

Methods: To overcome this limitation, a new prediction modelling methodology based on fuzzy cognitive maps is proposed. The bootstrap method is first adopted to select multiple sub-sequences, so that the variation modalities are contained in these sub-sequences. Fuzzy cognitive maps are then constructed from these sub-sequences, and these fuzzy cognitive map models are merged by means of granular computing. The established model not only performs well in numerical and interval prediction but also has better interpretability.

Results: Experimental studies involving both synthetic and real-life datasets demonstrate the usefulness and satisfactory efficiency of the proposed approach.
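A toy sketch of two of the ingredients, bootstrap sub-sequence selection and a single fuzzy cognitive map update (the sigmoid activation, window length, and random weight matrix are illustrative assumptions, not the paper's trained model):

```python
import numpy as np

rng = np.random.default_rng(2)
series = np.sin(np.linspace(0, 20, 400)) + 0.1 * rng.normal(size=400)

def bootstrap_subsequences(x, n_subs, window, rng):
    """Bootstrap selection of sub-sequences: random starts, fixed window."""
    starts = rng.integers(0, len(x) - window + 1, size=n_subs)
    return np.stack([x[s:s + window] for s in starts])

subs = bootstrap_subsequences(series, n_subs=10, window=50, rng=rng)

def fcm_step(state, weights):
    """One fuzzy cognitive map update: A(t+1) = sigmoid(A(t) @ W)."""
    return 1.0 / (1.0 + np.exp(-(state @ weights)))

W = rng.uniform(-1, 1, size=(5, 5))   # illustrative concept-weight matrix
state = rng.uniform(0, 1, size=5)     # initial activations of five concepts
print(fcm_step(state, W))
```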


2021 ◽  
Author(s):  
Helmut H. Strey ◽  
Rajat Kumar ◽  
Lilianne Mujica-Parodi

In this article, we develop a maximum likelihood (ML) approach to estimate parameters from correlated time traces that originate from coupled Ornstein-Uhlenbeck processes. The most common technique for characterizing the correlation between time series is to calculate the Pearson correlation coefficient. Here we show that for time series with memory (a characteristic relaxation time), our method not only gives more reliable results but also yields the coupling coefficients and their uncertainties given the data. We investigate how these uncertainties depend on the number of samples, the relaxation times, and the sampling time. To validate our analytic results, we performed simulations over a wide range of correlation coefficients, using both our maximum likelihood solutions and Markov chain Monte Carlo (MCMC) simulations. We found that ML and MCMC give the same parameter estimates. We also found that, when analyzing the same data, the ML and MCMC uncertainties are strongly correlated, while ML underestimates the uncertainties by a factor of 1.5 to 3 over a large range of parameters. For large datasets, we can therefore run the less computationally expensive maximum likelihood method over the whole dataset and then use MCMC on a few samples to determine the factor by which the ML method underestimates the uncertainties. To illustrate the application of our method, we apply it to time series of brain activation from fMRI measurements of the human default mode network. We show that our method significantly improves the interpretation of multi-subject measurements of correlations between brain regions by providing parameter confidence intervals for individual measurements, which makes it possible to distinguish variance arising from differences between subjects from variance due to measurement error.
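To make the setting concrete (the coupling convention, parameter values, and Euler-Maruyama discretization below are illustrative assumptions, not the authors' estimator), two coupled Ornstein-Uhlenbeck processes can be simulated and their naive Pearson correlation computed as follows:

```python
import numpy as np

rng = np.random.default_rng(3)

# Euler-Maruyama simulation of two coupled Ornstein-Uhlenbeck processes:
# dx = (-x + k*y)/tau dt + sigma dW, and symmetrically for y.
tau, k, sigma = 1.0, 0.5, 1.0     # relaxation time, coupling, noise amplitude
dt, n = 0.01, 20000
x = np.zeros(n)
y = np.zeros(n)
sq = sigma * np.sqrt(dt)
for i in range(1, n):
    x[i] = x[i-1] + dt * (-x[i-1] + k * y[i-1]) / tau + sq * rng.normal()
    y[i] = y[i-1] + dt * (-y[i-1] + k * x[i-1]) / tau + sq * rng.normal()

# The naive characterization the paper improves on: Pearson correlation
r = np.corrcoef(x, y)[0, 1]
print(f"Pearson r = {r:.3f}")
```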


2013 ◽  
Vol 29 (5) ◽  
pp. 920-940 ◽  
Author(s):  
Ngai Hang Chan ◽  
Deyuan Li ◽  
Liang Peng ◽  
Rongmao Zhang

Relevant sample quantities such as the sample autocorrelation function and extremes contain useful information about autoregressive time series with heteroskedastic errors. As these quantities usually depend on the tail index of the underlying heteroskedastic time series, estimating the tail index becomes an important task. Since the tail index of such a model is determined by a moment equation, one can estimate the underlying tail index by solving the sample moment equation with the unknown parameters replaced by their quasi-maximum likelihood estimates. However, constructing a confidence interval for the tail index requires estimating the complicated asymptotic variance of the tail index estimator. In this paper the asymptotic normality of the tail index estimator is first derived, and a profile empirical likelihood method for constructing a confidence interval for the tail index is then proposed. A simulation study shows that the proposed empirical likelihood method works better than the bootstrap method in terms of coverage accuracy, especially when the process is nearly nonstationary.
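The paper's estimator solves a model-specific moment equation with QMLE plug-ins, which is beyond a short sketch. Purely for orientation, the classic Hill estimator, a different and much simpler tail-index estimator, looks like this (the Pareto sample and the choice of k are illustrative):

```python
import numpy as np

def hill_estimator(x, k):
    """Hill estimator of the tail index from the k largest observations."""
    xs = np.sort(np.abs(x))[::-1]          # descending order statistics
    logs = np.log(xs[:k]) - np.log(xs[k])  # log-spacings above the threshold
    return 1.0 / logs.mean()               # estimated tail index alpha

rng = np.random.default_rng(4)
x = rng.pareto(a=3.0, size=5000)           # heavy-tailed sample, tail index 3
print("Hill estimate:", hill_estimator(x, k=200))
```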


Author(s):  
Qiguo Hu ◽  
Zhan Gao

In order to enhance the reliability of a system subject to dependent competing failures, a reliability evaluation method is proposed for dependent competing failure and multi-parameter degradation failure. The multi-parameter degradation failure process is described with the Wiener stochastic process and the inverse Gaussian stochastic process, and a copula function is used to model the system's multi-degradation failure process. The two-stage maximum likelihood method is used to estimate the degradation failure parameters, and the conditional probability of dependent competing failure as a function of the degree of degradation is established. The Bayes-bootstrap method is then utilized to correct the dependent competing failure parameters obtained by maximum likelihood and to establish the system's dependent competing failure model. Degradation data from an aero-engine are used as an example to analyze reliability under the competition between dependent failure and multi-parameter degradation failure. The analysis results effectively demonstrate the reliability of the aero-engine's performance and verify the validity of the model, which thus has good engineering application value.
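As background for the degradation model (the drift, diffusion, and step values below are illustrative assumptions, not the paper's aero-engine data), a Wiener degradation process X(t) = mu*t + sigma*B(t) can be simulated and its parameters recovered in closed form by maximum likelihood from the increments:

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sigma, dt, n = 0.2, 0.5, 0.1, 1000   # illustrative drift, diffusion, step

# Simulate one degradation path: increments are N(mu*dt, sigma^2*dt)
increments = mu * dt + sigma * np.sqrt(dt) * rng.normal(size=n)
path = np.cumsum(increments)

# Closed-form ML estimates from the observed increments
mu_hat = increments.mean() / dt
sigma_hat = np.sqrt(((increments - mu_hat * dt) ** 2).mean() / dt)
print(f"mu_hat={mu_hat:.3f}, sigma_hat={sigma_hat:.3f}")
```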


2013 ◽  
Vol 1 (5) ◽  
pp. 6001-6024 ◽  
Author(s):  
K. Kochanek ◽  
W. G. Strupczewski ◽  
E. Bogdanowicz ◽  
W. Feluch ◽  
I. Markiewicz

Abstract. The alleged changes in rivers' flow regimes have resulted in a surge of methods for non-stationary flood frequency analysis (NFFA). The maximum likelihood method is said to produce large systematic errors in moments and quantiles, resulting mainly from a bad assumption of the model (model error), unless this model is the normal distribution. Since estimators by the method of linear moments (L-moments) yield much lower model errors than those by maximum likelihood, a new two-stage NFFA methodology based on the concept of L-moments was developed to improve the accuracy of parameters and quantiles in the non-stationary case. Besides taking advantage of the positive characteristics of L-moments, the new technique also keeps the calculations "distribution independent" as long as possible. The two stages consist of (1) least-squares estimation of trends in the mean value and/or the standard deviation and "de-trendisation" of the time series, and (2) estimation of parameters and quantiles from the stationary sample by the L-moments method and "re-trendisation" of the quantiles (a sketch follows below). As a result, time-dependent quantiles for a given time and return period can be calculated. Comparative Monte Carlo simulations confirmed the superiority of the two-stage NFFA methodology over the classical maximum likelihood one. Further analysis of trends in GEV-parent-distributed generic time series by means of both NFFA methods revealed big differences between the classical and two-stage estimators of trends obtained for the same data with the same model (GEV or Gumbel). Additionally, it turned out that the quantiles estimated by traditional stationary flood frequency analysis equal the non-stationary ones only at the exact middle of the time series. This shows that the use of traditional stationary methods under a variable regime is too great a simplification and leads to erroneous results. Therefore, when the phenomenon is non-stationary, so should be the methods used for its interpretation.
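A minimal sketch of the two-stage idea for the Gumbel case (the linear trend in the mean, sample size, and return period are illustrative assumptions; the paper also treats trends in the standard deviation and the GEV distribution):

```python
import numpy as np

EULER = 0.5772156649  # Euler-Mascheroni constant

def gumbel_lmom_fit(x):
    """Fit a Gumbel distribution by the method of L-moments."""
    xs = np.sort(x)
    n = len(xs)
    b0 = xs.mean()
    b1 = np.sum(np.arange(n) / (n - 1) * xs) / n
    l1, l2 = b0, 2 * b1 - b0                 # first two sample L-moments
    alpha = l2 / np.log(2)                   # Gumbel scale
    xi = l1 - EULER * alpha                  # Gumbel location
    return xi, alpha

rng = np.random.default_rng(6)
t = np.arange(100, dtype=float)
x = (10 + 0.05 * t) + rng.gumbel(scale=2.0, size=100)  # trending annual maxima

# Stage 1: least-squares trend in the mean, then de-trend
a, b = np.polyfit(t, x, 1)
resid = x - (a * t + b)

# Stage 2: stationary L-moment fit, then re-trend the quantile
xi, alpha = gumbel_lmom_fit(resid)
F = 0.99                                     # 100-year return period
q_stationary = xi - alpha * np.log(-np.log(F))
q_t = (a * t + b) + q_stationary             # time-dependent quantile
print(q_t[[0, 50, 99]])
```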


2000 ◽  
Vol 03 (03) ◽  
pp. 567-568
Author(s):  
M. Ciogli ◽ 
G. Rotundo ◽ 
B. Tirozzi

A diffusion equation for the price evolution of the Italian share "Olivetti" is found by analysing a series of its price data. The coefficients of this equation are estimated with the maximum likelihood method based on martingale theory. We evaluate a pricing and hedging strategy using the Sornette and Bouchaud approach.
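The note does not reproduce its equations here; purely as generic orientation (synthetic prices rather than Olivetti data, and a plain geometric Brownian motion model rather than the authors' martingale-based estimator), ML estimation of diffusion coefficients from log-returns looks like:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical daily closing prices (stand-in for the share price series)
n, dt = 500, 1 / 252
log_ret = (0.08 - 0.5 * 0.3**2) * dt + 0.3 * np.sqrt(dt) * rng.normal(size=n)
prices = 100 * np.exp(np.cumsum(log_ret))

# ML estimates of the diffusion coefficients from log-returns,
# assuming dS = mu * S dt + sigma * S dW (geometric Brownian motion)
r = np.diff(np.log(prices))
sigma_hat = r.std(ddof=0) / np.sqrt(dt)
mu_hat = r.mean() / dt + 0.5 * sigma_hat**2
print(f"mu_hat={mu_hat:.3f}, sigma_hat={sigma_hat:.3f}")
```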

