On the Impossibility of Learning the Missing Mass

Entropy ◽  
2019 ◽  
Vol 21 (1) ◽  
pp. 28 ◽  
Author(s):  
Elchanan Mossel ◽  
Mesrob I. Ohannessian

This paper shows that one cannot learn the probability of rare events without imposing further structural assumptions. The event of interest is that of obtaining an outcome outside the coverage of an i.i.d. sample from a discrete distribution. The probability of this event is referred to as the “missing mass”. The impossibility result can then be stated as: the missing mass is not distribution-free learnable in relative error. The proof is semi-constructive and relies on a coupling argument using a dithered geometric distribution. Via a reduction, this impossibility also extends to both discrete and continuous tail estimation. These results formalize the folklore that in order to predict rare events without restrictive modeling, one necessarily needs distributions with “heavy tails”.
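As a rough illustration of the quantity discussed above, the following sketch draws an i.i.d. sample from a plain geometric distribution, computes the true missing mass (the total probability of outcomes not covered by the sample), and compares it with the Good-Turing estimate, that is, the fraction of the sample made up of symbols seen exactly once. The Good-Turing estimator is a standard technique that the abstract does not mention, and the plain geometric distribution is not the dithered construction used in the proof; the function names and the parameter `p = 0.1` are assumptions chosen for illustration only.

```python
import random
from collections import Counter


def sample_geometric(p, rng):
    """Draw one value from a geometric distribution on {1, 2, ...}."""
    k = 1
    while rng.random() >= p:  # continue with probability 1 - p
        k += 1
    return k


def true_missing_mass(sample, p):
    """Total probability of the symbols that never appear in the sample."""
    seen = set(sample)
    covered = sum(p * (1 - p) ** (k - 1) for k in seen)
    return 1.0 - covered


def good_turing_missing_mass(sample):
    """Good-Turing estimate: number of singletons divided by sample size."""
    counts = Counter(sample)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(sample)


rng = random.Random(0)
p = 0.1  # illustrative parameter, not taken from the paper
sample = [sample_geometric(p, rng) for _ in range(1000)]
mm_true = true_missing_mass(sample, p)
mm_gt = good_turing_missing_mass(sample)
print(f"true missing mass:    {mm_true:.5f}")
print(f"Good-Turing estimate: {mm_gt:.5f}")
```

For a single light-tailed distribution like this one, the estimate and the true value can differ by a large factor in relative terms even when both are small in absolute terms, which is the kind of error the impossibility result concerns.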

Author(s):  
Jasdev Bhatti ◽  
Mohit Kumar Kakkar

Background and Aim: As demand grows for reliable industrial machines whose behavior follows continuous or discrete distributions, it is notable that previous studies of systems with more than one failure have not examined any iteration technique for separating a failed unit on the basis of its failure. The aim of this paper is therefore to analyze a real discrete industrial problem involving cold standby units arranged in parallel, with a newly introduced inspection procedure: failed units are inspected to identify the exact failure, and this is communicated to the repairman so that only the failed part of the unit is repaired, saving time and maintenance cost. Methods: The geometric distribution and regenerative techniques were applied to calculate reliability measures such as mean time to system failure, system availability, and the inspection, repair and failed time of a unit. Results: A graphical and analytical study was also carried out to analyze the increasing or decreasing behavior of the profit function with respect to repair and failure rates. The system responded properly in fulfilling its basic needs. Conclusion: The calculated values of the reliability parameters are helpful for studying other models that follow the same concept under different environmental conditions. It is concluded that reliability increases with the repair rate and decreases with the failure rate. The results also provide better reliability testing strategies that can help develop new techniques for increasing the effectiveness of the system.


2017 ◽  
Vol 24 (4) ◽  
pp. 737-744 ◽  
Author(s):  
Manfred Mudelsee ◽  
Miguel A. Bermejo

Abstract. The tail probability, P, of the distribution of a variable is important for risk analysis of extremes. Many variables in complex geophysical systems show heavy tails, where P decreases with the value, x, of a variable as a power law with a characteristic exponent, α. Accurate estimation of α on the basis of data is currently hindered by the problem of the selection of the order, that is, the number of largest x values to utilize for the estimation. This paper presents a new, widely applicable, data-adaptive order selector, which is based on computer simulations and brute force search. It is the first in a set of papers on optimal heavy tail estimation. The new selector outperforms competitors in a Monte Carlo experiment, where simulated data are generated from stable distributions and AR(1) serial dependence. We calculate error bars for the estimated α by means of simulations. We illustrate the method on an artificial time series. We apply it to an observed, hydrological time series from the River Elbe and find an estimated characteristic exponent of 1.48 ± 0.13. This result indicates finite mean but infinite variance of the statistical distribution of river runoff.


2010 ◽  
Vol 2010 ◽  
pp. 1-11 ◽  
Author(s):  
Graciela Chichilnisky

We extend the foundation of probability in samples with rare events that are potentially catastrophic, called black swans, such as natural hazards, market crashes, catastrophic climate change, and species extinction. Such events are generally treated as “outliers” and disregarded. We propose a new axiomatization of probability requiring equal treatment in the measurement of rare and frequent events (the Swan Axiom) and characterize the subjective probabilities that the axioms imply: these are neither finitely additive nor countably additive but a combination of both. They exclude countably additive probabilities as in De Groot (1970) and Arrow (1971) and are a strict subset of Savage (1954) probabilities that are finitely additive measures. Our subjective probabilities are standard distributions when the sample has no black swans. The finitely additive part, however, assigns more weight to rare events than do standard distributions and in that sense explains the persistent observation of “power laws” and “heavy tails” that eludes classic theory. The axioms extend earlier work by Chichilnisky (1996, 2000, 2002, 2009) to encompass the foundation of subjective probability and axiomatic treatments of subjective probability by Villegas (1964), De Groot (1963), Dubins and Savage (1965), Dubins (1975), Purves and Sudderth (1976) and of choice under uncertainty by Arrow (1971).


2020 ◽  
Vol 23 (1) ◽  
pp. 35-57
Author(s):  
Zainab Mohammed Darwish Al-Balushi ◽  
M. Mazharul Islam

The geometric distribution belongs to the family of discrete distributions and deals with the count of trials needed for the first occurrence, or success, of an event. However, little attention has been paid to applying the GLM to the geometric distribution, whose probability mass function has a very simple form with a single parameter. In this study, an attempt has been made to introduce geometric regression for modelling count data. We illustrate the suitability of the geometric regression model for analyzing count data on time to first antenatal care visit, which displayed under-dispersion, and compare the results with Poisson and negative binomial regressions. We conclude that the geometric regression model may provide a flexible model for fitting count data sets that present over-dispersion or under-dispersion, and that it may serve as an alternative to the very familiar Poisson and negative binomial models for modelling count data.


Author(s):  
G.G. Hamedani ◽  
Mahrokh Najaf ◽  
Amin Roshani ◽  
Nadeem Shafique Butt

In this paper, certain characterizations of twenty newly proposed discrete distributions are presented to complete, in some way, the authors’ works: the discrete generalized Lindley distribution of El-Morshedy et al. (2021), the discrete Gumbel distribution of Chakraborty et al. (2020), the skewed geometric distribution of Ong et al. (2020), the discrete Poisson X gamma distribution of Para et al. (2020), the discrete Cos-Poisson distribution of Bakouch et al. (2021), the size biased Poisson Ailamujia distribution of Dar and Para (2021), the generalized Hermite-Genocchi distribution of El-Desouky et al. (2021), the Poisson quasi-xgamma distribution of Altun et al. (2021a), the exponentiated discrete inverse Rayleigh distribution of Mashhadzadeh and MirMostafaee (2020), the Mlynar distribution of Frühwirth et al. (2021), the flexible one-parameter discrete distribution of Eliwa and El-Morshedy (2021), the two-parameter discrete Perks distribution of Tyagi et al. (2020), the discrete Weibull G family distribution of Ibrahim et al. (2021), the discrete Marshall–Olkin Lomax distribution of Ibrahim and Almetwally (2021), the two-parameter exponentiated discrete Lindley distribution of El-Morshedy et al. (2019), the natural discrete one-parameter polynomial exponential distribution of Mukherjee et al. (2020), the zero-truncated discrete Akash distribution of Sium and Shanker (2020), the two-parameter quasi Poisson-Aradhana distribution of Shanker and Shukla (2020), the zero-truncated Poisson-Ishita distribution of Shukla et al. (2020) and the Poisson-Shukla distribution of Shukla and Shanker (2020).


1987 ◽  
Vol 24 (03) ◽  
pp. 619-630 ◽  
Author(s):  
Richard L. Smith ◽  
Ishay Weissman

We consider the relative error of a tail function when this is approximated by y^{−α} using an estimator of Hill's for α. The results combine recent work of Davis and Resnick on tail estimation with Anderson's work on large deviations in extreme-value theory. Treating separately the domains of attraction of Φ_α and Λ, we obtain general conditions for the relative error to tend to 0 as u → ∞, y → ∞ simultaneously. The results serve as a warning against the automatic extrapolation of estimates based on extreme-value approximations.
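To make the objects in this abstract concrete, here is a minimal sketch of the Hill estimator for the tail index α and of a power-law tail extrapolation built on it. The extrapolation formula (k/n)·(y/threshold)^(−α̂), the exact Pareto test data, and the choices `k = 500` and `y = 50` are assumptions made for illustration; the abstract does not specify the form of the approximation beyond y^(−α), and the function names are hypothetical.

```python
import math
import random


def hill_estimator(data, k):
    """Hill estimate of the tail index alpha from the k largest observations."""
    xs = sorted(data, reverse=True)
    if not 0 < k < len(xs):
        raise ValueError("k must satisfy 0 < k < len(data)")
    threshold = xs[k]  # the (k+1)-th largest value
    mean_log_excess = sum(math.log(x / threshold) for x in xs[:k]) / k
    return 1.0 / mean_log_excess


def pareto_tail_estimate(data, k, y):
    """Extrapolated P(X > y) using the fitted power law above the threshold."""
    xs = sorted(data, reverse=True)
    alpha_hat = hill_estimator(data, k)
    threshold = xs[k]
    return (k / len(xs)) * (y / threshold) ** (-alpha_hat)


rng = random.Random(1)
alpha = 2.0  # illustrative true tail index
# Exact Pareto sample on [1, inf): P(X > x) = x ** (-alpha)
data = [rng.paretovariate(alpha) for _ in range(5000)]
alpha_hat = hill_estimator(data, k=500)
y = 50.0
estimate = pareto_tail_estimate(data, 500, y)
exact = y ** (-alpha)
print(f"alpha_hat = {alpha_hat:.3f}")
print(f"relative error of tail estimate at y = {y}: {abs(estimate / exact - 1):.3f}")
```

Because a small error in α̂ is raised to the power of log y, the relative error of the extrapolated tail grows as y moves further beyond the data, which is consistent with the abstract's caution about automatic extrapolation.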


2006 ◽  
Vol 38 (2) ◽  
pp. 545-558 ◽  
Author(s):  
Søren Asmussen ◽  
Dirk P. Kroese

The estimation of P(S_n > u) by simulation, where S_n is the sum of independent, identically distributed random variables Y_1, …, Y_n, is of importance in many applications. We propose two simulation estimators based upon the identity P(S_n > u) = nP(S_n > u, M_n = Y_n), where M_n = max(Y_1, …, Y_n). One estimator uses importance sampling (for Y_n only), and the other uses conditional Monte Carlo conditioning upon Y_1, …, Y_{n−1}. Properties of the relative error of the estimators are derived and a numerical study given in terms of the M/G/1 queue in which n is replaced by an independent geometric random variable N. The conclusion is that the new estimators compare extremely favorably with previous ones. In particular, the conditional Monte Carlo estimator is the first heavy-tailed example of an estimator with bounded relative error. Further improvements are obtained in the random-N case, by incorporating control variates and stratification techniques into the new estimation procedures.
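The conditional Monte Carlo idea can be sketched as follows. Conditioning the identity above on the first n − 1 summands gives the per-replication value n·F̄(max(M_{n−1}, u − S_{n−1})), where F̄ is the survival function of a single summand, since on the event in question the last summand must exceed both the running maximum and the remaining gap to u. The Pareto summands, the parameter values, and the function names below are assumptions for illustration; a crude Monte Carlo estimate is included only for comparison.

```python
import random


def pareto_sf(x, alpha):
    """Survival function P(Y > x) of a Pareto(alpha) variable on [1, inf)."""
    return 1.0 if x <= 1.0 else x ** (-alpha)


def conditional_mc_estimate(n, u, alpha, replications, rng):
    """Estimate P(S_n > u) by conditioning on Y_1, ..., Y_{n-1} (needs n >= 2)."""
    total = 0.0
    for _ in range(replications):
        ys = [rng.paretovariate(alpha) for _ in range(n - 1)]
        s_rest = sum(ys)
        m_rest = max(ys)
        # On {S_n > u, M_n = Y_n}, Y_n must exceed both of these bounds.
        total += n * pareto_sf(max(m_rest, u - s_rest), alpha)
    return total / replications


def crude_mc_estimate(n, u, alpha, replications, rng):
    """Plain Monte Carlo: fraction of replications in which S_n exceeds u."""
    hits = 0
    for _ in range(replications):
        if sum(rng.paretovariate(alpha) for _ in range(n)) > u:
            hits += 1
    return hits / replications


# Illustrative parameters, not taken from the paper.
n, u, alpha = 2, 20.0, 1.5
cond = conditional_mc_estimate(n, u, alpha, 20000, random.Random(0))
crude = crude_mc_estimate(n, u, alpha, 20000, random.Random(0))
print(f"conditional Monte Carlo: {cond:.5f}")
print(f"crude Monte Carlo:       {crude:.5f}")
```

Each replication of the conditional estimator returns a smooth function of the first n − 1 summands rather than a 0/1 indicator, which is why its variance stays small even when u is large and the event is rare.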


Author(s):  
Lili Puspita Rahayu ◽  
Kusman Sadik ◽  
Indahwati Indahwati

The Poisson distribution is a discrete distribution that is often used in modeling rare events. The data take the form of counts, that is, non-negative integers. One method used to model count data is Poisson regression. A violation of assumptions that often occurs in Poisson regression is overdispersion. One cause of overdispersion is an excess probability of zeros in the response variable. A model used to overcome this overdispersion is zero-inflated Poisson (ZIP) regression. This research aimed to study overdispersion in Poisson and ZIP regression for data with various characteristics. These characteristics were simulated by combining the Poisson parameter (λ), the zero probability (p), and the sample size (n) of the response variable, and then comparing the Poisson and ZIP regression models. The simulation study showed that the larger λ, n, and p are, the better the ZIP model performs relative to Poisson regression. These simulation results are further supported by an exploration of the Pearson residuals in Poisson and ZIP regression.
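The link between excess zeros and overdispersion can be checked with a small simulation. The sketch below generates plain Poisson data and zero-inflated Poisson data and compares their dispersion index (sample variance divided by sample mean). For a zero-inflated Poisson variable with parameters λ and p, the mean is (1 − p)λ and the variance is (1 − p)λ(1 + pλ), so the index is 1 + pλ rather than 1. The values λ = 3, p = 0.3 and n = 20000, the sampler, and the function names are assumptions for illustration and are not the settings used in the study; no regression model is fitted here.

```python
import math
import random


def poisson_sample(lam, rng):
    """Draw a Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def zip_sample(lam, p_zero, rng):
    """Zero-inflated Poisson: a structural zero with probability p_zero."""
    return 0 if rng.random() < p_zero else poisson_sample(lam, rng)


def dispersion_index(values):
    """Sample variance divided by sample mean (close to 1 for Poisson data)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return var / mean


rng = random.Random(42)
lam, p_zero, n = 3.0, 0.3, 20000  # illustrative settings
poisson_data = [poisson_sample(lam, rng) for _ in range(n)]
zip_data = [zip_sample(lam, p_zero, rng) for _ in range(n)]
d_pois = dispersion_index(poisson_data)
d_zip = dispersion_index(zip_data)
print(f"Poisson dispersion index: {d_pois:.3f}")
print(f"ZIP dispersion index:     {d_zip:.3f} (theory: {1 + p_zero * lam:.3f})")
```

Since the index 1 + pλ grows with both p and λ, the gap between the two models widens as either increases, which agrees in direction with the simulation findings reported above.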

