On the Impossibility of Learning the Missing Mass

Elchanan Mossel; Mesrob I. Ohannessian

doi:10.3390/e21010028


Entropy ◽

10.3390/e21010028 ◽

2019 ◽

Vol 21 (1) ◽

pp. 28 ◽

Cited By ~ 1

Author(s):

Elchanan Mossel ◽

Mesrob I. Ohannessian

Keyword(s):

Relative Error ◽

Heavy Tails ◽

Rare Events ◽

Geometric Distribution ◽

Discrete Distribution ◽

Impossibility Result ◽

Distribution Free ◽

Missing Mass ◽

Tail Estimation

This paper shows that one cannot learn the probability of rare events without imposing further structural assumptions. The event of interest is that of obtaining an outcome outside the coverage of an i.i.d. sample from a discrete distribution. The probability of this event is referred to as the “missing mass”. The impossibility result can then be stated as: the missing mass is not distribution-free learnable in relative error. The proof is semi-constructive and relies on a coupling argument using a dithered geometric distribution. Via a reduction, this impossibility also extends to both discrete and continuous tail estimation. These results formalize the folklore that in order to predict rare events without restrictive modeling, one necessarily needs distributions with “heavy tails”.
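The definition of the missing mass above can be made concrete with a small simulation. The sketch below is illustrative only: it assumes a plain geometric distribution (not the dithered one used in the paper's proof), and the Good-Turing estimator shown for comparison is a standard estimator that the abstract does not mention. All function names are hypothetical.

```python
import random
from collections import Counter

def geometric_pmf(k, q):
    """P(X = k) for k = 1, 2, ... with success probability q."""
    return q * (1 - q) ** (k - 1)

def sample_geometric(n, q, rng):
    """Draw n i.i.d. geometric variates by counting Bernoulli trials."""
    out = []
    for _ in range(n):
        k = 1
        while rng.random() >= q:
            k += 1
        out.append(k)
    return out

def missing_mass(sample, q):
    """Total probability of the outcomes that never appear in the sample."""
    seen = set(sample)
    return 1.0 - sum(geometric_pmf(k, q) for k in seen)

def good_turing(sample):
    """Good-Turing estimate (an assumption here, not from the abstract):
    the fraction of the sample made up of symbols seen exactly once."""
    counts = Counter(sample)
    return sum(1 for c in counts.values() if c == 1) / len(sample)

rng = random.Random(0)
xs = sample_geometric(1000, 0.3, rng)
print("true missing mass:   ", missing_mass(xs, 0.3))
print("Good-Turing estimate:", good_turing(xs))
```

Comparing the two printed values as a ratio, rather than as a difference, is the relative-error criterion the abstract refers to.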


Reliability Analysis of Cold Standby Parallel System Possessing Failure And Repair Rate Under Geometric Distribution

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813666191219110504 ◽

2019 ◽

Vol 13 ◽

Author(s):

Jasdev Bhatti ◽

Mohit Kumar Kakkar

Keyword(s):

Failure Rate ◽

Geometric Distribution ◽

Discrete Distribution ◽

Maintenance Cost ◽

Discrete Problem ◽

New Techniques ◽

Testing Strategies ◽

Cold Standby ◽

Mean Time ◽

Iteration Technique

Background and Aim: With increasing demands on the reliability of industrial machines following continuous or discrete distributions, it is notable that in previous research on systems with more than one failure, no iteration technique has been studied to separate the failed unit on the basis of its failure. The aim of our paper is therefore to analyze a real industrial discrete problem with cold standby units arranged in parallel, using a new inspection procedure in which failed units are inspected to identify the exact failure and the result is communicated to the repairman, so that the exact failed part of the unit is repaired, saving time and maintenance cost. Methods: The geometric distribution and regenerative techniques were applied to calculate reliability measures such as mean time to system failure, availability of the system, and the inspection, repair, and failed time of a unit. Results: A graphical and analytical study was also carried out to analyze the increasing/decreasing behavior of the profit function with respect to repair and failure rates. The system responded properly in fulfilling its basic needs. Conclusion: The calculated values of all reliability parameters are helpful for studying other models following the same concept under different environmental conditions. It is concluded that reliability increases/decreases with an increase in repair/failure rate. The results of this paper also provide better reliability testing strategies that help to develop new techniques, leading to increased system effectiveness.


Optimal heavy tail estimation – Part 1: Order selection

Nonlinear Processes in Geophysics ◽

10.5194/npg-24-737-2017 ◽

2017 ◽

Vol 24 (4) ◽

pp. 737-744 ◽

Cited By ~ 4

Author(s):

Manfred Mudelsee ◽

Miguel A. Bermejo

Keyword(s):

Time Series ◽

Heavy Tails ◽

Characteristic Exponent ◽

Tail Probability ◽

Heavy Tail ◽

Infinite Variance ◽

Accurate Estimation ◽

Monte Carlo Experiment ◽

River Elbe ◽

Tail Estimation

Abstract. The tail probability, P, of the distribution of a variable is important for risk analysis of extremes. Many variables in complex geophysical systems show heavy tails, where P decreases with the value, x, of a variable as a power law with a characteristic exponent, α. Accurate estimation of α on the basis of data is currently hindered by the problem of the selection of the order, that is, the number of largest x values to utilize for the estimation. This paper presents a new, widely applicable, data-adaptive order selector, which is based on computer simulations and brute force search. It is the first in a set of papers on optimal heavy tail estimation. The new selector outperforms competitors in a Monte Carlo experiment, where simulated data are generated from stable distributions and AR(1) serial dependence. We calculate error bars for the estimated α by means of simulations. We illustrate the method on an artificial time series. We apply it to an observed, hydrological time series from the River Elbe and find an estimated characteristic exponent of 1.48 ± 0.13. This result indicates finite mean but infinite variance of the statistical distribution of river runoff.
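The closing inference (finite mean but infinite variance for an exponent of 1.48) follows from a standard moment calculation. The sketch below assumes a nonnegative variable X whose tail is an exact power law beyond some threshold x_0, with a constant C; these symbols are illustrative and do not appear in the abstract.

```latex
\[
  P(X > x) = C\,x^{-\alpha} \quad (x \ge x_0)
  \;\Longrightarrow\;
  \int_{x_0}^{\infty} r\,x^{r-1}\,P(X > x)\,dx
  = r\,C \int_{x_0}^{\infty} x^{\,r-1-\alpha}\,dx < \infty
  \iff r < \alpha .
\]
```

Since E[X^r] equals the integral of r x^(r-1) P(X > x) over x from 0 to infinity, and the part below x_0 is always finite, the r-th moment is finite exactly when r < α. With α ≈ 1.48, the first moment (r = 1) converges and the second moment (r = 2) diverges.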


The Foundations of Probability with Black Swans

Journal of Probability and Statistics ◽

10.1155/2010/838240 ◽

2010 ◽

Vol 2010 ◽

pp. 1-11 ◽

Cited By ~ 7

Author(s):

Graciela Chichilnisky

Keyword(s):

Subjective Probability ◽

Heavy Tails ◽

Rare Events ◽

Power Laws ◽

Equal Treatment ◽

Subjective Probabilities ◽

Black Swans ◽

Classic Theory ◽

Finitely Additive Measures ◽

Additive Measures

We extend the foundation of probability in samples with rare events that are potentially catastrophic, called black swans, such as natural hazards, market crashes, catastrophic climate change, and species extinction. Such events are generally treated as “outliers” and disregarded. We propose a new axiomatization of probability requiring equal treatment in the measurement of rare and frequent events (the Swan Axiom) and characterize the subjective probabilities that the axioms imply: these are neither finitely additive nor countably additive but a combination of both. They exclude countably additive probabilities as in De Groot (1970) and Arrow (1971) and are a strict subset of Savage (1954) probabilities that are finitely additive measures. Our subjective probabilities are standard distributions when the sample has no black swans. The finitely additive part, however, assigns more weight to rare events than do standard distributions and in that sense explains the persistent observation of “power laws” and “heavy tails” that eludes classic theory. The axioms extend earlier work by Chichilnisky (1996, 2000, 2002, 2009) to encompass the foundation of subjective probability and axiomatic treatments of subjective probability by Villegas (1964), De Groot (1963), Dubins and Savage (1965), Dubins (1975), Purves and Sudderth (1976) and of choice under uncertainty by Arrow (1971).


Geometric Regression for Modelling Count Data on the Time-to-First Antenatal Care Visit

Journal of Statistics Advances in Theory and Applications ◽

10.18642/jsata_7100122148 ◽

2020 ◽

Vol 23 (1) ◽

pp. 35-57

Author(s):

Zainab Mohammed Darwish Al-Balushi ◽

M. Mazharul Islam

Keyword(s):

Antenatal Care ◽

Regression Model ◽

Count Data ◽

Negative Binomial ◽

Geometric Distribution ◽

Discrete Distribution ◽

Mass Function ◽

Probability Mass Function ◽

Antenatal Care Visit ◽

Care Visit

The geometric distribution belongs to the family of discrete distributions and deals with the number of trials needed for the first occurrence or success of an event. However, little attention has been paid to applying the GLM to the geometric distribution, which has a very simple form for its probability mass function with a single parameter. In this study, an attempt has been made to introduce geometric regression for modelling count data. We have illustrated the suitability of the geometric regression model for analyzing count data on the time to the first antenatal care visit, which displayed under-dispersion, and the results were compared with Poisson and negative binomial regressions. We conclude that the geometric regression model may provide a flexible model for fitting count data sets which may present over-dispersion or under-dispersion, and the model may serve as an alternative to the very familiar Poisson and negative binomial models for modelling count data.
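As a rough illustration of what a geometric regression can look like, the sketch below writes the single-parameter probability mass function in terms of its mean and attaches a log link to one covariate. The parameterization (counts starting at 0, mean μ, log link) and all names are assumptions made for this example; the abstract does not specify the authors' formulation.

```python
import math

def geometric_pmf(y, mu):
    """P(Y = y) for y = 0, 1, 2, ... when the geometric has mean mu.

    With success probability p = 1 / (1 + mu), this is p * (1 - p) ** y.
    """
    p = 1.0 / (1.0 + mu)
    return p * (1.0 - p) ** y

def log_likelihood(ys, xs, b0, b1):
    """Log-likelihood under an assumed log link: mu_i = exp(b0 + b1 * x_i)."""
    total = 0.0
    for y, x in zip(ys, xs):
        mu = math.exp(b0 + b1 * x)
        total += y * math.log(mu) - (y + 1) * math.log(1.0 + mu)
    return total

# Hypothetical toy data: counts ys with one covariate xs.
ys = [0, 1, 2, 4, 3]
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
print(log_likelihood(ys, xs, b0=0.0, b1=0.5))
```

In practice the coefficients would be chosen by maximizing this log-likelihood with a numerical optimizer; that step is left out to keep the sketch short.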


Characterizations of Twenty (2020-2021) Proposed Discrete Distributions

Pakistan Journal of Statistics and Operation Research ◽

10.18187/pjsor.v17i4.3902 ◽

2021 ◽

pp. 847-884

Author(s):

G.G. Hamedani ◽

Mahrokh Najaf ◽

Amin Roshani ◽

Nadeem Shafique Butt

Keyword(s):

Exponential Distribution ◽

Poisson Distribution ◽

Geometric Distribution ◽

Discrete Distribution ◽

Gumbel Distribution ◽

Rayleigh Distribution ◽

Discrete Distributions ◽

Lindley Distribution ◽

Lomax Distribution ◽

Two Parameter

In this paper, certain characterizations of twenty newly proposed discrete distributions: the discrete generalized Lindley distribution of El-Morshedy et al. (2021), the discrete Gumbel distribution of Chakraborty et al. (2020), the skewed geometric distribution of Ong et al. (2020), the discrete Poisson X gamma distribution of Para et al. (2020), the discrete Cos-Poisson distribution of Bakouch et al. (2021), the size biased Poisson Ailamujia distribution of Dar and Para (2021), the generalized Hermite-Genocchi distribution of El-Desouky et al. (2021), the Poisson quasi-xgamma distribution of Altun et al. (2021a), the exponentiated discrete inverse Rayleigh distribution of Mashhadzadeh and MirMostafaee (2020), the Mlynar distribution of Frühwirth et al. (2021), the flexible one-parameter discrete distribution of Eliwa and El-Morshedy (2021), the two-parameter discrete Perks distribution of Tyagi et al. (2020), the discrete Weibull G family distribution of Ibrahim et al. (2021), the discrete Marshall–Olkin Lomax distribution of Ibrahim and Almetwally (2021), the two-parameter exponentiated discrete Lindley distribution of El-Morshedy et al. (2019), the natural discrete one-parameter polynomial exponential distribution of Mukherjee et al. (2020), the zero-truncated discrete Akash distribution of Sium and Shanker (2020), the two-parameter quasi Poisson-Aradhana distribution of Shanker and Shukla (2020), the zero-truncated Poisson-Ishita distribution of Shukla et al. (2020) and the Poisson-Shukla distribution of Shukla and Shanker (2020) are presented to complete, in some way, the authors’ works.


Large deviations of tail estimators based on the Pareto approximation

Journal of Applied Probability ◽

10.1017/s0021900200031351 ◽

1987 ◽

Vol 24 (03) ◽

pp. 619-630 ◽

Cited By ~ 1

Author(s):

Richard L. Smith ◽

Ishay Weissman

Keyword(s):

Large Deviations ◽

Recent Work ◽

Relative Error ◽

Extreme Value Theory ◽

Value Theory ◽

Extreme Value ◽

Domains Of Attraction ◽

Tail Estimation ◽

Tail Function ◽

General Conditions

We consider the relative error of a tail function when this is approximated by y^(−α) using an estimator of Hill's for α. The results combine recent work of Davis and Resnick on tail estimation with Anderson's work on large deviations in extreme-value theory. Treating separately the domains of attraction of Φ_α and Λ, we obtain general conditions for the relative error to tend to 0 as u → ∞, y → ∞ simultaneously. The results serve as a warning against the automatic extrapolation of estimates based on extreme-value approximations.
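The quantities in this abstract can be sketched in a few lines. The code below uses the usual textbook form of Hill's estimator and an exact Pareto sample as the test case; the sample size, the order k, and the function names are assumptions for illustration, not taken from the paper.

```python
import math
import random

def hill_estimator(data, k):
    """Hill estimate of the tail exponent alpha from the k largest values."""
    xs = sorted(data, reverse=True)
    if not 0 < k < len(xs):
        raise ValueError("k must satisfy 0 < k < len(data)")
    threshold = xs[k]  # the (k+1)-th largest observation
    mean_log_excess = sum(math.log(x / threshold) for x in xs[:k]) / k
    return 1.0 / mean_log_excess

def relative_error(y, alpha_true, alpha_hat):
    """Relative error of y**(-alpha_hat) as an approximation to y**(-alpha_true)."""
    return y ** (-alpha_hat) / y ** (-alpha_true) - 1.0

rng = random.Random(1)
sample = [rng.paretovariate(1.5) for _ in range(5000)]  # P(X > x) = x**(-1.5), x >= 1
alpha_hat = hill_estimator(sample, k=500)
print("Hill estimate:", alpha_hat)
for y in (10.0, 1e3, 1e6):
    print(y, relative_error(y, 1.5, alpha_hat))
```

For a fixed estimate that differs from the true exponent, the relative error equals y^(α − α̂) − 1 and therefore grows in magnitude with y, which is one way to read the abstract's warning about automatic extrapolation.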


Improved algorithms for rare event simulation with heavy tails

Advances in Applied Probability ◽

10.1239/aap/1151337084 ◽

2006 ◽

Vol 38 (2) ◽

pp. 545-558 ◽

Cited By ~ 46

Author(s):

Søren Asmussen ◽

Dirk P. Kroese

Keyword(s):

Monte Carlo ◽

Relative Error ◽

Heavy Tails ◽

Numerical Study ◽

Rare Event ◽

Random Variable ◽

Event Simulation ◽

Conditional Monte Carlo ◽

Heavy Tailed ◽

Geometric Random Variable

The estimation of P(S_n > u) by simulation, where S_n is the sum of independent, identically distributed random variables Y_1, …, Y_n, is of importance in many applications. We propose two simulation estimators based upon the identity P(S_n > u) = n P(S_n > u, M_n = Y_n), where M_n = max(Y_1, …, Y_n). One estimator uses importance sampling (for Y_n only), and the other uses conditional Monte Carlo conditioning upon Y_1, …, Y_{n−1}. Properties of the relative error of the estimators are derived and a numerical study given in terms of the M/G/1 queue in which n is replaced by an independent geometric random variable N. The conclusion is that the new estimators compare extremely favorably with previous ones. In particular, the conditional Monte Carlo estimator is the first heavy-tailed example of an estimator with bounded relative error. Further improvements are obtained in the random-N case, by incorporating control variates and stratification techniques into the new estimation procedures.
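A minimal sketch of the conditional Monte Carlo idea follows. Conditioning the identity on the first n − 1 summands gives the per-replicate value n · F̄(max(M_{n−1}, u − S_{n−1})), where F̄ is the tail of one summand. The Pareto tail, the parameter values, and the function names below are assumptions chosen for illustration; the importance-sampling variant and the random-N refinements are not shown.

```python
import random

def pareto_tail(y, alpha):
    """Tail P(Y > y) of a Pareto variable on [1, inf) with exponent alpha."""
    return 1.0 if y <= 1.0 else y ** (-alpha)

def conditional_mc(n, u, alpha, reps, rng):
    """Estimate P(S_n > u) by averaging n * tail(max(M_{n-1}, u - S_{n-1}))."""
    if n < 2:
        raise ValueError("n must be at least 2")
    total = 0.0
    for _ in range(reps):
        ys = [rng.paretovariate(alpha) for _ in range(n - 1)]
        partial_sum, partial_max = sum(ys), max(ys)
        total += n * pareto_tail(max(partial_max, u - partial_sum), alpha)
    return total / reps

def crude_mc(n, u, alpha, reps, rng):
    """Plain Monte Carlo: fraction of replicates with S_n > u, for comparison."""
    hits = sum(
        1 for _ in range(reps)
        if sum(rng.paretovariate(alpha) for _ in range(n)) > u
    )
    return hits / reps

rng = random.Random(1)
print("conditional MC:", conditional_mc(2, 1000.0, 1.5, 20000, rng))
print("crude MC:      ", crude_mc(2, 1000.0, 1.5, 20000, rng))
```

With a rare event of this size, the crude estimator typically records only a handful of hits, while every replicate of the conditional estimator contributes a value of the right order of magnitude.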


Improved algorithms for rare event simulation with heavy tails

Advances in Applied Probability ◽

10.1017/s0001867800001099 ◽

2006 ◽

Vol 38 (02) ◽

pp. 545-558 ◽

Cited By ~ 36

Author(s):

Søren Asmussen ◽

Dirk P. Kroese

Keyword(s):

Monte Carlo ◽

Relative Error ◽

Heavy Tails ◽

Numerical Study ◽

Rare Event ◽

Random Variable ◽

Event Simulation ◽

Conditional Monte Carlo ◽

Heavy Tailed ◽

Geometric Random Variable

The estimation of P(S_n > u) by simulation, where S_n is the sum of independent, identically distributed random variables Y_1, …, Y_n, is of importance in many applications. We propose two simulation estimators based upon the identity P(S_n > u) = n P(S_n > u, M_n = Y_n), where M_n = max(Y_1, …, Y_n). One estimator uses importance sampling (for Y_n only), and the other uses conditional Monte Carlo conditioning upon Y_1, …, Y_{n−1}. Properties of the relative error of the estimators are derived and a numerical study given in terms of the M/G/1 queue in which n is replaced by an independent geometric random variable N. The conclusion is that the new estimators compare extremely favorably with previous ones. In particular, the conditional Monte Carlo estimator is the first heavy-tailed example of an estimator with bounded relative error. Further improvements are obtained in the random-N case, by incorporating control variates and stratification techniques into the new estimation procedures.


Exponentiated generalized geometric distribution: A new discrete distribution

Hacettepe Journal of Mathematics and Statistics ◽

10.15672/hjms.20159013119 ◽

2015 ◽

Vol 45 (90) ◽

pp. 1-1 ◽

Cited By ~ 1

Author(s):

Hamid Bidram ◽

Rasool Roozegar ◽

Vahid Nekoukhou

Keyword(s):

Geometric Distribution ◽

Discrete Distribution ◽

Generalized Geometric Distribution


Overdispersion study of poisson and zero-inflated poisson regression for some characteristics of the data on lamda, n, p

International Journal of Advances in Intelligent Informatics ◽

10.26555/ijain.v2i3.73 ◽

2016 ◽

Vol 2 (3) ◽

pp. 140

Author(s):

Lili Puspita Rahayu ◽

Kusman Sadik ◽

Indahwati Indahwati

Keyword(s):

Sample Size ◽

Poisson Distribution ◽

Count Data ◽

Poisson Regression ◽

Regression Models ◽

Rare Events ◽

Discrete Distribution ◽

Data Simulation ◽

Response Variable ◽

Pearson Residual

The Poisson distribution is a discrete distribution that is often used in modeling rare events. The data are obtained in the form of counts with non-negative integer values. One analysis used in modeling count data is Poisson regression. A violation of assumptions that often occurs in Poisson regression is overdispersion. A cause of overdispersion is an excess probability of zeros in the response variable. A model used to overcome overdispersion is zero-inflated Poisson (ZIP) regression. This research aimed to study overdispersion for Poisson and ZIP regression on some characteristics of the data. The overdispersion characteristics studied were simulated by combining the parameter of the Poisson distribution (λ), the zero probability (p), and the sample size (n) of the response variable, and then comparing the Poisson and ZIP regression models. The simulation study showed that the larger λ, n, and p are, the better the ZIP model performs relative to Poisson regression. These simulation results are also supported by an exploration of Pearson residuals in Poisson and ZIP regression.
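The link between excess zeros and overdispersion can be checked with a short simulation. The sketch below is not the authors' simulation design: it only draws zero-inflated Poisson counts for one hypothetical choice of λ, p, and n and compares the sample variance-to-mean ratio with the theoretical value 1 + pλ, which follows from the ZIP mean (1 − p)λ and variance (1 − p)λ(1 + pλ).

```python
import math
import random

def sample_poisson(lam, rng):
    """Knuth's multiplication method for a Poisson(lam) variate."""
    limit = math.exp(-lam)
    k, prod = 0, 1.0
    while True:
        k += 1
        prod *= rng.random()
        if prod <= limit:
            return k - 1

def sample_zip(n, lam, p, rng):
    """n zero-inflated Poisson counts: a structural zero with probability p."""
    return [0 if rng.random() < p else sample_poisson(lam, rng) for _ in range(n)]

def dispersion_index(xs):
    """Sample variance divided by sample mean (equals 1 for a pure Poisson)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)
    return var / mean

def zip_dispersion_theory(lam, p):
    """Theoretical variance-to-mean ratio of a ZIP(lam, p) count."""
    return 1.0 + p * lam

rng = random.Random(0)
counts = sample_zip(20000, lam=5.0, p=0.3, rng=rng)
print("simulated :", dispersion_index(counts))
print("theory    :", zip_dispersion_theory(5.0, 0.3))
```

Because the ratio is 1 + pλ, larger values of λ and p move the data further from the Poisson assumption, which is consistent with the direction of the abstract's conclusion.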
