A note on effective sample size for constructing confidence intervals for the difference of two proportions

2012 ◽  
Vol 11 (2) ◽  
pp. 163-169 ◽  
Author(s):  
Guanghan F. Liu
2020 ◽  
Vol 34 (10) ◽  
pp. 1487-1505
Author(s):  
Katja Polotzek ◽  
Holger Kantz

Abstract Correlations in models for daily precipitation are often generated by elaborate numerics that employ a high number of hidden parameters. We propose a parsimonious and parametric stochastic model for European mid-latitude daily precipitation amounts with focus on the influence of correlations on the statistics. Our method is meta-Gaussian by applying a truncated-Gaussian-power (tGp) transformation to a Gaussian ARFIMA model. The distinctive feature of this approach is that ARFIMA(1, d, 0) processes provide synthetic time series with both long-range correlations (LRC), meaning that the sum of all autocorrelations is infinite, and short-range correlations (SRC), each controlled by a single parameter. Our model requires the fit of only five parameters overall, all of which have a clear interpretation. For model time series of finite length we deduce an effective sample size for the sample mean, whose variance is increased due to correlations. For example, the statistical uncertainty of the mean daily amount of 103 years of daily records at the Fichtelberg mountain in Germany equals that of about 14 years of independent daily data. Our effective sample size approach also yields theoretical confidence intervals for annual total amounts and allows for proper model validation in terms of the empirical mean and fluctuations of annual totals. We evaluate probability plots for the daily amounts, confidence intervals based on the effective sample size for the daily mean and annual totals, and the Mahalanobis distance for the annual maxima distribution. For reproducing annual maxima, the way of fitting the marginal distribution is more crucial than the presence of correlations, whereas the opposite holds for annual totals. Our alternative to rainfall simulation proves capable of modeling daily precipitation amounts, as the statistics of a random selection of 20 data sets are well reproduced.
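
The notion of an effective sample size for the sample mean can be illustrated with a minimal sketch. It relies on the standard identity for a stationary series, Var(mean) = (sigma^2 / n) * (1 + 2 * sum_{k=1}^{n-1} (1 - k/n) * rho(k)), so that n_eff = n divided by the bracketed inflation factor. The function names and the AR(1) autocorrelation rho(k) = phi^k are assumptions chosen for illustration only; the abstract's own model is ARFIMA(1, d, 0) with a tGp transformation, which is not reproduced here.

```python
def effective_sample_size(rho, n):
    """Effective sample size of the sample mean of a stationary series.

    rho: callable mapping a lag k >= 1 to the autocorrelation at that lag
         (hypothetical interface, chosen for this sketch).
    n:   length of the series.
    """
    # Variance inflation of the mean relative to independent data:
    # 1 + 2 * sum_{k=1}^{n-1} (1 - k/n) * rho(k)
    inflation = 1.0 + 2.0 * sum((1.0 - k / n) * rho(k) for k in range(1, n))
    return n / inflation


def ar1_autocorrelation(phi):
    """Autocorrelation function of an AR(1) process (illustrative stand-in,
    not the ARFIMA(1, d, 0) model of the abstract)."""
    return lambda k: phi ** k


if __name__ == "__main__":
    n = 10_000
    n_eff = effective_sample_size(ar1_autocorrelation(0.5), n)
    print(f"n = {n}, effective sample size = {n_eff:.1f}")
```

For short-range correlations such as AR(1), the inflation factor approaches (1 + phi) / (1 - phi) as n grows, so n_eff stays proportional to n. Under long-range correlations the sum of autocorrelations diverges, which is why the reduction reported in the abstract (103 years behaving like about 14) can be so pronounced.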


2019 ◽  
Vol 7 (3) ◽  
pp. 334-364 ◽  
Author(s):  
Carolina Franco ◽  
Roderick J A Little ◽  
Thomas A Louis ◽  
Eric V Slud

Abstract The most widespread method of computing confidence intervals (CIs) in complex surveys is to add and subtract the margin of error (MOE) from the point estimate, where the MOE is the estimated standard error multiplied by the suitable Gaussian quantile. This Wald-type interval is used by the American Community Survey (ACS), the largest US household sample survey. For inferences on small proportions with moderate sample sizes, this method often results in marked under-coverage and lower CI endpoint less than 0. We assess via simulation the coverage and width, in complex sample surveys, of seven alternatives to the Wald interval for a binomial proportion with sample size replaced by the ‘effective sample size,’ that is, the sample size divided by the design effect. Building on previous work by the present authors, our simulations address the impact of clustering, stratification, different stratum sampling fractions, and stratum-specific proportions. We show that all intervals undercover when there is clustering and design effects are computed from a simple design-based estimator of sampling variance. Coverage can be better calibrated for the alternatives to Wald by improving estimation of the effective sample size through superpopulation modeling. This approach is more effective in our simulations than previously proposed modifications of effective sample size. We recommend intervals of the Wilson or Bayes uniform prior form, with the Jeffreys prior interval not far behind.
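
As a rough illustration of the idea described above, the sketch below computes a Wilson interval for a proportion with the sample size replaced by the effective sample size n / deff. The function name and its parameters are hypothetical, and the design effect is taken as a given input; the abstract's improved estimation of the effective sample size through superpopulation modeling is not attempted here.

```python
import math


def wilson_interval_effective_n(p_hat, n, design_effect, z=1.959964):
    """Wilson score interval using an effective sample size.

    p_hat:         estimated proportion, between 0 and 1.
    n:             nominal sample size.
    design_effect: design effect (assumed known here; in practice it is estimated).
    z:             Gaussian quantile; the default corresponds to a 95% interval.
    """
    if n <= 0 or design_effect <= 0:
        raise ValueError("n and design_effect must be positive")
    n_eff = n / design_effect  # effective sample size
    denom = 1.0 + z**2 / n_eff
    center = (p_hat + z**2 / (2.0 * n_eff)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1.0 - p_hat) / n_eff + z**2 / (4.0 * n_eff**2)) / denom
    )
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Small proportion, moderate sample size, clustering with design effect 2.
    low, high = wilson_interval_effective_n(p_hat=0.02, n=400, design_effect=2.0)
    print(f"Wilson interval with effective n: ({low:.4f}, {high:.4f})")
```

Unlike the Wald interval, the Wilson form cannot produce a lower endpoint below 0, which matters for the small-proportion setting the abstract describes.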


2017 ◽  
Vol 23 (2) ◽  
pp. 33
Author(s):  
José W. Camero Jiménez ◽  
Jahaziel G. Ponce Sánchez

Currently, methods for estimating the mean are based on the confidence interval of the sample mean. This paper aims to help choose the estimator (mean or median) to use depending on the sample size. To this end, we generated, via simulation in Excel, normally distributed samples and their confidence intervals for both estimators, and we use hypothesis tests for the difference of proportions to show which method is better depending on the sample size. Keywords: Sample size, Confidence interval, Mean, Median.


Mathematics ◽  
2021 ◽  
Vol 9 (13) ◽  
pp. 1462
Author(s):  
José Antonio Roldán-Nofuentes ◽  
Saad Bouh Regad

A binary diagnostic test is a medical test that is applied to an individual in order to determine the presence or the absence of a certain disease and whose result can be positive or negative. A positive result indicates the presence of the disease, and a negative result indicates the absence. Positive and negative predictive values represent the accuracy of a binary diagnostic test when it is applied to a cohort of individuals, and they are measures of the clinical accuracy of the binary diagnostic test. In this manuscript, we study the comparison of the positive (negative) predictive values of two binary diagnostic tests subject to a paired design through confidence intervals. We have studied confidence intervals for the difference and for the ratio of the two positive (negative) predictive values. Simulation experiments have been carried out to study the asymptotic behavior of the confidence intervals, giving some general rules for application. We also study a method to calculate the sample size to compare the parameters using confidence intervals. We have written a program in R to solve the problems studied in this manuscript. The results have been applied to the diagnosis of colorectal cancer.


2018 ◽  
Author(s):  
Sigit Haryadi

We cannot be sure exactly what will happen; we can only estimate it using a particular method, and each method must have a formula to create a regression equation and a formula to calculate the confidence level of the estimated value. This paper presents a method for estimating future values in which the formula for creating the regression equation is based on the assumption that the future value depends on the difference of the past values divided by a weight factor corresponding to the time span to the present, and the confidence level is calculated using "the Haryadi Index". The advantage of this method is that it remains accurate regardless of the sample size and may ignore past values that are considered irrelevant.


Biometrika ◽  
2020 ◽  
Author(s):  
Oliver Dukes ◽  
Stijn Vansteelandt

Summary Eliminating the effect of confounding in observational studies typically involves fitting a model for an outcome adjusted for covariates. When, as often, these covariates are high-dimensional, this necessitates the use of sparse estimators, such as the lasso, or other regularization approaches. Naïve use of such estimators yields confidence intervals for the conditional treatment effect parameter that are not uniformly valid. Moreover, as the number of covariates grows with the sample size, correctly specifying a model for the outcome is nontrivial. In this article we deal with both of these concerns simultaneously, obtaining confidence intervals for conditional treatment effects that are uniformly valid, regardless of whether the outcome model is correct. This is done by incorporating an additional model for the treatment selection mechanism. When both models are correctly specified, we can weaken the standard conditions on model sparsity. Our procedure extends to multivariate treatment effect parameters and complex longitudinal settings.


2016 ◽  
Vol 407 ◽  
pp. 371-386 ◽  
Author(s):  
Krzysztof Bartoszek
