A note on effective sample size for constructing confidence intervals for the difference of two proportions

Guanghan F. Liu

doi:10.1002/pst.540

An ARFIMA-based model for daily precipitation amounts with direct access to fluctuations

Stochastic Environmental Research and Risk Assessment ◽

10.1007/s00477-020-01833-w ◽

2020 ◽

Vol 34 (10) ◽

pp. 1487-1505

Author(s):

Katja Polotzek ◽

Holger Kantz

Keyword(s):

Time Series ◽

Sample Size ◽

Confidence Intervals ◽

Daily Precipitation ◽

Rainfall Simulation ◽

Direct Access ◽

Effective Sample Size ◽

Sample Mean ◽

Daily Data ◽

Synthetic Time Series

Abstract: Correlations in models for daily precipitation are often generated by elaborate numerics that employ a large number of hidden parameters. We propose a parsimonious, parametric stochastic model for European mid-latitude daily precipitation amounts, with a focus on the influence of correlations on the statistics. Our method is meta-Gaussian: it applies a truncated-Gaussian-power (tGp) transformation to a Gaussian ARFIMA model. The special feature of this approach is that ARFIMA(1, d, 0) processes provide synthetic time series with both long-range correlations (LRC), meaning that the sum of all autocorrelations is infinite, and short-range correlations (SRC), each controlled by a single parameter. Overall, our model requires fitting only five parameters, all of which have a clear interpretation. For model time series of finite length we deduce an effective sample size for the sample mean, whose variance is increased by correlations. For example, the statistical uncertainty of the mean daily amount from 103 years of daily records at the Fichtelberg mountain in Germany equals that of about 14 years of independent daily data. Our effective sample size approach also yields theoretical confidence intervals for annual total amounts and allows for proper model validation in terms of the empirical mean and fluctuations of annual totals. We evaluate probability plots for the daily amounts, confidence intervals based on the effective sample size for the daily mean and annual totals, and the Mahalanobis distance for the annual maxima distribution. For reproducing annual maxima, the way the marginal distribution is fitted is more crucial than the presence of correlations, whereas the opposite holds for annual totals. Our alternative to rainfall simulation proves capable of modeling daily precipitation amounts, as the statistics of a random selection of 20 data sets are well reproduced.
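The effective sample size for a sample mean mentioned in this abstract can be illustrated with a minimal sketch. It uses the standard variance formula for the mean of a stationary series, Var(mean) = (sigma^2 / n) * [1 + 2 * sum_{k=1}^{n-1} (1 - k/n) * rho(k)], so that n_eff = n divided by the bracketed inflation factor. The function name `effective_sample_size` and the AR(1) autocorrelation used in the example are assumptions made for illustration; the paper's own model is ARFIMA(1, d, 0) with a tGp transformation, which is not reproduced here.

```python
import numpy as np

def effective_sample_size(rho, n):
    """Effective sample size for the sample mean of a stationary series.

    rho : callable returning the autocorrelation at integer lags k >= 1
          (hypothetical interface, chosen for this sketch)
    n   : length of the series
    """
    k = np.arange(1, n)
    # Variance inflation of the sample mean caused by autocorrelation
    inflation = 1.0 + 2.0 * np.sum((1.0 - k / n) * rho(k))
    return n / inflation

# Illustration with an AR(1) autocorrelation rho(k) = phi**k.
# This is an assumption for the example, not the paper's ARFIMA model.
phi = 0.5
n = 1000
n_eff = effective_sample_size(lambda k: phi ** k, n)
print(f"n = {n}, effective sample size = {n_eff:.1f}")

# Uncorrelated data: no inflation, so n_eff equals n
n_eff_white = effective_sample_size(lambda k: np.zeros(len(k)), n)
```

Any other autocorrelation function, for example one with long-range correlations, can be passed in place of the AR(1) lambda; slowly decaying correlations make the inflation factor much larger and the effective sample size correspondingly smaller.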

Sample Size Determination for Confidence Intervals on the Population Mean and on the Difference Between Two Population Means

Biometrics ◽

10.2307/2531696 ◽

1989 ◽

Vol 45 (3) ◽

pp. 969 ◽

Cited By ~ 59

Author(s):

S. L. Beal

Keyword(s):

Sample Size ◽

Confidence Intervals ◽

Sample Size Determination ◽

Size Determination ◽

Population Means ◽

Population Mean ◽

The Difference

Comparative Study of Confidence Intervals for Proportions in Complex Sample Surveys†

Journal of Survey Statistics and Methodology ◽

10.1093/jssam/smy019 ◽

2019 ◽

Vol 7 (3) ◽

pp. 334-364 ◽

Cited By ~ 2

Author(s):

Carolina Franco ◽

Roderick J A Little ◽

Thomas A Louis ◽

Eric V Slud

Keyword(s):

Sample Size ◽

Confidence Intervals ◽

Sample Survey ◽

Point Estimate ◽

Effective Sample Size ◽

Sampling Variance ◽

Sample Surveys ◽

Jeffreys Prior ◽

Complex Sample ◽

The Impact

Abstract: The most widespread method of computing confidence intervals (CIs) in complex surveys is to add and subtract the margin of error (MOE) from the point estimate, where the MOE is the estimated standard error multiplied by the appropriate Gaussian quantile. This Wald-type interval is used by the American Community Survey (ACS), the largest US household sample survey. For inferences on small proportions with moderate sample sizes, this method often results in marked under-coverage and a lower CI endpoint below 0. We assess via simulation the coverage and width, in complex sample surveys, of seven alternatives to the Wald interval for a binomial proportion with the sample size replaced by the 'effective sample size,' that is, the sample size divided by the design effect. Building on previous work by the present authors, our simulations address the impact of clustering, stratification, different stratum sampling fractions, and stratum-specific proportions. We show that all intervals undercover when there is clustering and design effects are computed from a simple design-based estimator of sampling variance. Coverage can be better calibrated for the alternatives to Wald by improving estimation of the effective sample size through superpopulation modeling. This approach is more effective in our simulations than previously proposed modifications of effective sample size. We recommend intervals of the Wilson or Bayes uniform prior form, with the Jeffreys prior interval not far behind.
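As a rough illustration of the idea described in this abstract, the sketch below computes a Wald interval and a Wilson interval after replacing the sample size with the effective sample size n / deff. The function names, the example proportion, and the design effect value are hypothetical; the sketch does not reproduce the authors' simulations or their superpopulation estimate of the design effect.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def wald_interval_effective(p_hat, n, deff, z=Z_95):
    """Wald interval using the effective sample size n / deff."""
    n_eff = n / deff
    moe = z * math.sqrt(p_hat * (1.0 - p_hat) / n_eff)
    return p_hat - moe, p_hat + moe

def wilson_interval_effective(p_hat, n, deff, z=Z_95):
    """Wilson score interval using the effective sample size n / deff."""
    n_eff = n / deff
    denom = 1.0 + z ** 2 / n_eff
    center = (p_hat + z ** 2 / (2.0 * n_eff)) / denom
    half = (z / denom) * math.sqrt(
        p_hat * (1.0 - p_hat) / n_eff + z ** 2 / (4.0 * n_eff ** 2)
    )
    return center - half, center + half

# Hypothetical survey: small proportion, moderate sample, design effect of 2
p_hat, n, deff = 0.01, 400, 2.0
wald = wald_interval_effective(p_hat, n, deff)
wilson = wilson_interval_effective(p_hat, n, deff)
print("Wald:  ", wald)    # lower endpoint falls below 0 for these inputs
print("Wilson:", wilson)  # endpoints stay inside [0, 1]
```

With these inputs the Wald lower endpoint is negative while the Wilson interval stays within [0, 1], which is the kind of behaviour for small proportions that the abstract describes.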

COMPARACION DEL PROMEDIO Y LA MEDIANA COMO ESTIMADORES DE LA MEDIA PARA MUESTRAS NORMALES DEPENDIENDO DEL TAMAÑO DE MUESTRA

Revista TECNIA ◽

10.21754/tecnia.v23i2.73 ◽

2017 ◽

Vol 23 (2) ◽

pp. 33

Author(s):

José W. Camero Jiménez ◽

Jahaziel G. Ponce Sánchez

Keyword(s):

Confidence Interval ◽

Sample Size ◽

Normal Distribution ◽

Confidence Intervals ◽

Hypothesis Tests ◽

Sample Mean ◽

Palabras Clave ◽

The Mean ◽

The Difference

Abstract: Currently, the methods for estimating the mean are those based on the confidence interval of the average, or sample mean. This paper aims to help choose which estimator (average or median) to use depending on the sample size. To this end, normally distributed samples and their confidence intervals for both estimators were generated via simulation in Excel, and hypothesis tests for the difference of proportions are used to show which method is better depending on the sample size. Keywords: sample size, confidence interval, average, median.
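A comparison of this kind can be sketched in a few lines. The paper itself uses Excel and hypothesis tests for the difference of proportions; the sketch below swaps in a simpler check and only compares the sampling variance of the two estimators on simulated normal samples, using NumPy. The function name `compare_estimators`, the seed, the sample sizes, and the number of replications are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

def compare_estimators(n, reps=20000, mu=0.0, sigma=1.0):
    """Sampling variance of the mean and the median for normal samples of size n."""
    samples = rng.normal(mu, sigma, size=(reps, n))
    means = samples.mean(axis=1)
    medians = np.median(samples, axis=1)
    return means.var(ddof=1), medians.var(ddof=1)

for n in (5, 25, 100):
    var_mean, var_median = compare_estimators(n)
    print(f"n={n:4d}  var(mean)={var_mean:.5f}  var(median)={var_median:.5f}  "
          f"ratio={var_median / var_mean:.2f}")
```

For normal data the median is the noisier estimator, and the variance ratio approaches pi/2 (about 1.57) as the sample size grows.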

Confidence Intervals and Sample Size to Compare the Predictive Values of Two Diagnostic Tests

Mathematics ◽

10.3390/math9131462 ◽

2021 ◽

Vol 9 (13) ◽

pp. 1462

Author(s):

José Antonio Roldán-Nofuentes ◽

Saad Bouh Regad

Keyword(s):

Sample Size ◽

Confidence Intervals ◽

Diagnostic Test ◽

Diagnostic Tests ◽

Medical Test ◽

Predictive Values ◽

General Rules ◽

Clinical Accuracy ◽

The Difference ◽

Paired Design

A binary diagnostic test is a medical test that is applied to an individual in order to determine the presence or the absence of a certain disease and whose result can be positive or negative. A positive result indicates the presence of the disease, and a negative result indicates the absence. Positive and negative predictive values represent the accuracy of a binary diagnostic test when it is applied to a cohort of individuals, and they are measures of the clinical accuracy of the binary diagnostic test. In this manuscript, we study the comparison of the positive (negative) predictive values of two binary diagnostic tests subject to a paired design through confidence intervals. We have studied confidence intervals for the difference and for the ratio of the two positive (negative) predictive values. Simulation experiments have been carried out to study the asymptotic behavior of the confidence intervals, giving some general rules for application. We also study a method to calculate the sample size to compare the parameters using confidence intervals. We have written a program in R to solve the problems studied in this manuscript. The results have been applied to the diagnosis of colorectal cancer.
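The predictive values defined in this abstract can be made concrete with a small sketch. It computes the positive and negative predictive values of two tests from 2x2 counts and takes their difference and ratio. The function name `predictive_values` and all counts are hypothetical, and the sketch stops at point estimates: the confidence intervals for a paired design studied in the paper need the joint table of both test results and are not reproduced here.

```python
def predictive_values(tp, fp, fn, tn):
    """Return (PPV, NPV) from the counts of a 2x2 diagnostic table.

    tp, fp: diseased / non-diseased individuals with a positive result
    fn, tn: diseased / non-diseased individuals with a negative result
    """
    ppv = tp / (tp + fp)  # P(disease | positive result)
    npv = tn / (tn + fn)  # P(no disease | negative result)
    return ppv, npv

# Hypothetical counts for two tests applied to the same 300 individuals
ppv1, npv1 = predictive_values(tp=80, fp=20, fn=10, tn=190)
ppv2, npv2 = predictive_values(tp=70, fp=10, fn=20, tn=200)

print(f"Test 1: PPV={ppv1:.3f}, NPV={npv1:.3f}")
print(f"Test 2: PPV={ppv2:.3f}, NPV={npv2:.3f}")
print(f"PPV difference={ppv1 - ppv2:.3f}, PPV ratio={ppv1 / ppv2:.3f}")
```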

Smart Estimation Ver-H.1.0

10.31227/osf.io/2tyqk ◽

2018 ◽

Author(s):

Sigit Haryadi

Keyword(s):

Sample Size ◽

Confidence Level ◽

Time Span ◽

Regression Equation ◽

Weight Factor ◽

The Past ◽

Level Of Confidence ◽

The Future ◽

Future Value ◽

The Difference

We cannot be sure exactly what will happen; we can only estimate it using a particular method, and each method must have a formula for creating a regression equation and a formula for calculating the confidence level of the estimated value. This paper presents a method for estimating future values in which the regression equation is based on the assumption that the future value depends on the difference of the past values divided by a weight factor that corresponds to the time span to the present, and the level of confidence is calculated using "the Haryadi Index". The advantage of this method is that it remains accurate regardless of the sample size and can ignore past values that are considered irrelevant.

Reporting Frequency and Sample Size: Effects on Prediction, Confidence Levels and Confidence Intervals

SSRN Electronic Journal ◽

10.2139/ssrn.991307 ◽

2007 ◽

Cited By ~ 1

Author(s):

Terence J. Pitre

Keyword(s):

Sample Size ◽

Confidence Intervals ◽

Size Effects ◽

Confidence Levels ◽

Prediction Confidence ◽

Reporting Frequency

Inference for treatment effect parameters in potentially misspecified high-dimensional models

Biometrika ◽

10.1093/biomet/asaa071 ◽

2020 ◽

Author(s):

Oliver Dukes ◽

Stijn Vansteelandt

Keyword(s):

Sample Size ◽

Confidence Intervals ◽

Treatment Effect ◽

Observational Studies ◽

Treatment Effects ◽

Treatment Selection ◽

High Dimensional ◽

Selection Mechanism ◽

Dimensional Models ◽

Effect Parameter

Summary: Eliminating the effect of confounding in observational studies typically involves fitting a model for an outcome adjusted for covariates. When, as often, these covariates are high-dimensional, this necessitates the use of sparse estimators, such as the lasso, or other regularization approaches. Naïve use of such estimators yields confidence intervals for the conditional treatment effect parameter that are not uniformly valid. Moreover, as the number of covariates grows with the sample size, correctly specifying a model for the outcome is nontrivial. In this article we deal with both of these concerns simultaneously, obtaining confidence intervals for conditional treatment effects that are uniformly valid, regardless of whether the outcome model is correct. This is done by incorporating an additional model for the treatment selection mechanism. When both models are correctly specified, we can weaken the standard conditions on model sparsity. Our procedure extends to multivariate treatment effect parameters and complex longitudinal settings.
