What's the Score?

2000 ◽  
Vol 21 (1) ◽  
pp. 57-58
Author(s):  
David Birnbaum

Abstract If you have calculated a confidence interval for an infection rate and found the interval extending into meaningless negative numbers, chances are the error is due to use of approximation formulae. Many of us unknowingly were taught to use the Wald approximation, which does not always approximate the exact binomial distribution accurately. Poor approximation can occur in infection surveillance at both small and large sample sizes.
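
The negative lower bound described above can be reproduced with a short sketch. The abstract names only the Wald approximation; the Wilson score interval used for comparison here is an assumption suggested by the article's title, not a method confirmed by the abstract. The function names and the example counts (1 infection in 20 patients) are hypothetical.

```python
import math


def wald_interval(x, n, z=1.96):
    """Wald (normal-approximation) interval for a proportion x/n."""
    p = x / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half


def wilson_interval(x, n, z=1.96):
    """Wilson score interval; used here as an assumed alternative to Wald."""
    p = x / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Hypothetical surveillance count: 1 infection among 20 patients.
    print("Wald:  ", wald_interval(1, 20))
    print("Wilson:", wilson_interval(1, 20))
```

For these counts the Wald lower limit falls below zero, while the score interval stays inside the range 0 to 1.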

2021 ◽  
Vol 15 (1) ◽  
Author(s):  
Weitong Cui ◽  
Huaru Xue ◽  
Lei Wei ◽  
Jinghua Jin ◽  
Xuewen Tian ◽  
...  

Abstract Background: RNA sequencing (RNA-Seq) has been widely applied in oncology for monitoring transcriptome changes. However, the emerging problem that high variation of gene expression levels caused by tumor heterogeneity may affect the reproducibility of differential expression (DE) results has rarely been studied. Here, we investigated the reproducibility of DE results for any given number of biological replicates between 3 and 24 and explored why a great many differentially expressed genes (DEGs) were not reproducible. Results: Our findings demonstrate that poor reproducibility of DE results exists not only for small sample sizes, but also for relatively large sample sizes. Quite a few of the DEGs detected are specific to the samples in use, rather than genuinely differentially expressed under different conditions. Poor reproducibility of DE results is mainly caused by high variation of gene expression levels for the same gene in different samples. Even though biological variation may account for much of the high variation of gene expression levels, the effect of outlier count data also needs to be treated seriously, as outlier data severely interfere with DE analysis. Conclusions: High heterogeneity exists not only in tumor tissue samples of each cancer type studied, but also in normal samples. High heterogeneity leads to poor reproducibility of DEGs, undermining generalization of differential expression results. Therefore, it is necessary to use large sample sizes (at least 10 if possible) in RNA-Seq experimental designs to reduce the impact of biological variability, and DE results should be interpreted cautiously unless soundly validated.


2012 ◽  
Vol 2012 ◽  
pp. 1-8 ◽  
Author(s):  
Louis M. Houston

We derive a general equation for the probability that a measurement falls within a range of n standard deviations from an estimate of the mean. So, we provide a format that is compatible with a confidence interval centered about the mean that is naturally independent of the sample size. The equation is derived by interpolating theoretical results for extreme sample sizes. The intermediate value of the equation is confirmed with a computational test.


2021 ◽  
Vol 23 ◽  
Author(s):  
Peyton Cook

This article is intended to help students understand the concept of a coverage probability involving confidence intervals. Mathematica is used as a language for describing an algorithm to compute the coverage probability for a simple confidence interval based on the binomial distribution. Then, higher-level functions are used to compute probabilities of expressions in order to obtain coverage probabilities. Several examples are presented: two confidence intervals for a population proportion based on the binomial distribution, an asymptotic confidence interval for the mean of the Poisson distribution, and an asymptotic confidence interval for a population proportion based on the negative binomial distribution.
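
The coverage-probability idea can be sketched as follows. The article describes its algorithm in Mathematica; this is a Python analogue written under assumptions, not the author's code. It assumes the "simple confidence interval" is the Wald interval for a binomial proportion, and the function names are hypothetical. The coverage probability is the sum of binomial probabilities over every outcome x whose interval contains the true p.

```python
import math


def binom_pmf(x, n, p):
    """Probability of exactly x successes in n Bernoulli(p) trials."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


def wald_coverage(n, p, z=1.96):
    """Exact coverage probability of the Wald interval at true proportion p."""
    total = 0.0
    for x in range(n + 1):
        p_hat = x / n
        half = z * math.sqrt(p_hat * (1 - p_hat) / n)
        # Count this outcome only if its interval contains the true p.
        if p_hat - half <= p <= p_hat + half:
            total += binom_pmf(x, n, p)
    return total


if __name__ == "__main__":
    # Compare actual coverage with the nominal 95% level for a few cases.
    for n, p in [(20, 0.05), (20, 0.5), (200, 0.05)]:
        print(n, p, round(wald_coverage(n, p), 4))
```

Evaluating the function over a grid of p values for fixed n shows where the actual coverage falls short of the nominal level.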


BMJ ◽  
2009 ◽  
Vol 338 (feb25 2) ◽  
pp. b737-b737 ◽  
Author(s):  
J. Fletcher

2019 ◽  
Vol 9 (4) ◽  
pp. 813-850 ◽  
Author(s):  
Jay Mardia ◽  
Jiantao Jiao ◽  
Ervin Tánczos ◽  
Robert D Nowak ◽  
Tsachy Weissman

Abstract We study concentration inequalities for the Kullback–Leibler (KL) divergence between the empirical distribution and the true distribution. Applying a recursion technique, we improve over the method of types bound uniformly in all regimes of sample size $n$ and alphabet size $k$, and the improvement becomes more significant when $k$ is large. We discuss the applications of our results in obtaining tighter concentration inequalities for $L_1$ deviations of the empirical distribution from the true distribution, and the difference between concentration around the expectation or zero. We also obtain asymptotically tight bounds on the variance of the KL divergence between the empirical and true distribution, and demonstrate their quantitatively different behaviours between small and large sample sizes compared to the alphabet size.


1985 ◽  
Vol 31 (4) ◽  
pp. 574-580 ◽  
Author(s):  
K Linnet

Abstract The precision of estimates of the sensitivity of diagnostic tests is evaluated. "Sensitivity" is defined as the fraction of diseased subjects with test values exceeding the 0.975-fractile of the distribution of control values. An estimate of the sensitivity is subject to sample variation because of variation of both control observations and patient observations. If Gaussian distributions are assumed, the 0.95-confidence interval for a sensitivity estimate is up to ±0.15 for a sample of 100 controls and 100 patients. For the same sample size, minimum differences of 0.08 to 0.32 of sensitivities of two tests are established as significant with a power of 0.90. For some published diagnostic test evaluations the median sample sizes for controls and patients were 63 and 33, respectively. I show that, to obtain a reasonable precision of sensitivity estimates and a reasonable power when two tests are being compared, the number of samples should in general be considerably larger.
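
The sampling variation described in this abstract can be explored with a small Monte Carlo sketch. This is not the paper's derivation: it assumes unit-variance Gaussian controls and patients, a parametric cutoff (control mean plus the 0.975 normal quantile times the control standard deviation), and a hypothetical patient shift chosen so that the true sensitivity is 0.5. All names and parameter values are illustrative.

```python
import random
import statistics


def sensitivity_spread(n_controls=100, n_patients=100, shift=1.96,
                       reps=1000, seed=0):
    """Return the central 95% range of simulated sensitivity estimates."""
    rng = random.Random(seed)
    z = statistics.NormalDist().inv_cdf(0.975)
    estimates = []
    for _ in range(reps):
        controls = [rng.gauss(0.0, 1.0) for _ in range(n_controls)]
        patients = [rng.gauss(shift, 1.0) for _ in range(n_patients)]
        # Cutoff: estimated 0.975-fractile of the control distribution.
        cutoff = statistics.mean(controls) + z * statistics.stdev(controls)
        # Sensitivity: fitted Gaussian patient mass above the cutoff.
        fitted = statistics.NormalDist(statistics.mean(patients),
                                       statistics.stdev(patients))
        estimates.append(1.0 - fitted.cdf(cutoff))
    estimates.sort()
    return estimates[int(0.025 * reps)], estimates[int(0.975 * reps) - 1]


if __name__ == "__main__":
    low, high = sensitivity_spread()
    print("central 95% range of estimates:", low, high)
```

The width of the printed range can be compared with the interval quoted in the abstract; smaller values of n_controls and n_patients widen it.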

