What's the Score?

Infection Control and Hospital Epidemiology ◽

10.1086/501701 ◽

2000 ◽

Vol 21 (1) ◽

pp. 57-58

Author(s):

David Birnbaum

Keyword(s):

Confidence Interval ◽

Infection Rate ◽

Binomial Distribution ◽

Sample Sizes ◽

Negative Numbers ◽

Large Sample ◽

Infection Surveillance

Abstract If you have calculated a confidence interval for an infection rate and found the interval extending into meaningless negative numbers, chances are the error is due to use of approximation formulae. Many of us unknowingly were taught to use the Wald approximation, which does not always approximate the exact binomial distribution accurately. Poor approximation can occur in infection surveillance at both small and large sample sizes.
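
The negative lower bound described in this abstract can be reproduced with a few lines of arithmetic. The following is a minimal sketch in Python, not code from the article: the counts (1 infection among 50 patients) are hypothetical, the function names are illustrative, and the Wilson score interval is shown only as one commonly used alternative chosen for this example, not as the method the author recommends.

```python
import math

def wald_interval(events, n, z=1.96):
    """Wald (normal-approximation) interval for a proportion; not clipped to [0, 1]."""
    p = events / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

def wilson_interval(events, n, z=1.96):
    """Wilson score interval (an alternative assumed here for comparison)."""
    p = events / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width

# Hypothetical surveillance data: 1 infection among 50 patients.
print("Wald:  ", wald_interval(1, 50))
print("Wilson:", wilson_interval(1, 50))
```

With these hypothetical counts the Wald lower limit falls below zero, while the Wilson limits stay between 0 and 1.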

High heterogeneity undermines generalization of differential expression results in RNA-Seq analysis

Human Genomics ◽

10.1186/s40246-021-00308-5 ◽

2021 ◽

Vol 15 (1) ◽

Author(s):

Weitong Cui ◽

Huaru Xue ◽

Lei Wei ◽

Jinghua Jin ◽

Xuewen Tian ◽

...

Keyword(s):

Gene Expression ◽

Differential Expression ◽

Small Sample ◽

Differentially Expressed ◽

Cancer Type ◽

RNA-Seq ◽

Sample Sizes ◽

Large Sample ◽

Expression Levels ◽

Gene Expression Levels

Abstract Background: RNA sequencing (RNA-Seq) has been widely applied in oncology for monitoring transcriptome changes. However, the emerging problem that high variation of gene expression levels caused by tumor heterogeneity may affect the reproducibility of differential expression (DE) results has rarely been studied. Here, we investigated the reproducibility of DE results for any given number of biological replicates between 3 and 24 and explored why a great many differentially expressed genes (DEGs) were not reproducible. Results: Our findings demonstrate that poor reproducibility of DE results exists not only for small sample sizes, but also for relatively large sample sizes. Quite a few of the DEGs detected are specific to the samples in use, rather than genuinely differentially expressed under different conditions. Poor reproducibility of DE results is mainly caused by high variation of gene expression levels for the same gene in different samples. Even though biological variation may account for much of the high variation of gene expression levels, the effect of outlier count data also needs to be treated seriously, as outlier data severely interfere with DE analysis. Conclusions: High heterogeneity exists not only in tumor tissue samples of each cancer type studied, but also in normal samples. High heterogeneity leads to poor reproducibility of DEGs, undermining generalization of differential expression results. Therefore, it is necessary to use large sample sizes (at least 10 if possible) in RNA-Seq experimental designs to reduce the impact of biological variability and DE results should be interpreted cautiously unless soundly validated.

The Probability That a Measurement Falls within a Range of Standard Deviations from an Estimate of the Mean

ISRN Applied Mathematics ◽

10.5402/2012/710806 ◽

2012 ◽

Vol 2012 ◽

pp. 1-8 ◽

Cited By ~ 2

Author(s):

Louis M. Houston

Keyword(s):

Confidence Interval ◽

Sample Size ◽

General Equation ◽

Sample Sizes ◽

The Mean ◽

Standard Deviations ◽

Intermediate Value ◽

Theoretical Results

We derive a general equation for the probability that a measurement falls within a range of n standard deviations from an estimate of the mean. So, we provide a format that is compatible with a confidence interval centered about the mean that is naturally independent of the sample size. The equation is derived by interpolating theoretical results for extreme sample sizes. The intermediate value of the equation is confirmed with a computational test.

Coverage versus Confidence

The Mathematica Journal ◽

10.3888/tmj.23-1 ◽

2021 ◽

Vol 23 ◽

Author(s):

Peyton Cook

Keyword(s):

Confidence Interval ◽

Confidence Intervals ◽

Negative Binomial Distribution ◽

Binomial Distribution ◽

Coverage Probability ◽

Negative Binomial ◽

Asymptotic Confidence Interval ◽

Population Proportion ◽

Coverage Probabilities ◽

The Mean

This article is intended to help students understand the concept of a coverage probability involving confidence intervals. Mathematica is used as a language for describing an algorithm to compute the coverage probability for a simple confidence interval based on the binomial distribution. Then, higher-level functions are used to compute probabilities of expressions in order to obtain coverage probabilities. Several examples are presented: two confidence intervals for a population proportion based on the binomial distribution, an asymptotic confidence interval for the mean of the Poisson distribution, and an asymptotic confidence interval for a population proportion based on the negative binomial distribution.
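
The coverage-probability idea summarized in this abstract can be sketched as follows. This is an illustrative Python version, not the article's Mathematica code: the Wald interval, the true proportion 0.02, the sample size 50, and all function names are assumptions made for this example.

```python
import math

def wald_interval(x, n, z=1.96):
    """Wald interval for a proportion, given x successes in n trials."""
    p_hat = x / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def coverage_probability(p, n, interval=wald_interval):
    """Sum the binomial probabilities of every outcome x whose interval contains p."""
    total = 0.0
    for x in range(n + 1):
        lower, upper = interval(x, n)
        if lower <= p <= upper:
            total += math.comb(n, x) * p**x * (1 - p) ** (n - x)
    return total

# Assumed true proportion and sample size, chosen only for illustration.
print(coverage_probability(0.02, 50))
```

For these assumed values the computed coverage is well below the nominal 95%, largely because the outcome of zero events yields a zero-width interval that cannot contain the true proportion.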

A Comparison of Large-Sample Confidence Interval Methods for the Difference of Two Binomial Probabilities

The American Statistician ◽

10.1080/00031305.1986.10475426 ◽

1986 ◽

Vol 40 (4) ◽

pp. 318-322 ◽

Cited By ~ 6

Author(s):

Walter W. Hauck ◽

Sharon Anderson

Keyword(s):

Confidence Interval ◽

Interval Methods ◽

Large Sample ◽

The Difference

Very large sample sizes

BMJ ◽

10.1136/bmj.b737 ◽

2009 ◽

Vol 338 (feb25 2) ◽

pp. b737-b737 ◽

Cited By ~ 1

Author(s):

J. Fletcher

Keyword(s):

Sample Sizes ◽

Large Sample

Concentration inequalities for the empirical distribution of discrete distributions: beyond the method of types

Information and Inference A Journal of the IMA ◽

10.1093/imaiai/iaz025 ◽

2019 ◽

Vol 9 (4) ◽

pp. 813-850 ◽

Cited By ~ 7

Author(s):

Jay Mardia ◽

Jiantao Jiao ◽

Ervin Tánczos ◽

Robert D Nowak ◽

Tsachy Weissman

Keyword(s):

Sample Size ◽

Empirical Distribution ◽

Discrete Distributions ◽

Concentration Inequalities ◽

Sample Sizes ◽

Alphabet Size ◽

Large Sample ◽

KL Divergence ◽

The Difference ◽

True Distribution

Abstract We study concentration inequalities for the Kullback–Leibler (KL) divergence between the empirical distribution and the true distribution. Applying a recursion technique, we improve over the method of types bound uniformly in all regimes of sample size $n$ and alphabet size $k$, and the improvement becomes more significant when $k$ is large. We discuss the applications of our results in obtaining tighter concentration inequalities for $L_1$ deviations of the empirical distribution from the true distribution, and the difference between concentration around the expectation or zero. We also obtain asymptotically tight bounds on the variance of the KL divergence between the empirical and true distribution, and demonstrate their quantitatively different behaviours between small and large sample sizes compared to the alphabet size.

Precision of sensitivity estimations in diagnostic test evaluations. Power functions for comparisons of sensitivities of two tests.

Clinical Chemistry ◽

10.1093/clinchem/31.4.574 ◽

1985 ◽

Vol 31 (4) ◽

pp. 574-580 ◽

Cited By ~ 6

Author(s):

K Linnet

Keyword(s):

Confidence Interval ◽

Sample Size ◽

Diagnostic Test ◽

Diagnostic Tests ◽

Sample Sizes ◽

Power Functions ◽

Gaussian Distributions ◽

Sensitivity Estimate

Abstract The precision of estimates of the sensitivity of diagnostic tests is evaluated. "Sensitivity" is defined as the fraction of diseased subjects with test values exceeding the 0.975-fractile of the distribution of control values. An estimate of the sensitivity is subject to sample variation because of variation of both control observations and patient observations. If gaussian distributions are assumed, the 0.95-confidence interval for a sensitivity estimate is up to +/- 0.15 for a sample of 100 controls and 100 patients. For the same sample size, minimum differences of 0.08 to 0.32 of sensitivities of two tests are established as significant with a power of 0.90. For some published diagnostic test evaluations the median sample sizes for controls and patients were 63 and 33, respectively. I show that, to obtain a reasonable precision of sensitivity estimates and a reasonable power when two tests are being compared, the number of samples should in general be considerably larger.

Attributes Control Charts with Large Sample Sizes

Journal of Quality Technology ◽

10.1080/00224065.1996.11979703 ◽

1996 ◽

Vol 28 (4) ◽

pp. 451-459 ◽

Cited By ~ 19

Author(s):

Peter A. Heimann

Keyword(s):

Control Charts ◽

Sample Sizes ◽

Large Sample

Bringing data to the surface: recovering data loggers for large sample sizes from marine vertebrates

Animal Biotelemetry ◽

10.1186/s40317-016-0105-8 ◽

2016 ◽

Vol 4 (1) ◽

Cited By ~ 9

Author(s):

Karissa O. Lear ◽

Nicholas M. Whitney

Keyword(s):

Sample Sizes ◽

Large Sample ◽

Data Loggers ◽

Marine Vertebrates

Statistical process control charts for attribute data involving very large sample sizes: a review of problems and solutions

BMJ Quality & Safety ◽

10.1136/bmjqs-2012-001373 ◽

2013 ◽

Vol 22 (4) ◽

pp. 362-368 ◽

Cited By ~ 12

Author(s):

Mohammed A Mohammed ◽

Jagdeep S Panesar ◽

David B Laney ◽

Richard Wilson

Keyword(s):

Process Control ◽

Statistical Process Control ◽

Control Charts ◽

Sample Sizes ◽

Statistical Process ◽

Large Sample ◽

Attribute Data ◽

Process Control Charts ◽

Problems And Solutions ◽

Statistical Process Control Charts
