Sample Size Choice: Charts for Experiments With Linear Models. (2nd ed.) (Vol. 122 in the Statistics: Textbooks and Monographs Series).

Peter A. Lachenbruch; Robert E. Odeh; Martin Fox

doi:10.2307/2290309

On the variance parameter estimator in general linear models

Metrika ◽ 10.1007/s00184-019-00751-4 ◽ 2019 ◽ Vol 83 (2) ◽ pp. 243-254

Author(s): Mathias Lindholm ◽ Felix Wahl

Keyword(s): Sample Size ◽ Linear Models ◽ General Linear ◽ Finite Sample ◽ Parameter Estimator ◽ General Linear Models ◽ Vector Autoregressive ◽ Finite Sample Size ◽ Error Terms ◽ Variance Parameter

Abstract In the present note we consider general linear models where the covariates may be both random and non-random, and where the only restrictions on the error terms are that they are independent and have finite fourth moments. For this class of models we analyse the variance parameter estimator. In particular we obtain finite sample size bounds for the variance of the variance parameter estimator which are independent of covariate information regardless of whether the covariates are random or not. For the case with random covariates this immediately yields bounds on the unconditional variance of the variance estimator—a situation which in general is analytically intractable. The situation with random covariates is illustrated in an example where a certain vector autoregressive model which appears naturally within the area of insurance mathematics is analysed. Further, the obtained bounds are sharp in the sense that both the lower and upper bound will converge to the same asymptotic limit when scaled with the sample size. By using the derived bounds it is simple to show convergence in mean square of the variance parameter estimator for both random and non-random covariates. Moreover, the derivation of the bounds for the above general linear model is based on a lemma which applies in greater generality. This is illustrated by applying the used techniques to a class of mixed effects models.
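
The setting described in this abstract can be illustrated with a small simulation. The following is a minimal sketch, not the paper's derivation or its bounds: it assumes a simple linear regression with one random covariate, independent uniform errors (which have finite fourth moments), and the usual residual variance estimator s^2 = RSS / (n - 2). All function names and parameter values are hypothetical. For comparison, the textbook large-sample value of n * Var(s^2) for i.i.d. errors is mu4 - sigma^4, which equals 1.8 - 1 = 0.8 for the error distribution assumed here.

```python
import random
import statistics

def residual_variance(x, y):
    """s^2 = RSS / (n - 2) for a least-squares line with intercept."""
    n = len(x)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return rss / (n - 2)

def simulate(n=200, reps=2000, seed=1):
    """Return (mean of s^2, n * variance of s^2) over repeated samples."""
    rng = random.Random(seed)
    # Assumed error law: uniform(-sqrt(3), sqrt(3)), variance 1, fourth moment 1.8.
    half_width = 3 ** 0.5
    estimates = []
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # random covariate, redrawn each time
        y = [2.0 + 0.5 * xi + rng.uniform(-half_width, half_width) for xi in x]
        estimates.append(residual_variance(x, y))
    return statistics.fmean(estimates), n * statistics.variance(estimates)

mean_s2, scaled_var = simulate()
print(f"mean of s^2: {mean_s2:.3f}")
print(f"n * Var(s^2): {scaled_var:.3f}  (large-sample reference value: 0.8)")
```

Because the covariate is redrawn in every replication, the spread measured here is the unconditional variance of the estimator, the quantity the abstract describes as generally intractable analytically.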

An Improved Sample Size Calculation Method for Score Tests in Generalized Linear Models

Statistics in Biopharmaceutical Research ◽ 10.1080/19466315.2020.1756398 ◽ 2020 ◽ pp. 1-10

Author(s): Yongqiang Tang ◽ Liang Zhu ◽ Jiezhun Gu

Keyword(s): Sample Size ◽ Generalized Linear Models ◽ Calculation Method ◽ Linear Models ◽ Sample Size Calculation ◽ Score Tests

Quality of Split-Mouth Trials in Dentistry: 1998, 2008, and 2018

Journal of Dental Research ◽ 10.1177/0022034520946025 ◽ 2020 ◽ Vol 99 (13) ◽ pp. 1453-1460

Author(s): D. Qin ◽ F. Hua ◽ H. He ◽ S. Liang ◽ H. Worthington ◽ ...

Keyword(s): Sample Size ◽ Methodological Quality ◽ Linear Models ◽ Sample Size Calculation ◽ Reporting Quality ◽ Treatment Groups ◽ Item Checklist ◽ The Mean ◽ Over Time

The objectives of this study were to assess the reporting quality and methodological quality of split-mouth trials (SMTs) published during the past 2 decades and to determine whether there has been an improvement in their quality over time. We searched the MEDLINE database via PubMed to identify SMTs published in 1998, 2008, and 2018. For each included SMT, we used the CONsolidated Standards Of Reporting Trials (CONSORT) 2010 guideline, CONSORT for within-person trial (WPT) extension, and a new 3-item checklist to assess its trial reporting quality (TRQ), WPT-specific reporting quality (WRQ), and SMT-specific methodological quality (SMQ), respectively. Multivariable generalized linear models were performed to analyze the quality of SMTs over time, adjusting for potential confounding factors. A total of 119 SMTs were included. The mean overall score for the TRQ (score range, 0 to 32), WRQ (0 to 15), and SMQ (0 to 3) was 15.77 (SD 4.51), 6.06 (2.06), and 1.12 (0.70), respectively. The primary outcome was clearly defined in only 28 SMTs (23.5%), and only 27 (22.7%) presented a replicable sample size calculation. Only 45 SMTs (37.8%) provided the rationale for using a split-mouth design. The correlation between body sites was reported in only 5 studies (4.2%) for sample size calculation and 4 studies (3.4%) for statistical results. Only 2 studies (1.7%) performed an appropriate sample size calculation, and 46 (38.7%) chose appropriate statistical methods, both accounting for the correlation among treatment groups and the clustering/multiplicity of measurements within an individual. Results of regression analyses suggested that the TRQ of SMTs improved significantly with time (P < 0.001), while there was no evidence of improvement in WRQ or SMQ. Both the reporting quality and methodological quality of SMTs still have much room for improvement. Concerted efforts are needed to improve the execution and reporting of SMTs.

Sample Size Choice: Charts for Experiments With Linear Models

Technometrics ◽ 10.1080/00401706.1993.10485064 ◽ 1993 ◽ Vol 35 (2) ◽ pp. 234-235

Author(s): Michael R. Emptage

Keyword(s): Sample Size ◽ Linear Models

On power and sample size calculations for Wald tests in generalized linear models

Journal of Statistical Planning and Inference ◽ 10.1016/j.jspi.2003.09.017 ◽ 2005 ◽ Vol 128 (1) ◽ pp. 43-59 ◽ Cited By ~ 13

Author(s): Gwowen Shieh

Keyword(s): Sample Size ◽ Generalized Linear Models ◽ Linear Models ◽ Wald Tests ◽ Sample Size Calculations

Measuring change in biological communities: multivariate analysis approaches for temporal datasets with low sample size

PeerJ ◽ 10.7717/peerj.11096 ◽ 2021 ◽ Vol 9 ◽ pp. e11096

Author(s): Hannah L. Buckley ◽ Nicola J. Day ◽ Bradley S. Case ◽ Gavin Lear

Keyword(s): Time Series ◽ Sample Size ◽ Statistical Power ◽ Structural Changes ◽ Linear Models ◽ Environmental Changes ◽ Short Time Series ◽ Biological Communities ◽ Extinction Events ◽ Short Time

Effective and robust ways to describe, quantify, analyse, and test for change in the structure of biological communities over time are essential if ecological research is to contribute substantively towards understanding and managing responses to ongoing environmental changes. Structural changes reflect population dynamics, changes in biomass and relative abundances of taxa, and colonisation and extinction events observed in samples collected through time. Most previous studies of temporal changes in the multivariate datasets that characterise biological communities are based on short time series that are not amenable to data-hungry methods such as multivariate generalised linear models. Here, we present a roadmap for the analysis of temporal change in short-time-series, multivariate, ecological datasets. We discuss appropriate methods and important considerations for using them such as sample size, assumptions, and statistical power. We illustrate these methods with four case-studies analysed using the R data analysis environment.

Determining Sample Size in General Linear Models

Determining Sample Size and Power in Research Studies ◽ 10.1007/978-981-15-5204-5_7 ◽ 2020 ◽ pp. 89-119

Author(s): J. P. Verma ◽ Priyam Verma

Keyword(s): Sample Size ◽ Linear Models ◽ General Linear ◽ General Linear Models

Characterizing Heterogeneity and Determining Sample Sizes for Accurately Estimating Wheat Fusarium Head Blight Index in Research Plots

Phytopathology ◽ 10.1094/phyto-04-21-0157-r ◽ 2021

Author(s): Wanderson Bucker Moraes ◽ Laurence V Madden ◽ Pierce A. Paul

Keyword(s): Sample Size ◽ Fusarium Head Blight ◽ Field Experiments ◽ Linear Models ◽ Spatial Scales ◽ Cluster Sampling ◽ Sample Sizes ◽ Head Blight ◽ Original Dataset ◽ Sampling Protocols

Since Fusarium head blight (FHB) intensity is usually highly variable within a plot, the number of spikes rated for FHB index (IND) quantification must be considered when designing experiments. In addition, quantification of sources of IND heterogeneity is crucial for defining sampling protocols. Field experiments were conducted to quantify the variability of IND (‘field severity’) at different spatial scales and to investigate the effects of sample size on estimated plot-level mean IND and its accuracy. A total of 216 7-row x 6-m-long plots of a moderately resistant and a susceptible cultivar were spray inoculated with different Fusarium graminearum spore concentrations at anthesis to generate a range of IND levels. A one-stage cluster sampling approach was used to estimate IND, with an average of 32 spikes rated at each of 10 equally spaced points per plot. Plot-level mean IND ranged from 0.9 to 37.9%. Heterogeneity of IND, quantified by fitting unconditional hierarchical linear models, was higher among spikes within clusters than among clusters within plots or among plots. The projected relative error of mean IND increased as mean IND decreased, and as sample size decreased below 100 spikes per plot. Simple random samples were drawn with replacement 50,000 times from the original dataset for each plot and used to estimate the effects of sample sizes on mean IND. Samples of 100 or more spikes resulted in more precise estimates of mean IND than smaller samples. Poor sampling may result in inaccurate estimates of IND and poor interpretation of results.
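
The resampling step described in this abstract, drawing simple random samples with replacement and examining how sample size affects the estimated mean, can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the ratings are synthetic (drawn from a hypothetical skewed distribution, not data from the study), the function name is hypothetical, and 2,000 draws are used instead of 50,000 to keep the run short.

```python
import random
import statistics

def resampled_mean_sd(values, sample_size, n_draws=2000, seed=0):
    """Standard deviation of the sample mean over repeated draws with replacement."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.choices(values, k=sample_size))
        for _ in range(n_draws)
    ]
    return statistics.stdev(means)

# Synthetic severity ratings (percent) for one plot; NOT data from the study.
data_rng = random.Random(42)
plot_values = [min(100.0, data_rng.expovariate(1 / 10.0)) for _ in range(320)]

# Spread of the resampled mean for several hypothetical numbers of rated spikes.
sd_by_size = {k: resampled_mean_sd(plot_values, k) for k in (10, 50, 100, 200)}
for k, sd in sd_by_size.items():
    print(f"spikes sampled = {k:>3}   SD of resampled mean = {sd:.2f}")
```

The spread of the resampled mean shrinks as more spikes are sampled, which is the qualitative pattern the abstract reports for samples of 100 or more spikes.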

A consistency property of the AIC for multivariate linear models when the dimension and the sample size are large

Electronic Journal of Statistics ◽ 10.1214/15-ejs1022 ◽ 2015 ◽ Vol 9 (1) ◽ pp. 869-897 ◽ Cited By ~ 5

Author(s): Hirokazu Yanagihara ◽ Hirofumi Wakaki ◽ Yasunori Fujikoshi

Keyword(s): Sample Size ◽ Linear Models ◽ Consistency Property ◽ Multivariate Linear Models

Sample size calculation based on generalized linear models for differential expression analysis in RNA-seq data

Statistical Applications in Genetics and Molecular Biology ◽ 10.1515/sagmb-2016-0008 ◽ 2016 ◽ Vol 15 (6) ◽ Cited By ~ 1

Author(s): Chung-I Li ◽ Yu Shyr

Keyword(s): Sample Size ◽ Linear Models ◽ Negative Binomial ◽ Differential Expression Analysis ◽ Sample Size Calculation ◽ Rna Seq ◽ Optimal Sample Size ◽ Optimal Sample ◽ Gene Count ◽ Downstream Analysis

Abstract As RNA-seq rapidly develops and costs continually decrease, the quantity and frequency of samples being sequenced will grow exponentially. With proteomic investigations becoming more multivariate and quantitative, determining a study’s optimal sample size is now a vital step in experimental design. Current methods for calculating a study’s required sample size are mostly based on the hypothesis testing framework, which assumes each gene count can be modeled through Poisson or negative binomial distributions; however, these methods are limited when it comes to accommodating covariates. To address this limitation, we propose an estimating procedure based on the generalized linear model. This easy-to-use method constructs a representative exemplary dataset and estimates the conditional power, all without requiring complicated mathematical approximations or formulas. Even more attractive, the downstream analysis can be performed with current R/Bioconductor packages. To demonstrate the practicability and efficiency of this method, we apply it to three real-world studies, and introduce our on-line calculator developed to determine the optimal sample size for an RNA-seq study.
