Sample size for foliar analyses of coastal Douglas-fir

The size of the training data set is a major determinant of classification accuracy. Nevertheless, the collection of a large training data set for supervised classifiers can be a challenge, especially for studies covering a large area, which may be typical of many real-world applied projects. This work investigates how variations in training set size, ranging from a large sample size (n = 10,000) to a very small sample size (n = 40), affect the performance of six supervised machine-learning algorithms applied to classify large-area high-spatial-resolution (HR) (1–5 m) remotely sensed data within the context of a geographic object-based image analysis (GEOBIA) approach. GEOBIA, in which adjacent similar pixels are grouped into image-objects that form the unit of the classification, offers the potential benefit of allowing multiple additional variables, such as measures of object geometry and texture, thus increasing the dimensionality of the classification input data. The six supervised machine-learning algorithms are support vector machines (SVM), random forests (RF), k-nearest neighbors (k-NN), single-layer perceptron neural networks (NEU), learning vector quantization (LVQ), and gradient-boosted trees (GBM). RF, the algorithm with the highest overall accuracy, was notable for its negligible decrease in overall accuracy, 1.0%, when training sample size decreased from 10,000 to 315 samples. GBM provided similar overall accuracy to RF; however, the algorithm was very expensive in terms of training time and computational resources, especially with large training sets. In contrast to RF and GBM, NEU, and SVM were particularly sensitive to decreasing sample size, with NEU classifications generally producing overall accuracies that were on average slightly higher than SVM classifications for larger sample sizes, but lower than SVM for the smallest sample sizes. NEU however required a longer processing time. The k-NN classifier saw less of a drop in overall accuracy than NEU and SVM as training set size decreased; however, the overall accuracies of k-NN were typically less than RF, NEU, and SVM classifiers. LVQ generally had the lowest overall accuracy of all six methods, but was relatively insensitive to sample size, down to the smallest sample sizes. Overall, due to its relatively high accuracy with small training sample sets, and minimal variations in overall accuracy between very large and small sample sets, as well as relatively short processing time, RF was a good classifier for large-area land-cover classifications of HR remotely sensed data, especially when training data are scarce. However, as performance of different supervised classifiers varies in response to training set size, investigating multiple classification algorithms is recommended to achieve optimal accuracy for a project.

Download Full-text

Cavalier Use of Inferential Statistics Is a Major Source of False and Irreproducible Scientific Findings

Mathematics ◽

10.3390/math9060603 ◽

2021 ◽

Vol 9 (6) ◽

pp. 603

Author(s):

Leonid Hanin

Keyword(s):

Sample Size ◽

Gaussian Approximation ◽

Statistical Significance ◽

Statistical Analyses ◽

Random Sample Size ◽

P Values ◽

The Central Limit Theorem ◽

Fixed Sample ◽

Large Numbers ◽

Significance Levels

I uncover previously underappreciated systematic sources of false and irreproducible results in natural, biomedical and social sciences that are rooted in statistical methodology. They include the inevitably occurring deviations from basic assumptions behind statistical analyses and the use of various approximations. I show through a number of examples that (a) arbitrarily small deviations from distributional homogeneity can lead to arbitrarily large deviations in the outcomes of statistical analyses; (b) samples of random size may violate the Law of Large Numbers and thus are generally unsuitable for conventional statistical inference; (c) the same is true, in particular, when random sample size and observations are stochastically dependent; and (d) the use of the Gaussian approximation based on the Central Limit Theorem has dramatic implications for p-values and statistical significance essentially making pursuit of small significance levels and p-values for a fixed sample size meaningless. The latter is proven rigorously in the case of one-sided Z test. This article could serve as a cautionary guidance to scientists and practitioners employing statistical methods in their work.

Download Full-text

What can we Learn from Studies Based on Small Sample Sizes? Comment on Regan, Lakhanpal, and Anguiano (2012)

Psychological Reports ◽

10.2466/21.02.07.pr0.113x12z8 ◽

2013 ◽

Vol 113 (1) ◽

pp. 221-224 ◽

Cited By ~ 3

Author(s):

David R. Johnson ◽

Lauren K. Bachan

Keyword(s):

Sample Size ◽

The Probability That a Measurement Falls within a Range of Standard Deviations from an Estimate of the Mean

ISRN Applied Mathematics ◽

10.5402/2012/710806 ◽

2012 ◽

Vol 2012 ◽

pp. 1-8 ◽

Cited By ~ 2

Author(s):

Louis M. Houston

Keyword(s):

Confidence Interval ◽

Sample Size ◽

General Equation ◽

Sample Sizes ◽

The Mean ◽

Standard Deviations ◽

Intermediate Value ◽

Theoretical Results

We derive a general equation for the probability that a measurement falls within a range of n standard deviations from an estimate of the mean. So, we provide a format that is compatible with a confidence interval centered about the mean that is naturally independent of the sample size. The equation is derived by interpolating theoretical results for extreme sample sizes. The intermediate value of the equation is confirmed with a computational test.

Download Full-text

Methodological Reporting Behavior, Sample Sizes, and Statistical Power in Studies of Event- Related Potentials: Barriers to Reproducibility and Replicability

10.31234/osf.io/kgv9z ◽

2019 ◽

Author(s):

Peter E Clayson ◽

Kaylie Amanda Carbine ◽

Scott Baldwin ◽

Michael J. Larson

Keyword(s):

Sample Size ◽

Statistical Power ◽

Event Related Potentials ◽

Reporting Guidelines ◽

Medium Effect ◽

Sample Sizes ◽

Reporting Behavior ◽

Average Sample Size ◽

Related Potentials ◽

Average Sample

Methodological reporting guidelines for studies of event-related potentials (ERPs) were updated in Psychophysiology in 2014. These guidelines facilitate the communication of key methodological parameters (e.g., preprocessing steps). Failing to report key parameters represents a barrier to replication efforts, and difficultly with replicability increases in the presence of small sample sizes and low statistical power. We assessed whether guidelines are followed and estimated the average sample size and power in recent research. Reporting behavior, sample sizes, and statistical designs were coded for 150 randomly-sampled articles from five high-impact journals that frequently publish ERP research from 2011 to 2017. An average of 63% of guidelines were reported, and reporting behavior was similar across journals, suggesting that gaps in reporting is a shortcoming of the field rather than any specific journal. Publication of the guidelines paper had no impact on reporting behavior, suggesting that editors and peer reviewers are not enforcing these recommendations. The average sample size per group was 21. Statistical power was conservatively estimated as .72-.98 for a large effect size, .35-.73 for a medium effect, and .10-.18 for a small effect. These findings indicate that failing to report key guidelines is ubiquitous and that ERP studies are primarily powered to detect large effects. Such low power and insufficient following of reporting guidelines represent substantial barriers to replication efforts. The methodological transparency and replicability of studies can be improved by the open sharing of processing code and experimental tasks and by a priori sample size calculations to ensure adequately powered studies.

Download Full-text

No one accelerometer-based physical activity data collection protocol can fit all research questions

10.21203/rs.2.11020/v2 ◽

2019 ◽

Author(s):

Patrick Bergman ◽

Maria Hagströmer

Keyword(s):

Physical Activity ◽

Sample Size ◽

Large Sample Size ◽

Intensity Level ◽

Sample Sizes ◽

Activity Data ◽

Convenience Sample ◽

Mean Values ◽

Repeated Observations ◽

Measurement Protocol

Abstract BACKGROUND Measuring physical activity and sedentary behavior accurately remains a challenge. When describing the uncertainty of mean values or when making group comparisons, minimising Standard Error of the Mean (SEM) is important. The sample size and the number of repeated observations within each subject influence the size of the SEM. In this study we have investigated how different combinations of sample sizes and repeated observations influence the magnitude of the SEM. METHODS A convenience sample were asked to wear an accelerometer for 28 consecutive days. Based on the within and between subject variances the SEM for the different combinations of sample sizes and number of monitored days was calculated. RESULTS Fifty subjects (67% women, mean±SD age 41±19 years) were included. The analyses showed, independent of which intensity level of physical activity or how measurement protocol was designed, that the largest reductions in SEM was seen as the sample size were increased. The same magnitude in reductions to SEM was not seen for increasing the number of repeated measurement days within each subject. CONCLUSION The most effective way of reducing the SEM is to have a large sample size rather than a long observation period within each individual. Even though the importance of reducing the SEM to increase the power of detecting differences between groups is well-known it is seldom considered when developing appropriate protocols for accelerometer based research. Therefore the results presented herein serves to highlight this fact and have the potential to stimulate debate and challenge current best practice recommendations of accelerometer based physical activity research.

Download Full-text

Developing a composite outcome measure for frailty prevention trials – rationale, derivation and sample size comparison with other candidate measures

10.21203/rs.2.13602/v2 ◽

2020 ◽

Author(s):

Miles D. Witham ◽

James Wason ◽

Richard M Dodds ◽

Avan A Sayer

Keyword(s):

Sample Size ◽

Transition Probabilities ◽

Adverse Outcomes ◽

Transition Rates ◽

Composite Outcome ◽

Composite Measure ◽

Sample Sizes ◽

Loss To Follow Up ◽

Prevention Trials

Abstract Introduction Frailty is the loss of ability to withstand a physiological stressor, and is associated with multiple adverse outcomes in older people. Trials to prevent or ameliorate frailty are in their infancy. A range of different outcome measures have been proposed, but current measures require either large sample sizes, long follow-up, or do not directly measure the construct of frailty. Methods We propose a composite outcome for frailty prevention trials, comprising progression to the frail state, death, or being too unwell to continue in a trial. To determine likely event rates, we used data from the English Longitudinal Study for Ageing, collected 4 years apart. We calculated transition rates between non-frail, prefrail, frail or loss to follow up due to death or illness. We used Markov state transition models to interpolate one- and two-year transition rates, and performed sample size calculations for a range of differences in transition rates using simple and composite outcomes. Results The frailty category was calculable for 4650 individuals at baseline (2226 non-frail, 1907 prefrail, 517 frail); at follow up, 1282 were non-frail, 1108 were prefrail, 318 were frail and 1936 had dropped out or were unable to complete all tests for frailty. Transition probabilities for those prefrail at baseline, measured at wave 4 were respectively 0.176, 0.286, 0.096 and 0.442 to non-frail, prefrail, frail and dead/dropped out. Interpolated transition probabilities were 0.159, 0.494, 0.113 and 0.234 at two years, and 0.108, 0.688, 0.087 and 0.117 at one year. Required sample sizes for a two-year outcome were between 1000 and 7200 for transition from prefrailty to frailty alone, 250 to 1600 for transition to the composite measure, and 75 to 350 using the composite measure with an ordinal logistic regression approach. Conclusion Use of a composite outcome for frailty trials offers reduced sample sizes and could ameliorate the effect of high loss to follow up inherent in such trials due to death and illness.

Download Full-text

Developing a composite outcome measure for frailty prevention trials – rationale, derivation and sample size comparison with other candidate measures

10.21203/rs.2.13602/v1 ◽

2019 ◽

Author(s):

Miles D. Witham ◽

James Wason ◽

Richard M Dodds ◽

Avan A Sayer

Keyword(s):

Sample Size ◽

Transition Probabilities ◽

Adverse Outcomes ◽

Transition Rates ◽

Composite Outcome ◽

Composite Measure ◽

Sample Sizes ◽

Loss To Follow Up ◽

Prevention Trials

Abstract Introduction Frailty is the loss of ability to withstand a physiological stressor, and is associated with multiple adverse outcomes in older people. Trials to prevent or ameliorate frailty are in their infancy. A range of different outcome measures have been proposed, but current measures require either large sample sizes, long follow-up, or do not directly measure the construct of frailty. Methods We propose a composite outcome for frailty prevention trials, comprising progression to the frail state, death, or being too unwell to continue in a trial. To determine likely event rates, we used data from the English Longitudinal Study for Ageing, collected 4 years apart. We calculated transition rates between non-frail, prefrail, frail or loss to follow up due to death or illness. We used Markov state transition models to interpolate one- and two-year transition rates, and performed sample size calculations for a range of differences in transition rates using simple and composite outcomes. Results The frailty category was calculable for 4650 individuals at baseline (2226 non-frail, 1907 prefrail, 517 frail); at follow up, 1282 were non-frail, 1108 were prefrail, 318 were frail and 1936 had dropped out or were unable to complete all tests for frailty. Transition probabilities for those prefrail at baseline, measured at wave 4 were respectively 0.176, 0.286, 0.096 and 0.442 to non-frail, prefrail, frail and dead/dropped out. Interpolated transition probabilities were 0.159, 0.494, 0.113 and 0.234 at two years, and 0.108, 0.688, 0.087 and 0.117 at one year. Required sample sizes for a two-year outcome were between 1000 and 7200 for transition from prefrailty to frailty alone, 250 to 1600 for transition to the composite measure, and 75 to 350 using the composite measure with an ordinal logistic regression approach. Conclusion Use of a composite outcome for frailty trials offers reduced sample sizes and could ameliorate the effect of high loss to follow up inherent in such trials due to death and illness.

Download Full-text

Importance of sample size for estimating prevalence: A case example of infectious hematopoietic necrosis viral RNA detection in mixed-stock Fraser River Sockeye salmon (Oncorhynchus nerka), British Columbia, Canada.

Canadian Journal of Fisheries and Aquatic Sciences ◽

10.1139/cjfas-2020-0279 ◽

2020 ◽

Author(s):

Emilie Laurin ◽

Julia Bradshaw ◽

Laura Hawley ◽

Ian A. Gardner ◽

Kyle A Garver ◽

...

Keyword(s):

British Columbia ◽

Sample Size ◽

Sockeye Salmon ◽

Oncorhynchus Nerka ◽

Small Sample ◽

Viral Rna ◽

Sample Sizes ◽

Apparent Prevalence ◽

Infectious Hematopoietic Necrosis ◽

Mixed Stock

Proper sample size must be considered when designing infectious-agent prevalence studies for mixed-stock fisheries, because bias and uncertainty complicate interpretation of apparent (test)-prevalence estimates. Sample size varies between stocks, often smaller than expected during wild-salmonid surveys. Our case example of 2010-2016 survey data of Sockeye salmon (Oncorhynchus nerka) from different stocks of origin in British Columbia, Canada, illustrated the effect of sample size on apparent-prevalence interpretation. Molecular testing (viral RNA RT-qPCR) for infectious hematopoietic necrosis virus (IHNv) revealed large differences in apparent-prevalence across wild salmon stocks (much higher from Chilko Lake) and sampling location (freshwater or marine), indicating differences in both stock and host life-stage effects. Ten of the 13 marine non-Chilko stock-years with IHNv-positive results had small sample sizes (< 30 samples per stock-year) which, with imperfect diagnostic tests (particularly lower diagnostic sensitivity), could lead to inaccurate apparent-prevalence estimation. When calculating sample size for expected apparent prevalence using different approaches, smaller sample sizes often led to decreased confidence in apparent-prevalence results and decreased power to detect a true difference from a reference value.

Download Full-text