Teaching statistical appreciation in quantitative methods

2018 ◽  
Vol 16 (2) ◽  
pp. 37
Author(s):  
Peter Mitchell

Statistical appreciation is knowledge about statistical tests: how they are chosen, how they are carried out, and how their results are interpreted, without calculating the test statistic.  This was taught as part of quantitative methods to students taking part-time degrees when there was insufficient time to include training on statistical computer packages.  Details of the content, teaching methods, and assessment are given, with emphasis on the correct understanding of P-values and their interpretation as statistical significance.  Given that many more people need to understand the results and interpretation of statistical tests than to perform the calculations, statistical appreciation is of general value, especially to research supervisors.  It also provides a firm base for further learning and training in statistics.

2018 ◽  
Author(s):  
Diana Domanska ◽  
Chakravarthi Kanduri ◽  
Boris Simovski ◽  
Geir Kjetil Sandve

Background: The difficulties associated with sequencing and assembling some regions of the DNA sequence result in gaps in the reference genomes, typically represented as stretches of Ns. Although the presence of assembly gaps causes a slight reduction in the mapping rate in many experimental settings, this does not invalidate the typical statistical tests comparing read count distributions across experimental conditions. However, we hypothesized that not handling assembly gaps in the null model may confound statistical testing of co-localization of genomic features. Results: First, we performed a series of explorative analyses to understand whether and how public genomic tracks intersect the assembly gaps track (hg19). The findings confirm that the genomic regions in public genomic tracks intersect very little with assembly gaps, and the intersection was observed only at the start and end regions of the assembly gaps rather than covering the whole gap. Further, we simulated a set of query and reference genomic tracks in a way that nullified any dependence between them, to test our hypothesis that not avoiding assembly gaps in the null model would result in spurious inflation of statistical significance. We then contrasted the distributions of test statistics and p-values of Monte Carlo simulation-based permutation tests that either avoided or did not avoid assembly gaps in the null model when testing for significant co-localization between a pair of query and reference tracks. We observed that the tests that did not account for assembly gaps in the null model produced a distribution of the test statistic shifted to the right and a distribution of p-values shifted to the left (leading to inflated significance). Conclusion: Our results show that not accounting for assembly gaps in statistical testing of co-localization analyses may lead to false positives and over-optimistic findings.
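The effect described in this abstract can be illustrated with a toy Monte Carlo sketch (this is not the authors' actual pipeline; the binned genome, gap coordinates, and track sizes below are invented). Query and reference "tracks" are generated independently in non-gap regions, so there is no true co-localization; a null model that permutes the query track over the whole genome, gaps included, under-estimates the expected overlap and thereby inflates significance:

```python
import random

random.seed(0)

N_BINS = 10_000                      # toy genome divided into fixed-size bins
GAP = set(range(4_000, 6_000))       # assembly gap: bins with no mappable sequence
NON_GAP = [b for b in range(N_BINS) if b not in GAP]

def sample_track(bins, n=300):
    """A 'track' is a set of n occupied bins drawn from the given regions."""
    return set(random.sample(bins, n))

def overlap(q, r):
    """Test statistic: number of bins occupied by both tracks."""
    return len(q & r)

# Like real public tracks, both tracks avoid the gap; they are independent,
# so any apparent co-localization is chance.
query = sample_track(NON_GAP)
ref = sample_track(NON_GAP)
obs = overlap(query, ref)

def perm_test(null_bins, n_perm=2_000):
    """Monte Carlo p-value: fraction of permuted overlaps >= observed."""
    null = [overlap(sample_track(null_bins), ref) for _ in range(n_perm)]
    return sum(o >= obs for o in null) / n_perm, sum(null) / n_perm

p_naive, mean_naive = perm_test(list(range(N_BINS)))  # null ignores the gap
p_aware, mean_aware = perm_test(NON_GAP)              # null avoids the gap

print(f"observed overlap: {obs}")
print(f"naive null (gaps allowed): mean={mean_naive:.1f}, p={p_naive:.3f}")
print(f"gap-aware null:            mean={mean_aware:.1f}, p={p_aware:.3f}")
```

Because the naive null scatters part of each permuted track into the gap, its overlap distribution sits to the left of the gap-aware one, so the same observed overlap receives a smaller (inflated) p-value.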


Author(s):  
Zafar Iqbal ◽  
Lubna Waheed ◽  
Waheed Muhammad ◽  
Rajab Muhammad

Purpose: Quality Function Deployment (QFD) is a methodology that helps satisfy customer requirements through the selection of appropriate Technical Attributes (TAs). The rationale of this article is to provide a method lending statistical support to the selection of TAs. The purpose is to determine the statistical significance of TAs through the derivation of associated significance (P) values. Design/Methodology/Approach: We demonstrate our methodology with reference to an original QFD case study aimed at improving the educational system in high schools in Pakistan, and then with five further published case studies obtained from the literature. Mean weights of TAs are determined. Considering each TA's mean weight to be a test statistic, a weighted matrix is generated from the Voice-of-Customer (VOC) importance ratings and the ratings in the relationship matrix. Finally, using R, P-values for the means of the original TAs are determined from the hypothetical population of means of TAs. Findings: Each TA's P-value evaluates its significance or insignificance in terms of distance from the grand mean. P-values indirectly set the prioritization of TAs. Implications/Originality/Value: The novel aspect of this study is the extension of mean weights of TAs to also provide P-values for TAs. TAs with significant importance can be resolved on a priority basis, while others can be addressed as appropriate.
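A simplified sketch of the weighting step described above, with an invented relationship matrix and VOC importance ratings. The p-values here come from a normal approximation to the spread of TA mean weights, used as a stand-in for the paper's R-based derivation from a hypothetical population of means:

```python
import math

# Hypothetical QFD data: rows = Voice-of-Customer (VOC) requirements,
# columns = Technical Attributes (TAs); entries are relationship ratings (1-9).
relationship = [
    [9, 3, 1, 3],
    [3, 9, 3, 1],
    [1, 3, 9, 3],
    [3, 1, 3, 9],
    [9, 9, 1, 1],
]
voc_importance = [5, 4, 3, 4, 2]   # invented VOC importance ratings

n_vocs, n_tas = len(relationship), len(relationship[0])
total_w = sum(voc_importance)

# Weighted mean for each TA: sum over VOCs of importance x rating, normalized.
ta_means = [
    sum(voc_importance[i] * relationship[i][j] for i in range(n_vocs)) / total_w
    for j in range(n_tas)
]

grand_mean = sum(ta_means) / n_tas
sd = math.sqrt(sum((m - grand_mean) ** 2 for m in ta_means) / (n_tas - 1))

def two_sided_p(z):
    """Two-sided p-value under a standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))

# Each TA's p-value reflects its distance from the grand mean:
# small p = the TA's weight stands out from the pack.
p_values = [two_sided_p((m - grand_mean) / sd) for m in ta_means]

for j, (m, p) in enumerate(zip(ta_means, p_values), 1):
    print(f"TA{j}: mean weight = {m:.2f}, p = {p:.3f}")
```

As in the abstract, the ranking of p-values mirrors the ranking of distances from the grand mean, so they indirectly prioritize the TAs.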


2019 ◽  
pp. 302-347
Author(s):  
Emily Finch ◽  
Stefan Fafinski

Quantitative analysis is a significant feature of criminological research. This chapter discusses quantitative methods of analysis in which statistical tests are used to describe data and to draw inferences from the data. It covers the nature of quantitative data; types of variable; univariate analysis; bivariate analysis; statistical significance; and multivariate analysis. It also includes examples of using SPSS to generate statistics and perform tests on data.
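As a brief illustration of the bivariate analysis and significance testing the chapter demonstrates in SPSS, the sketch below computes a Pearson chi-square test of independence for a hypothetical 2x2 table in Python (the counts are invented; no continuity correction is applied):

```python
import math

# Hypothetical 2x2 cross-tabulation, e.g. reoffending (yes/no) by programme group
a, b = 30, 20
c, d = 15, 35
n = a + b + c + d

# Pearson chi-square statistic for a 2x2 table (no continuity correction)
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# With 1 degree of freedom, X^2 = Z^2, so P(X^2 > x) = erfc(sqrt(x / 2))
p = math.erfc(math.sqrt(chi2 / 2))

print(f"chi-square = {chi2:.3f}, p = {p:.4f}")
```

A p-value below the conventional 0.05 threshold would indicate a statistically significant association between the two variables in this toy table.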


2020 ◽  
Vol 7 (2) ◽  
pp. 150
Author(s):  
Henian Chen ◽  
Yuanyuan Lu ◽  
Nicole Slye

Reporting statistical tests for baseline measures of clinical trials does not make sense, since statistical significance depends on sample size: a large trial can find significance in the same difference that a small trial did not find statistically significant. We use 3 published trials with the same baseline measures to show the relationship between trial sample size and p value. For trial 1, the sequential organ failure assessment (SOFA) score was 10.4±3.4 vs. 9.6±3.2 (difference = 0.8, p=0.01), and vasopressor use was 83.0% vs. 72.6% (p=0.007). Trial 2 has SOFA scores of 11±3 vs. 12±3 (difference = 1, p=0.42). Trial 3 has vasopressor use of 73% vs. 83% (p=0.21). Based on trial 2, the supine group has a mean SOFA score of 12 (SD 3), while the prone group has a mean of 11 (SD 3). The p values are 0.29850, 0.09877, 0.01940, 0.00094, 0.00005, and <0.00001 when n (per arm) is 20, 50, 100, 200, 300, and 400, respectively. Based on trial 3, the vasopressor percentages are 73.0% in the supine group vs. 83.0% in the prone group. The p values are 0.4452, 0.2274, 0.0878, 0.0158, 0.0031, and 0.0006 when n (per arm) is 20, 50, 100, 200, 300, and 400, respectively. Small trials produce larger p values than big trials for the same baseline differences. We cannot judge imbalance in baseline measures from these p values alone. There is no statistical basis for advocating baseline difference tests.
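The dependence of p on n described above can be reproduced directly. A minimal sketch, assuming a pooled two-sample t-test for the SOFA comparison and a pooled two-proportion z-test for vasopressor use (the tests the reported p-values are consistent with):

```python
import math
from scipy.stats import ttest_ind_from_stats

def two_prop_p(p1, p2, n):
    """Two-sided pooled z-test for two proportions with n per arm."""
    pbar = (p1 + p2) / 2
    z = (p2 - p1) / math.sqrt(pbar * (1 - pbar) * 2 / n)
    return math.erfc(abs(z) / math.sqrt(2))   # = 2 * (1 - Phi(|z|))

for n in (20, 50, 100, 200, 300, 400):
    # SOFA score: mean 12 (SD 3) supine vs. mean 11 (SD 3) prone, as in trial 2
    _, p_sofa = ttest_ind_from_stats(12, 3, n, 11, 3, n)
    # Vasopressor use: 73% supine vs. 83% prone, as in trial 3
    p_vaso = two_prop_p(0.73, 0.83, n)
    # At n=20 this gives p ~ 0.2985 and p ~ 0.445, matching the abstract
    print(f"n={n:3d}: SOFA p={p_sofa:.5f}, vasopressor p={p_vaso:.4f}")
```

The baseline differences never change; only n does, yet the p-values sweep from clearly "non-significant" to vanishingly small, which is the abstract's argument against baseline significance testing.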


2018 ◽  
Vol 13 (7) ◽  
pp. 669-672 ◽  
Author(s):  
Mayank Goyal ◽  
Aravind Ganesh ◽  
Scott Brown ◽  
Bijoy K Menon ◽  
Michael D Hill

The modified Rankin Scale (mRS) at 90 days after stroke onset has become the preferred outcome measure in acute stroke trials, including recent trials of interventional therapies. Reporting the range of modified Rankin Scale scores as a paired horizontal stacked bar graph (colloquially known as “Grotta bars”) has become the conventional method of visualizing modified Rankin Scale results. Grotta bars readily illustrate the levels of the ordinal modified Rankin Scale in which benefit may have occurred. However, complementing the available graphical information by including additional features to convey statistical significance may be advantageous. We propose a modification of the horizontal stacked bar graph with illustrative examples. In this suggested modification, the line joining the segments of the bar graph (e.g. modified Rankin Scale 1–2 in treatment arm to modified Rankin Scale 1–2 in control arm) is given a color and thickness based on the p-value of the result at that level (in this example, the p-value of modified Rankin Scale 0–1 vs. 2–6)—a thick green line for p-values <0.01, thin green for p-values of 0.01 to <0.05, gray for 0.05 to <0.10, thin red for 0.10 to <0.90, and thick red for p-values ≥0.90 or outcome favoring the control group. Illustrative examples from four recent trials (ESCAPE, SWIFT-PRIME, IST-3, ASTER) are shown to demonstrate the range of significant and non-significant effects that can be captured using this proposed method. By formalizing a display of outcomes that includes statistical tests of all possible dichotomizations of the Rankin scale, this approach also encourages pre-specification of such hypotheses. Prespecifying tests of all six dichotomizations of the Rankin scale provides all possible statistical information in an a priori fashion. Since the result of our proposed approach is six distinct dichotomized tests in addition to a primary test, e.g. of the ordinal Rankin shift, it may be prudent to account for multiplicity in testing by using dichotomized p-values only after adjustment, such as by the Bonferroni, Holm, or Hochberg methods. Whether p-values are nominal or adjusted may be left to the discretion of the presenter as long as the choice is clearly stated in the statistical methods. Our proposed modification results in a visually intuitive summary of both the size of the effect—represented by the matched bars and their connecting segments—as well as its statistical relevance.
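The color-and-thickness scheme and the Bonferroni option described above can be expressed as a small helper. The six mRS cut-point p-values below are invented for illustration, and the thickness of the gray line is an assumption (the text specifies only its color):

```python
def line_style(p, favors_control=False):
    """Map a dichotomized mRS p-value to the proposed (color, thickness).

    Thresholds follow the scheme described above; outcomes favoring the
    control group get a thick red line regardless of p.
    """
    if favors_control or p >= 0.90:
        return ("red", "thick")
    if p >= 0.10:
        return ("red", "thin")
    if p >= 0.05:
        return ("gray", "thin")   # thickness assumed; the text gives only the color
    if p >= 0.01:
        return ("green", "thin")
    return ("green", "thick")

def bonferroni(p_values):
    """Bonferroni adjustment across the six mRS dichotomizations."""
    k = len(p_values)
    return [min(1.0, k * p) for p in p_values]

# Invented nominal p-values for the six cut-points (mRS 0 vs. 1-6, ..., 0-5 vs. 6)
nominal = [0.004, 0.02, 0.03, 0.08, 0.2, 0.95]
adjusted = bonferroni(nominal)
styles = [line_style(p) for p in adjusted]
print(list(zip(adjusted, styles)))
```

Applying the adjustment before styling, as the text suggests may be prudent, typically demotes borderline cut-points from green to gray or red.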


Author(s):  
Janet L. Peacock ◽  
Philip J. Peacock

Chapter outline: Introduction (p. 238); Samples and populations (p. 240); Confidence interval for a mean (p. 242); 95% confidence interval for a proportion (p. 244); Tests of statistical significance (p. 246); P values (p. 248); Statistical significance and clinical significance (p. 250); t test for two independent means (p. 252); t test for two independent means: example ...
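Two of the chapter topics listed above, the 95% confidence interval for a mean (t-based) and for a proportion (normal approximation), can be sketched briefly; the sample data below are invented:

```python
import math
from scipy import stats

def ci_mean(data, conf=0.95):
    """Confidence interval for a mean using the t distribution."""
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    t_crit = stats.t.ppf(1 - (1 - conf) / 2, df=n - 1)
    half = t_crit * sd / math.sqrt(n)
    return mean - half, mean + half

def ci_proportion(k, n, conf=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p = k / n
    z = stats.norm.ppf(1 - (1 - conf) / 2)
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

lo, hi = ci_mean([10, 12, 9, 11, 13, 10, 12, 11])   # invented measurements
print(f"95% CI for the mean: ({lo:.2f}, {hi:.2f})")
print("95% CI for 40/100:", ci_proportion(40, 100))
```

The interval for the mean uses the t critical value because the population SD is estimated from the sample; with 40 successes out of 100, the proportion interval is roughly 0.30 to 0.50.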


2020 ◽  
Vol 4 (2) ◽  
Author(s):  
Colin B Begg

Abstract Recently, a controversy has erupted regarding the use of statistical significance tests and the associated P values. Prominent academic statisticians have recommended that the use of statistical tests be discouraged or not used at all. This has naturally led to a lot of confusion among research investigators about the support in the academic statistical community for statistical methods in general. In fact, the controversy surrounding the use of P values has a long history. Critics of P values argue that their use encourages bad scientific practice, leading to the publication of far more false-positive and false-negative findings than the methodology would imply. The thesis of this commentary is that the problem is really human nature, the natural proclivity of scientists to believe their own theories and present data in the most favorable light. This is strongly encouraged by a celebrity culture that is fueled by academic institutions, the scientific journals, and the media. The importance of the truth-seeking tradition of the scientific method needs to be reinforced, and this is being helped by current initiatives to improve transparency in science and to encourage reproducible and replicable research. Statistical testing, used correctly, has an important and valuable place in the scientific tradition.


2020 ◽  
Vol 132 (6) ◽  
pp. 1970-1976
Author(s):  
Ashwin G. Ramayya ◽  
H. Isaac Chen ◽  
Paul J. Marcotte ◽  
Steven Brem ◽  
Eric L. Zager ◽  
...  

OBJECTIVE: Although it is known that intersurgeon variability in offering elective surgery can have major consequences for patient morbidity and healthcare spending, data addressing variability within neurosurgery are scarce. The authors performed a prospective peer review study of randomly selected neurosurgery cases in order to assess the extent of consensus regarding the decision to offer elective surgery among attending neurosurgeons across one large academic institution. METHODS: All consecutive patients who had undergone standard inpatient surgical interventions of 1 of 4 types (craniotomy for tumor [CFT], nonacute redo CFT, first-time spine surgery with/without instrumentation, and nonacute redo spine surgery with/without instrumentation) during the period 2015–2017 were retrospectively enrolled (n = 9156 patient surgeries, n = 80 randomly selected individual cases, n = 20 index cases of each type randomly selected for review). The selected cases were scored by attending neurosurgeons using a need for surgery (NFS) score based on clinical data (patient demographics, preoperative notes, radiology reports, and operative notes; n = 616 independent case reviews). Attending neurosurgeon reviewers were blinded as to performing provider and surgical outcome. Aggregate NFS scores across various categories were measured. The authors employed a repeated-measures mixed ANOVA model with autoregressive variance structure to compute omnibus statistical tests across the various surgery types. Interrater reliability (IRR) was measured using Cohen’s kappa based on binary NFS scores. RESULTS: Overall, the authors found that most of the neurosurgical procedures studied were rated as “indicated” by blinded attending neurosurgeons (mean NFS = 88.3, all p values < 0.001), with greater agreement among neurosurgeon raters than expected by chance (IRR = 81.78%, p = 0.016). Redo surgery had lower NFS scores and IRR scores than first-time surgery, both for craniotomy and spine surgery (ANOVA, all p values < 0.01). Spine surgeries with fusion had lower NFS scores than spine surgeries without fusion procedures (p < 0.01). CONCLUSIONS: There was general agreement among neurosurgeons in terms of indication for surgery; however, revision surgery of all types and spine surgery with fusion procedures had the lowest amount of decision consensus. These results should guide efforts aimed at reducing unnecessary variability in surgical practice, with the goal of effective allocation of healthcare resources to advance the value paradigm in neurosurgery.
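The interrater-reliability measure used in this study, Cohen's kappa on binary scores, corrects observed agreement for the agreement expected by chance. A minimal sketch with invented ratings (not study data):

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' binary scores (e.g. surgery indicated or not)."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # observed agreement: fraction of cases where the raters match
    po = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # chance agreement, from each rater's marginal rate of scoring 1
    pa1 = sum(rater_a) / n
    pb1 = sum(rater_b) / n
    pe = pa1 * pb1 + (1 - pa1) * (1 - pb1)
    return (po - pe) / (1 - pe)

# 1 = "surgery indicated", 0 = "not indicated" for ten hypothetical case reviews
a = [1, 1, 1, 1, 0, 1, 1, 0, 1, 1]
b = [1, 1, 1, 1, 1, 1, 1, 0, 1, 0]
print(f"kappa = {cohens_kappa(a, b):.3f}")
```

Kappa is 0 when agreement is exactly what chance predicts and 1 for perfect agreement, which is why it is preferred over raw percent agreement when most cases are rated "indicated", as in this study.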

