XXXIV. The other side of ‘statistical significance’: alpha, beta, delta, and the calculation of sample size

1975 ◽  
Vol 18 (4) ◽  
pp. 491-505 ◽  
Author(s):  
Alvan R. Feinstein


2021 ◽  
pp. 23-37
Author(s):  
Piotr Pawlak

This text reports research on selected aspects of the crisis of democracy in Poland. The analysis focuses on the attitude of participants in the political dispute towards their opponents, the possibility (and method) of reaching an agreement, and the assessment of the situation. I chose social media as the area of analysis, considering it a transparent platform for political dispute (especially during the SARS-CoV-2 pandemic). The essence of the research was to find respondents clearly involved in the political dispute. I chose Facebook, among other things, because this platform hosts many active thematic groups that gather supporters of particular political options. This makes it easier to reach respondents with the preferred characteristics. The survey was conducted from 13/12/2020 to 25/01/2021 using the author’s form consisting of 15 questions. The research sample consists of 220 respondents: 126 women and 94 men aged 14 to 72 years. As appropriate for the nature of the variables and the sample size, the contingency coefficient and the Kruskal-Wallis test were calculated, together with the statistical significance of the obtained results. All analyses were performed using SPSS software version 26 and Microsoft Excel.


Mathematics ◽  
2021 ◽  
Vol 9 (6) ◽  
pp. 603
Author(s):  
Leonid Hanin

I uncover previously underappreciated systematic sources of false and irreproducible results in natural, biomedical and social sciences that are rooted in statistical methodology. They include the inevitably occurring deviations from basic assumptions behind statistical analyses and the use of various approximations. I show through a number of examples that (a) arbitrarily small deviations from distributional homogeneity can lead to arbitrarily large deviations in the outcomes of statistical analyses; (b) samples of random size may violate the Law of Large Numbers and thus are generally unsuitable for conventional statistical inference; (c) the same is true, in particular, when random sample size and observations are stochastically dependent; and (d) the use of the Gaussian approximation based on the Central Limit Theorem has dramatic implications for p-values and statistical significance essentially making pursuit of small significance levels and p-values for a fixed sample size meaningless. The latter is proven rigorously in the case of one-sided Z test. This article could serve as a cautionary guidance to scientists and practitioners employing statistical methods in their work.
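
One way to get a feel for point (d) is to compare an exact tail probability with its Gaussian approximation far out in the tail. The sketch below is not the paper's own example: the binomial setting and the numbers n, p and k are hypothetical, chosen only to show that the two can disagree by a large factor when the p-value is very small.

```python
from math import comb, sqrt
from statistics import NormalDist

def exact_upper_tail(n, p, k):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

def normal_upper_tail(n, p, k):
    """Gaussian (CLT) approximation to P(X >= k), no continuity correction."""
    z = (k - n * p) / sqrt(n * p * (1 - p))
    return 1 - NormalDist().cdf(z)

# Hypothetical setting: 100 trials, success probability 0.1, 25 successes seen.
n, p, k = 100, 0.1, 25
exact = exact_upper_tail(n, p, k)
approx = normal_upper_tail(n, p, k)
print(f"exact tail  = {exact:.3g}")
print(f"normal tail = {approx:.3g}")
print(f"ratio       = {exact / approx:.1f}")
```

In this illustrative setting the exact tail is more than ten times the Gaussian one, so a very small p-value taken from the approximation would overstate the evidence at this fixed sample size.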


2014 ◽  
Vol 2014 ◽  
pp. 1-5 ◽  
Author(s):  
Giovanni Ciancio ◽  
Stefania Volpinari ◽  
Maria Fotinidi ◽  
Federica Furini ◽  
Ilaria Farina ◽  
...  

Objective. To evaluate the involvement of the bursa located next to the head of the 5th metatarsal bone in patients with psoriatic arthritis (PsA) in comparison with the other seronegative spondyloarthritis (SpA). Methods. All patients with PsA seen during a period of 24 months were enrolled. The control group included healthy subjects and patients with the other SpA. All subjects underwent clinical and ultrasound (US) examination of the lateral surface of the 5th metatarsal. Results. 150 PsA patients (88 M; 62 F), 172 SpA (107 M; 65 F), and 95 healthy controls (58 M; 37 F) were evaluated. Based on clinical and US evaluation, bursitis was diagnosed in 17/150 (11.3%) PsA patients but in none of the SpA (P<0.0001) and healthy (P=0.0002) controls. In detecting bursitis, US was more sensitive than clinical examination, although the difference did not reach statistical significance (P=0.09). Conclusion. The bursa of the 5th metatarsophalangeal joint appears to be involved in PsA more frequently than by chance. If confirmed by other studies, this finding could be considered as a distinctive clinical sign of PsA, useful for differential diagnosis with the other SpA. In asymptomatic patients, US proved to be more sensitive in the detection of bursitis.
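
The group comparisons reported above can be sketched from the counts given in the abstract. The abstract does not name the test used, so the two-sided Fisher exact test below is an assumption, and the function name is illustrative only.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Small relative tolerance guards against floating-point ties.
    return sum(q for q in probs if q <= p_obs * (1 + 1e-9))

# Counts from the abstract: bursitis in 17/150 PsA, 0/172 SpA, 0/95 healthy.
p_spa = fisher_exact_two_sided(17, 133, 0, 172)
p_healthy = fisher_exact_two_sided(17, 133, 0, 95)
print(f"PsA vs SpA:     p = {p_spa:.2g}")
print(f"PsA vs healthy: p = {p_healthy:.2g}")
```

Under this assumption both comparisons come out clearly significant, and the smaller healthy control group yields the larger of the two p-values, the same ordering as the reported values.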


2016 ◽  
Vol 11 (4) ◽  
pp. 551-554 ◽  
Author(s):  
Martin Buchheit

The first sport-science-oriented and comprehensive paper on magnitude-based inferences (MBI) was published 10 y ago in the first issue of this journal. While debate continues, MBI is today well established in sport science and in other fields, particularly clinical medicine, where practical/clinical significance often takes priority over statistical significance. In this commentary, some reasons why both academics and sport scientists should abandon null-hypothesis significance testing and embrace MBI are reviewed. Apparent limitations and future areas of research are also discussed. The following arguments are presented: P values and, in turn, study conclusions are sample-size dependent, irrespective of the size of the effect; significance does not inform on magnitude of effects, yet magnitude is what matters the most; MBI allows authors to be honest with their sample size and better acknowledge trivial effects; the examination of magnitudes per se helps provide better research questions; MBI can be applied to assess changes in individuals; MBI improves data visualization; and MBI is supported by spreadsheets freely available on the Internet. Finally, recommendations to define the smallest important effect and improve the presentation of standardized effects are presented.
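
The argument that P values depend on sample size for a fixed effect can be illustrated with a short calculation. This is a minimal sketch, assuming a two-sample z-test with equal group sizes and known unit variance; the standardized effect d = 0.2 and the group sizes are hypothetical values, not taken from the commentary.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p(d, n_per_group):
    """Two-sided p-value of a two-sample z-test.

    d is the standardized mean difference; each group has n_per_group
    observations with known unit variance (an assumption of this sketch).
    """
    z = d * sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))

d = 0.2  # hypothetical small standardized effect, held fixed
for n in (20, 100, 500):
    print(f"n per group = {n:3d}  p = {two_sided_p(d, n):.4f}")
```

The effect is identical in all three rows, yet only the largest sample crosses the conventional 0.05 threshold, which is why the P value alone says little about magnitude.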


2004 ◽  
Vol 96 (4) ◽  
pp. 1277-1284 ◽  
Author(s):  
Roy L. P. G. Jentjens ◽  
Luke Moseley ◽  
Rosemary H. Waring ◽  
Leslie K. Harding ◽  
Asker E. Jeukendrup

The purpose of the present study was to examine whether combined ingestion of a large amount of fructose and glucose during cycling exercise would lead to exogenous carbohydrate oxidation rates >1 g/min. Eight trained cyclists (maximal O2 consumption: 62 ± 3 ml·kg⁻¹·min⁻¹) performed four exercise trials in random order. Each trial consisted of 120 min of cycling at 50% maximum power output (63 ± 2% maximal O2 consumption), while subjects received a solution providing either 1.2 g/min of glucose (Med-Glu), 1.8 g/min of glucose (High-Glu), 0.6 g/min of fructose + 1.2 g/min of glucose (Fruc+Glu), or water. The ingested fructose was labeled with [U-¹³C]fructose, and the ingested glucose was labeled with [U-¹⁴C]glucose. Peak exogenous carbohydrate oxidation rates were ∼55% higher (P < 0.001) in Fruc+Glu (1.26 ± 0.07 g/min) compared with Med-Glu and High-Glu (0.80 ± 0.04 and 0.83 ± 0.05 g/min, respectively). Furthermore, the average exogenous carbohydrate oxidation rates over the 60- to 120-min exercise period were higher (P < 0.001) in Fruc+Glu compared with Med-Glu and High-Glu (1.16 ± 0.06, 0.75 ± 0.04, and 0.75 ± 0.04 g/min, respectively). There was a trend toward a lower endogenous carbohydrate oxidation in Fruc+Glu compared with the other two carbohydrate trials, but this failed to reach statistical significance (P = 0.075). The present results demonstrate that, when fructose and glucose are ingested simultaneously at high rates during cycling exercise, exogenous carbohydrate oxidation rates can reach peak values of ∼1.3 g/min.


2011 ◽  
Vol 6 (2) ◽  
pp. 252-277 ◽  
Author(s):  
Stephen T. Ziliak

Abstract: Student's exacting theory of errors, both random and real, marked a significant advance over ambiguous reports of plant life and fermentation asserted by chemists from Priestley and Lavoisier down to Pasteur and Johannsen, working at the Carlsberg Laboratory. One reason seems to be that William Sealy Gosset (1876–1937) aka “Student” – he of Student's t-table and test of statistical significance – rejected artificial rules about sample size, experimental design, and the level of significance, and took instead an economic approach to the logic of decisions made under uncertainty. In his job as Apprentice Brewer, Head Experimental Brewer, and finally Head Brewer of Guinness, Student produced small samples of experimental barley, malt, and hops, seeking guidance for industrial quality control and maximum expected profit at the large scale brewery. In the process Student invented or inspired half of modern statistics. This article draws on original archival evidence, shedding light on several core yet neglected aspects of Student's methods, that is, Guinnessometrics, not discussed by Ronald A. Fisher (1890–1962). The focus is on Student's small sample, economic approach to real error minimization, particularly in field and laboratory experiments he conducted on barley and malt, 1904 to 1937. Balanced designs of experiments, he found, are more efficient than random and have higher power to detect large and real treatment differences in a series of repeated and independent experiments. Student's world-class achievement poses a challenge to every science. Should statistical methods – such as the choice of sample size, experimental design, and level of significance – follow the purpose of the experiment, rather than the other way around? (JEL classification codes: C10, C90, C93, L66)


2021 ◽  
Vol 10 (36) ◽  
pp. 115-118
Author(s):  
Érika Cristina Ferreira ◽  
Paula Fernanda Massini ◽  
Caroline Felicio Braga ◽  
Ricardo Nascimento Drozino ◽  
Neide Martins Moreira ◽  
...  

Introduction: Toxoplasmosis is a zoonosis caused by Toxoplasma gondii that represents a serious public health problem and affects 20-90% of the world human population [1,2]. It is especially serious in congenital transmission because of the congenital sequelae. Treatment with highly diluted substances is one of the most widely employed alternative/complementary medicines in the world [3,4]. Current ethical rules on the number of animals used in experimental protocols, combined with more conservative statistical methods [5], may fail to bring out the biological effects of highly diluted substances that researchers observe in practice. Aim: To evaluate the minimum number of animals per group needed to achieve a significant difference, in the number of cysts observed in the brain, among groups of animals treated with biotherapic T. gondii and infected with the protozoan. Material and methods: A blind randomized controlled trial was performed using eleven Swiss male mice, aged 57 days, divided into two groups: BIOT-200DH, treated with biotherapic (n=6), and CONTROL, treated with 7% hydroalcoholic solution (n=7). The animals of the BIOT-200DH group were treated for 3 consecutive days with a single dose of 0.1 ml/day. The animals of the BIOT-200DH group were orally infected with 20 cysts of ME49 T. gondii. The animals of the control group were treated with 7% cereal alcohol (n=7) for 3 consecutive days and then were orally infected with 20 cysts of ME49 T. gondii. The biotherapic 200DH T. gondii was prepared in laminar flow with homogenized mouse brain containing 20 cysts of T. gondii / 100 μL, according to the Brazilian Homeopathic Pharmacopoeia [6]. At 60 days post-infection the animals were killed in a chamber saturated with halothane; the brains were homogenized and resuspended in 1 ml of saline solution. Cysts were counted in 25 ml of this suspension, covered with a 24x24 mm coverglass and examined over its full length.
This study was approved by the Ethics Committee for animal experimentation of the UEM - Protocol 036/2009. The data were compared using the Mann-Whitney test and bootstrap [7] with the statistical software BioStat 5.0. Results and discussion: The Mann-Whitney test showed no significant difference, even when "n" was multiplied ten times (p=0.0618). The number of cysts observed was 4.5 ± 3.3 in the BIOT-200DH group and 12.8 ± 9.7 in the CONTROL group. Table 1 shows the results obtained using the bootstrap analysis for each data set sized from 2n to 2n+5, with their respective p-values. By adding elements to the different groups one at a time, at random, and so gradually enlarging the samples, we determined the sample size needed to confirm statistically the results seen experimentally. With 17 mice in the BIOT-200DH group and 19 in the CONTROL group, statistical significance was already observed. This result suggests that experiments involving highly diluted substances and infection of mice with T. gondii should use experimental groups of at least 17 animals. Despite the current and relevant ethical discussions about the number of animals used for experimental procedures, the number of animals involved in each experiment must suit the characteristics of each item to be studied. In experiments involving highly diluted substances, experimental animal models are still rudimentary and the biological effects observed also appear to be individualized, as described in the literature for homeopathy [8]. The fact that statistical significance was achieved by enlarging the sample observed in this trial points to a rare event with strongly individual behavior, which is difficult to demonstrate in a result set treated simply with a comparison of means or medians. Conclusion: Bootstrap seems to be an interesting methodology for the analysis of data obtained from experiments with highly diluted substances.
Experiments involving highly diluted substances and infection of mice with T. gondii should preferably use experimental groups of at least 17 animals.
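
For comparison with the bootstrap result, a conventional sample-size calculation from alpha, beta and delta can be sketched with the group summaries reported above. This is not the authors' method. It assumes that the "±" values are standard deviations (the abstract does not say), that a normal approximation is acceptable for cyst counts, and that the conventional two-sided alpha of 0.05 and power of 0.80 apply.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(mean1, sd1, mean2, sd2, alpha=0.05, power=0.80):
    """Approximate animals per group for a two-sided comparison of two means.

    Normal-approximation formula with unequal variances:
    n = (z_{1-alpha/2} + z_{power})^2 * (sd1^2 + sd2^2) / delta^2
    """
    z = NormalDist().inv_cdf
    delta = abs(mean1 - mean2)
    n = (z(1 - alpha / 2) + z(power)) ** 2 * (sd1**2 + sd2**2) / delta**2
    return ceil(n)

# Cyst counts from the abstract: 4.5 +/- 3.3 (BIOT-200DH), 12.8 +/- 9.7 (CONTROL).
n = n_per_group(4.5, 3.3, 12.8, 9.7)
print(n)  # about 12 per group under these assumptions
```

The approximation gives a somewhat smaller group size than the 17 and 19 animals found by bootstrap, which is unsurprising for skewed count data, where a normal-theory formula tends to be optimistic.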


2013 ◽  
Vol 12 (3) ◽  
pp. 345-351 ◽  
Author(s):  
Jessica Middlemis Maher ◽  
Jonathan C. Markey ◽  
Diane Ebert-May

Statistical significance testing is the cornerstone of quantitative research, but studies that fail to report measures of effect size are potentially missing a robust part of the analysis. We provide a rationale for why effect size measures should be included in quantitative discipline-based education research. Examples from both biological and educational research demonstrate the utility of effect size for evaluating practical significance. We also provide details about some effect size indices that are paired with common statistical significance tests used in educational research and offer general suggestions for interpreting effect size measures. Finally, we discuss some inherent limitations of effect size measures and provide further recommendations about reporting confidence intervals.


Blood ◽  
2016 ◽  
Vol 128 (22) ◽  
pp. 3620-3620
Author(s):  
Sule Unal ◽  
Neslihan Kalkan ◽  
Mualla Cetin ◽  
Fatma Gumruk

Abstract Introduction: Iron overload is one of the major complications of transfusion treatment in patients with thalassemia major. Deferasirox is a once-daily orally active iron chelator, and long-term efficacy and safety data are being published. Herein we report the long-term follow-up data of thalassemia major patients in a single center. Methods: Of the 67 patients with thalassemia major who were under follow-up in a single center, 42 who were on deferasirox chelation for at least three years were included in the study. Patients' serum ferritin, ALT, creatinine, cardiac T2* and hepatic T2* values were recorded at the time of deferasirox initiation and at last visit. Deferasirox was not initiated as an iron chelator in any patient with a cardiac T2* value below 8 ms. All of the patients had creatinine clearance above 40 ml/minute and had serum creatinine levels within age-appropriate normal ranges at deferasirox initiation. None of the patients received any other chelation during the follow-up period. Results: Mean age of the patients was 16±9.4 years (2-33.4 years) at initiation of deferasirox and 22 (52%) were females. Eighteen (43%) of the patients were splenectomized. Median follow-up time of deferasirox chelation was 7.9 years (3-10). The median deferasirox doses at initiation of chelation and at last visit were 20.5 mg/kg/day and 30.7 mg/kg/day (7-40), respectively. Serum ferritin levels decreased significantly with deferasirox chelation (median 1969 ng/ml (516-5404) vs 1113 ng/ml (339-4003), p<0.001). We did not find a statistically significant difference between the initial cardiac T2* values and the values at the last visit (median 25.3 ms (8.7-42) vs 32 ms (6.6-42), p=0.607), despite a dramatic increase. Hepatic T2* values likewise did not change significantly compared to initial values (median 3.7 ms (1-13.6) vs 3.3 ms (1-16), p=0.865).
However, of the patients who had a cardiac T2* value between 10 and 20 ms, 67% were found to have a T2* value above 20 ms by the end of the follow-up. Likewise, 53% of the patients with a hepatic T2* value below 3.5 ms had T2* values above 3.5 ms by the end of the follow-up, indicating improvement in iron stores. None of the patients exhibited an adverse event that required complete cessation of the drug, but patients exhibited transient hypertransaminasemia that required transient cessation and/or dose reduction. The changes in serum ALT and serum creatinine levels between initiation and last visit were not significant. Conclusions: This is a study that includes patients with a relatively long duration of follow-up. Although the cardiac T2* values improved by the end of the follow-up, this change was not found to be statistically significant. This can be attributed to the sample size; in a larger sample, the change might be found significant. Additionally, the patients included in the study were not only chelation-naive patients but also patients who were noncompliant with previous chelation and who were highly iron loaded before initiation of deferasirox. Disclosures No relevant conflicts of interest to declare.


2007 ◽  
Vol 2 (2) ◽  
pp. 97
Author(s):  
Wendy Furlan

A review of: Shachaf, Pnina, and Sarah Horowitz. "Are Virtual Reference Services Color Blind?" Library & Information Science Research 28.4 (Sept. 2006): 501-20. Abstract Objective – To examine whether librarians provide equitable virtual reference services to diverse user groups. Design – Unobtrusive method of defined scenarios submitted via e-mail. Setting – Twenty-three Association of Research Libraries (ARL) member libraries from across the United States. All ARL member libraries were invited to participate, with the 23 acceptances providing 19% participation. Subjects – Anonymous librarians from the 23 participating libraries’ virtual e-mail reference services. Up to 6 librarians from each library may have been involved. Six fictitious personas were developed to represent particular ethnic or religious groups, whereby the ethnic or religious affiliation was only indicated by the name chosen for each user and the corresponding e-mail address. Names were selected from lists of names or baby names available online: Latoya Johnson (African-American), Rosa Manuz (Hispanic), Chang Su (Asian - Chinese), Mary Anderson (Caucasian/Christian), Ahmed Ibrahim (Muslim), and Moshe Cohen (Caucasian/Jewish). These personas were used to submit reference queries via e-mail to the virtual reference services taking part in the study. Methods – Five different types of reference queries were developed for use in this study. Three were based on prior published research as they were deemed to be answerable by the majority of libraries. They included a dissertation query, a sports team query, and a population query all designed to be tailored to the target institution. The other 2 queries were developed with participating institutions’ virtual reference guidelines in mind, and were thought to not be answered by the target institutions when submitted by unaffiliated users. 
They consisted of a subject query on a special collection topic that asked for copies of relevant articles to be sent out, and an article query requesting that a copy of a specific article be e-mailed to the patron. The study was conducted over a 6-week period beginning the second week of September, 2005. Each week, 1 fictitious persona was used to e-mail a reference query to the virtual reference service of each of the 23 participating institutions. Five of each type of query were sent by each persona. During September and October 2005, a total of 138 queries were sent. Each institution received a different query for each of the first 5 weeks, and in the sixth week they received a repeat of a previous request with details of title or years altered. All other text in every request sent was kept consistent. Each institution only received 1 request from each persona during the study. In order to eliminate any study bias caused by an informed decision regarding the order in which personas were used, they were randomly arranged (alphabetically by surname). Furthermore, to avoid suspicions from responding librarians, queries were e-mailed on different days of the week at different times. This created some limitations in interpreting response times as some queries were submitted on weekends. All queries were analysed with NVivo software in order to identify attributes and patterns to aid qualitative analysis. Each transaction (a single query and any related responses) was classified according to 12 attributes and 59 categories based on various associations’ digital reference guidelines. Transactions were coded and then 10% re-coded by a different coder. This led to the clarification and refinement of the coding scheme, resulting in the number of categories used being reduced to 23. Coding was then performed in 3 iterations until 90% agreement between the 2 coders was reached. The final inter-coder reliability was 92%. 
The study did not support cross tabulation among user groups on most content categories due to the small sample size. Main results – Response times varied greatly between users. Moshe (Caucasian/Jewish) received an average turn-around of less than a day. At the other end of the spectrum, Ahmed’s (Muslim) responses took an average of 3.5 days. Both Ahmed and Latoya (African-American) sent queries which took over 18 days to receive a response. The length (number of words) of replies also indicated a differing level of service with Mary (Caucasian/Christian) and Moshe receiving far lengthier responses than the other 4 personas. Number of replies (including automatic replies) was examined in comparison with the number of replies which answered the question, and again indicated Mary and Moshe were receiving a better level of service. The way in which the user was addressed by the librarian was examined as another measure of service, i.e. first name, full name, honorific. This again mirrored the low level of service received by Ahmed. The professional endings used by librarians in their replies also reinforced the high quality of service received by Moshe across other categories. Results for Rosa (Latino) and Chang (Asian - Chinese) were average for most categories presented. Conclusion – In this study, a discriminatory pattern was clearly evident, with the African-American and Muslim users receiving poor levels of service from virtual reference librarians across all dimensions of quality evaluated. The Caucasian (Christian and Jewish) users also noticeably received the best level of service. It is noted, however, that the sample size of the study is not large enough for generalisations to be drawn and that future, more statistically significant studies are warranted. Many other questions are raised by the study for possible future research into racism exhibited by library staff and services.

