Inferential, Nonparametric Statistics to Assess the Quality of Probabilistic Forecast Systems

Aline de H. N. Maia; Holger Meinke; Sarah Lennox; Roger Stone

doi:10.1175/mwr3291.1

Monthly Weather Review

2007

Vol 135 (2)

pp. 351-362

Cited By ~ 16

Keyword(s):

Southern Oscillation

Statistical Tests

Quality Measures

Nonparametric Tests

P Values

Skill Scores

Significance Levels

Nonparametric Statistical

Forecast Quality

Abstract: Many statistical forecast systems are available to interested users. To be useful for decision making, these systems must be based on evidence of underlying mechanisms. Once causal connections between the mechanism and its statistical manifestation have been firmly established, the forecasts must also provide some quantitative evidence of “quality.” However, the quality of statistical climate forecast systems (forecast quality) is an ill-defined and frequently misunderstood property. Often, providers and users of such forecast systems are unclear about what quality entails and how to measure it, leading to confusion and misinformation. A generic framework is presented that quantifies aspects of forecast quality using an inferential approach to calculate nominal significance levels (p values), which can be obtained either by directly applying nonparametric statistical tests such as Kruskal–Wallis (KW) or Kolmogorov–Smirnov (KS) or by using Monte Carlo methods (in the case of forecast skill scores). Once converted to p values, these forecast quality measures provide a means to objectively evaluate and compare temporal and spatial patterns of forecast quality across datasets and forecast systems. The analysis demonstrates the importance of providing p values rather than adopting arbitrarily chosen significance levels such as 0.05 or 0.01, which is still common practice. This is illustrated by applying nonparametric tests (KW and KS) and skill scoring methods [linear error in the probability space (LEPS) and ranked probability skill score (RPSS)] to the five-phase Southern Oscillation index classification system using historical rainfall data from Australia, South Africa, and India. The quality measures were selected solely because they are commonly used; their selection does not constitute endorsement. Nonparametric statistical tests are found to be adequate proxies for skill measures such as LEPS or RPSS.
The framework can be implemented anywhere, regardless of dataset, forecast system, or quality measure. Eventually, such inferential evidence should be complemented by descriptive statistical methods to fully support operational risk management.
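The abstract describes converting forecast skill scores into p values with Monte Carlo methods. A minimal Python sketch of that idea follows, assuming tercile (three-category) probabilistic forecasts, an equal-probability climatological reference for RPSS, and a null distribution built by randomly shuffling observations against forecasts. The function names (`rps`, `rpss`, `monte_carlo_p_value`) are hypothetical, and the paper's actual resampling scheme may differ.

```python
import random
from itertools import accumulate


def rps(probs, obs_cat):
    """Ranked probability score for one forecast over ordered categories.

    probs: forecast probabilities per category (summing to 1);
    obs_cat: 0-based index of the observed category.
    """
    cum_fcst = accumulate(probs)
    # Compare cumulative forecast with the cumulative (step) observation.
    return sum(
        (f - (1.0 if k >= obs_cat else 0.0)) ** 2
        for k, f in enumerate(cum_fcst)
    )


def rpss(forecasts, observations, n_cat):
    """RPSS relative to an equal-probability climatological forecast."""
    clim = [1.0 / n_cat] * n_cat
    rps_fcst = sum(rps(p, o) for p, o in zip(forecasts, observations))
    rps_clim = sum(rps(clim, o) for o in observations)
    return 1.0 - rps_fcst / rps_clim


def monte_carlo_p_value(forecasts, observations, n_cat, n_sim=999, seed=0):
    """Return (observed RPSS, one-sided Monte Carlo p value).

    Shuffling observations relative to forecasts destroys any
    forecast-outcome association, giving a null distribution of RPSS.
    """
    rng = random.Random(seed)
    observed = rpss(forecasts, observations, n_cat)
    shuffled = list(observations)
    n_exceed = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        if rpss(forecasts, shuffled, n_cat) >= observed:
            n_exceed += 1
    # Counting the observed arrangement itself keeps p strictly positive.
    return observed, (n_exceed + 1) / (n_sim + 1)
```

Called on a set of synthetic tercile forecasts, `monte_carlo_p_value(fcst, obs, n_cat=3)` returns the observed RPSS together with its p value; reporting that p value directly, rather than only whether it falls below 0.05, is in the spirit of the abstract's recommendation.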

Nonparametric Statistics on the Computer

Journal of Marketing Research

10.1177/002224376900600110

1969

Vol 6 (1)

pp. 86-92

Author(s):

John Morris

Keyword(s):

Missing Data

Statistical Tests

Computer Programs

Nonparametric Statistics

Nonparametric Tests

Statistical System

Nonparametric Statistical

The nonparametric statistical system is a package of computer programs for use with data that may not meet the assumptions of more traditional statistical tests. Thirty-four nonparametric tests are available in the system. Provisions for missing data, variable formats, and other options make the system potentially useful in research based on attitude questionnaires.

Does Religiosity Help Muslims Adjust to Death?: A Research Note

OMEGA - Journal of Death and Dying

10.2190/om.57.1.f

2008

Vol 57 (1)

pp. 113-119

Cited By ~ 6

Author(s):

Mohammad Samir Hossain

Mohammad Zakaria Siddique

Keyword(s):

End Of Life

Correlation Coefficient

Statistical Tests

Stratified Sampling

Cross Sectional

P Values

Age Range

The Impact

Minimum Education

Death is the end of life. But Muslims believe death is an event between two lives, not an absolute cessation of life. Religiosity may therefore shape how Muslims relate to death. To explore the impact of religious perception, and thus religiosity, a cross-sectional, descriptive, analytic and correlational study was conducted on 150 Muslims. Self-declared healthy Muslims, equally from both sexes (N = 150, age range 20 to 50 years, minimum education Bachelor's degree), were selected by stratified sampling, with random selection within each stratum. Subjects, divided into five levels of religiosity, were assessed and scored for the presence of maladjustment symptoms and stage of adjustment with death. ANOVA and correlation coefficients were applied to the collected data. All statistical tests were performed at the 95% confidence level (P < 0.05). The computed statistics exceeded the tabulated critical values, and ANOVA and the correlation coefficients yielded P values of < 0.05, < 0.01, and < 0.001. Religiosity among Muslims positively influenced the quality of adjustment with death. We therefore hypothesized that religiosity may help Muslims adjust to death.

A Review of Statistical Reporting in Dietetics Research (2010–2019): How is a Canadian Journal Doing?

Canadian Journal of Dietetic Practice and Research

10.3148/cjdpr-2021-005

2021

pp. 1-9

Author(s):

Holly Schaafsma

Holly Laasanen

Jasna Twynstra

Jamie A. Seabrook

Keyword(s):

Sample Size

Canadian Journal

Quantitative Research

Research Team

Statistical Tests

Sample Size Calculation

Future Research

Statistical Techniques

P Values

Despite the widespread use of statistical techniques in quantitative research, methodological flaws and inadequate statistical reporting persist. The objective of this study is to evaluate the quality of statistical reporting and procedures in all original, quantitative articles published in the Canadian Journal of Dietetic Practice and Research (CJDPR) from 2010 to 2019 using a checklist created by our research team. In total, 107 articles were independently evaluated by 2 raters. The hypothesis or objective(s) was clearly stated in 97.2% of the studies. Over half (51.4%) of the articles reported the study design and 57.9% adequately described the statistical techniques used. Only 21.2% of the studies that required a prestudy sample size calculation reported one. Of the 281 statistical tests conducted, 88.3% of them were correct. P values >0.05–0.10 were reported as “statistically significant” and/or a “trend” in 11.4% of studies. While this evaluation reveals both strengths and areas for improvement in the quality of statistical reporting in CJDPR, we encourage dietitians to pursue additional statistical training and/or seek the assistance of a statistician. Future research should consider validating this new checklist and using it to evaluate the statistical quality of studies published in other nutrition journals and disciplines.

Enlarging the Scope of Randomization and Permutation Tests in Neuroimaging and Neuroscience

10.1101/685560

2019

Cited By ~ 1

Author(s):

Eric Maris

Keyword(s):

Rate Control

Explanatory Variable

Statistical Tests

Permutation Tests

Informed Choice

Companion Paper

Nonparametric Tests

Test Statistic

Drastic Increase

Nonparametric Statistical

Abstract: Especially for the high-dimensional data collected in neuroscience, nonparametric statistical tests are an excellent alternative to parametric statistical tests. Because any function of the data can be used as a test statistic, nonparametric tests have the potential for a drastic increase in sensitivity through a biologically informed choice of test statistic. In a companion paper (Geerligs & Maris, 2020), we demonstrate that such a drastic increase is actually possible. This increase in sensitivity is only useful if the false alarm (FA) rate can be controlled at the same time. However, for some study types (e.g., within-participant studies), nonparametric tests do not control the FA rate (see Eklund, Nichols, & Knutsson, 2016). In the present paper, we present a family of nonparametric randomization and permutation tests for which we prove exact FA rate control. Crucially, these proofs hold for a much larger family of study types than before, including both within-participant studies and studies in which the explanatory variable is not under experimental control. The crucial element of this statistical innovation is the adoption of a novel but highly relevant null hypothesis: statistical independence between the biological and the explanatory variable.
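The null hypothesis of independence between a biological and an explanatory variable can be illustrated with a generic permutation test. The Python sketch below is an assumption-laden illustration, not the paper's family of tests: it covers only the simple case in which units are exchangeable and explanatory values can be reassigned freely among them, and it does not reproduce the extensions to within-participant designs. `abs_pearson` and `permutation_test` are hypothetical names; `statistic` can be any function of the data, mirroring the abstract's point about freely chosen test statistics.

```python
import random
import statistics


def abs_pearson(x, y):
    """Absolute Pearson correlation; one possible test statistic.

    Raises ZeroDivisionError if either variable is constant.
    """
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / (sxx * syy) ** 0.5


def permutation_test(biological, explanatory, statistic=abs_pearson,
                     n_perm=1999, seed=0):
    """Monte Carlo permutation test of independence.

    Under H0 (independence, exchangeable units) every reassignment of the
    explanatory values to units is equally likely, so the permuted values
    of *any* chosen statistic form a valid null distribution.
    """
    rng = random.Random(seed)
    observed = statistic(biological, explanatory)
    permuted = list(explanatory)
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(permuted)
        if statistic(biological, permuted) >= observed:
            n_extreme += 1
    # The +1 terms count the observed labelling, which keeps
    # P(p <= alpha) <= alpha under H0 for the Monte Carlo p value.
    return observed, (n_extreme + 1) / (n_perm + 1)
```

Swapping `abs_pearson` for a domain-specific statistic changes only sensitivity; the validity argument rests on the exchangeability assumption, not on the statistic.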

Checking the quality of approximation of p-values in statistical tests for random number generators by using a three-level test

Mathematics and Computers in Simulation

10.1016/j.matcom.2018.08.005

2019

Vol 161

pp. 66-75

Cited By ~ 2

Author(s):

Hiroshi Haramoto

Makoto Matsumoto

Keyword(s):

Random Number

Statistical Tests

Random Number Generators

P Values

Quality Of Approximation

Neuropsychological and Physical Trajectories in Neurotypical and High-cognitive Performing Older Adults

Journal of Geriatric Medicine

10.30564/jgm.v3i2.3602

2021

Vol 3 (2)

Author(s):

Alessandro Amorim Aita

Corina Satler

Henrique Salmazo da Silva

Isabelle Patriciá Freitas Soares Chariglione

Keyword(s):

Older Adults

Cognitive Performance

High Performance

Statistical Tests

Well Being

High Performing

One Year

Nonparametric Statistical

Anxiety Depression

The maintenance of high cognitive performance in old age has increasingly become a public health interest due to associations between cognition, well-being, longevity, and autonomy. The objective of this research is to investigate the cognitive, physical, and psychological trajectories of neurotypical older adults (NOAs) and high-performing older adults (HPOAs). This exploratory study investigated 21 NOAs and six HPOAs (mean age 71, SD = ± 3.59), followed up for one year. The older adults underwent physical fitness, quality of life, anxiety, depression, RAVLT, ACE-R, and Stroop tests, and were assessed at three time points: baseline, six months after the cognitive (MEMO) or stimulation (Stimullus) interventions, and six months after the multimodal interventions, which could be physical or psychopedagogical interventions (health education lectures). Nonparametric statistical tests (Mann-Whitney and Wilcoxon) were performed with p ≤ 0.05. The results demonstrated that the cognitive measures were good predictors of cognitive performance, and positive correlations were observed between cognitive and mood measures. The older adults with high performance had a lower prevalence of depressive symptoms. Gains in global cognitive performance, mood, and physical fitness variables associated with the multimodal interventions were evident in the neurotypical group.
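The between-group comparison of NOAs and HPOAs can be illustrated with the Mann-Whitney U statistic. The sketch below is a hypothetical illustration: it counts pairwise wins (ties count one half) and estimates a two-sided p value by randomly reassigning group labels. The study's software, tie handling, and choice of exact or asymptotic p values are not stated in the abstract, and the paired Wilcoxon test used for changes over time is not shown. `mann_whitney_u` and `permutation_p_value` are hypothetical names.

```python
import random


def mann_whitney_u(x, y):
    """U for sample x: pairs (x_i, y_j) with x_i > y_j, ties counted as 1/2."""
    return sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in x
        for b in y
    )


def permutation_p_value(x, y, n_perm=4999, seed=0):
    """Two-sided Monte Carlo p value for U by shuffling group labels."""
    rng = random.Random(seed)
    n1, n2 = len(x), len(y)
    center = n1 * n2 / 2  # expected U when the groups do not differ
    observed = abs(mann_whitney_u(x, y) - center)
    pooled = list(x) + list(y)
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # First n1 shuffled values play the role of group x.
        if abs(mann_whitney_u(pooled[:n1], pooled[n1:]) - center) >= observed:
            n_extreme += 1
    return (n_extreme + 1) / (n_perm + 1)
```

With groups as unbalanced as 21 versus 6, a resampling p value like this avoids relying on a large-sample normal approximation, at the cost of Monte Carlo noise in the last digits.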

Assessment Of Dry Eye Symptoms And Quality Of Sleep In Engineering Students During The Covid-19 Pandemic

International Journal of Research in Pharmaceutical Sciences

10.26452/ijrps.v11ispl1.3593

2020

Vol 11 (SPL1)

pp. 1202-1207

Author(s):

Pavithra S

Dheepak Sundar M

Keyword(s):

Sleep Quality

Dry Eye

Screen Time

Engineering Students

Statistical Tests

Quality Of Sleep

Cross Sectional

Eye Symptoms

Dry Eye Symptoms

This study aimed to assess dry eye symptoms (DES) and quality of sleep in engineering students during the COVID-19 pandemic lockdown, and to assess the association between DES and sleep quality. A cross-sectional questionnaire-based study was carried out among 396 engineering students at Saveetha Engineering College. The study tool was a semi-structured Google Form questionnaire designed to assess digital device usage, symptoms of dry eye disease, and sleep pattern. Responses were analyzed using appropriate statistical tests. Overall, 64.1% attained a score of more than 10, indicating the presence of DES. 70.2% of the study population used digital screens for more than 13 hours. A statistically significant association was found between increased screen time and presence of DES (p < 0.05). 64.9% had a score of >18, indicating reduced sleep quality. About 77.1% of the students with DES had reduced sleep quality, and a significant association (p < 0.01) was observed between the two. During the COVID-19 pandemic lockdown, there appears to be a rising prevalence of DES in the student population, one of the reasons being increased screen time. Sleep quality was also found to be reduced, and a significant association was found between DES and sleep quality.

Efektivitas Komunikasi Pendamping Corporate Social Responsibility (CSR) [Effectiveness of Corporate Social Responsibility (CSR) Facilitator Communication]

Jurnal Sains Komunikasi dan Pengembangan Masyarakat [JSKPM]

10.29244/jskpm.v4i6.749

2020

Vol 4 (6)

pp. 894

Author(s):

Yara Falmira Dianira

Keyword(s):

Statistical Tests

Rank Correlation

Communication Effectiveness

Message Content

Sources Of Information

Source Information

Spearman Rank Correlation

Level Of Understanding

Corporate Social

Abstract: Effective communication is an important factor in the success of a CSR program. Communication is effective if it has an impact, and it is effective when information is conveyed according to the recipients' needs. This study aims to analyze the effectiveness of communication by CSR facilitators. The study used a census approach covering 37 participants who received the CSR program. Data were analyzed using Spearman rank correlation tests. The results showed a significant relationship between CSR facilitator communication factors (the facilitator's attractiveness, the quality of message content, and sources of information) and communication effectiveness at the level of participants' understanding in the Kertajaya Creative Destination (KCD) CSR program. In addition, there was a significant relationship between CSR facilitator communication factors (the facilitator's credibility, sources of information, and the level of the recipient) and communication effectiveness at the level of participants' attitudes in the KCD CSR program. However, there was no significant relationship between CSR facilitator communication factors and participants' actions.
Keywords: communication effectiveness, CSR, elements of communication.
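Spearman's rank correlation is Pearson's correlation applied to ranks. A minimal Python sketch follows, assuming that ties (common with questionnaire rating scales) receive the average of the ranks they span; `average_ranks` and `spearman_rho` are hypothetical helper names, and the abstract does not state how ties or p values were handled in the study.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the tie group while the next sorted value is equal.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start..end
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return sxy / (sxx * syy) ** 0.5
```

Because only ranks enter the calculation, any monotone relationship, linear or not, yields rho = 1 (or -1 when decreasing).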

Assessing variability in surgical decision making among attending neurosurgeons at an academic center

Journal of Neurosurgery

10.3171/2019.2.jns182658

2020

Vol 132 (6)

pp. 1970-1976

Author(s):

Ashwin G. Ramayya

H. Isaac Chen

Paul J. Marcotte

Steven Brem

Eric L. Zager

...

Keyword(s):

Spine Surgery

Repeated Measures

Elective Surgery

Statistical Tests

Neurosurgical Procedures

P Values

Surgical Interventions

Radiology Reports

Operative Notes

First Time

OBJECTIVE: Although it is known that intersurgeon variability in offering elective surgery can have major consequences for patient morbidity and healthcare spending, data addressing variability within neurosurgery are scarce. The authors performed a prospective peer review study of randomly selected neurosurgery cases in order to assess the extent of consensus regarding the decision to offer elective surgery among attending neurosurgeons across one large academic institution.
METHODS: All consecutive patients who had undergone standard inpatient surgical interventions of 1 of 4 types (craniotomy for tumor [CFT], nonacute redo CFT, first-time spine surgery with/without instrumentation, and nonacute redo spine surgery with/without instrumentation) during the period 2015–2017 were retrospectively enrolled (n = 9156 patient surgeries, n = 80 randomly selected individual cases, n = 20 index cases of each type randomly selected for review). The selected cases were scored by attending neurosurgeons using a need for surgery (NFS) score based on clinical data (patient demographics, preoperative notes, radiology reports, and operative notes; n = 616 independent case reviews). Attending neurosurgeon reviewers were blinded as to performing provider and surgical outcome. Aggregate NFS scores across various categories were measured. The authors employed a repeated-measures mixed ANOVA model with autoregressive variance structure to compute omnibus statistical tests across the various surgery types. Interrater reliability (IRR) was measured using Cohen’s kappa based on binary NFS scores.
RESULTS: Overall, the authors found that most of the neurosurgical procedures studied were rated as “indicated” by blinded attending neurosurgeons (mean NFS = 88.3, all p values < 0.001) with greater agreement among neurosurgeon raters than expected by chance (IRR = 81.78%, p = 0.016). Redo surgery had lower NFS scores and IRR scores than first-time surgery, both for craniotomy and spine surgery (ANOVA, all p values < 0.01). Spine surgeries with fusion had lower NFS scores than spine surgeries without fusion procedures (p < 0.01).
CONCLUSIONS: There was general agreement among neurosurgeons in terms of indication for surgery; however, revision surgery of all types and spine surgery with fusion procedures had the lowest amount of decision consensus. These results should guide efforts aimed at reducing unnecessary variability in surgical practice with the goal of effective allocation of healthcare resources to advance the value paradigm in neurosurgery.
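Cohen's kappa corrects observed agreement for the agreement expected by chance from each rater's marginal frequencies. The Python sketch below handles only two raters scoring the same cases. How the study aggregated kappa across many neurosurgeon reviewers, and how the reported IRR percentage relates to kappa, is not described in the abstract, so the coding of binary NFS scores (for example 1 = indicated, 0 = not indicated) and the name `cohens_kappa` are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same cases.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is the agreement expected by chance from each rater's marginals.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    categories = set(count_a) | set(count_b)
    # Counter returns 0 for categories a rater never used.
    p_e = sum(count_a[c] * count_b[c] for c in categories) / n ** 2
    if p_e == 1.0:
        # Both raters used one identical category for every case.
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)
```

For instance, raters who agree on 3 of 4 cases with marginals of 3:1 and 2:2 get p_o = 0.75 and p_e = 0.5, so kappa = 0.5: half of the agreement achievable beyond chance.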

Faculty Opinions recommendation of Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature

10.3410/f.726383303.793525667

2016

Author(s):

Chris Robertson

Keyword(s):

Confidence Intervals

Statistical Tests

P Values
