Fragility Index, power, strength and robustness of findings in sports medicine and arthroscopic surgery: a secondary analysis of data from a study on use of the Fragility Index in sports surgery

PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e6813 ◽  
Author(s):  
Aleksi Reito ◽  
Lauri Raittio ◽  
Olli Helminen

Background: A recent study concluded that most findings reported as significant in sports medicine and arthroscopic surgery are not “robust” when evaluated with the Fragility Index (FI). We performed a secondary analysis of data from that study to investigate (1) the correctness of the findings, (2) the association between the FI, the p-value, and post hoc power, (3) the median power to detect a medium effect size, and (4) the implementation of sample size analysis in these randomized controlled trials (RCTs). Methods: In addition to the 48 studies listed in the appendix accompanying the original study by Khan et al. (2017), a follow-up literature search identified 18 additional studies, for a total of 66 studies in the analysis. We calculated the post hoc power, p-value, and confidence interval associated with the main outcome variable, and recorded whether an a priori power analysis had been performed. The power to detect a small (h > 0.2), medium (h > 0.5), or large (h > 0.8) effect with a baseline proportion of events of 10% and 30% was calculated for each included study, and the median across studies was reported. Three simulated data sets were used to validate our findings. Results: Inconsistencies were found in eight studies, and an a priori power analysis was missing in one-fourth of studies (16/66). The median power to detect a medium effect size was 42% with a baseline proportion of events of 10% and 43% with 30%. The FI was inherently associated with the achieved p-value and post hoc power. Discussion: A relatively high proportion of studies had inconsistencies. The FI is a surrogate measure for the p-value and post hoc power. Based on these studies, the median power in this field of research is suboptimal. There is an urgent need to investigate how well research claims in orthopedics hold up under replication and how valid the reported findings are.
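The power calculations described above can be reproduced in outline with standard tools. The sketch below uses statsmodels with hypothetical per-arm sample sizes (not the trial data from the review): it converts two proportions to Cohen's h via the arcsine transform and computes the power of a two-sample proportion test to detect a medium effect (h = 0.5) at alpha = 0.05.

```python
# A minimal sketch of the power calculation implied above; the per-arm
# sample sizes are hypothetical, not taken from the reviewed trials.
from math import asin, sqrt

import numpy as np
from statsmodels.stats.power import NormalIndPower

def cohens_h(p1, p2):
    """Cohen's h effect size for two proportions (arcsine transform)."""
    return abs(2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2)))

def power_for_h(n_per_arm, h=0.5, alpha=0.05):
    """Power of a two-sample proportion z-test to detect effect size h."""
    return NormalIndPower().power(effect_size=h, nobs1=n_per_arm,
                                  alpha=alpha, ratio=1.0)

# e.g. a 10% baseline rising to ~29% corresponds roughly to h = 0.5
print(f"h for 10% vs 29%: {cohens_h(0.10, 0.29):.2f}")

per_arm = [20, 25, 32, 40, 45]            # hypothetical per-arm trial sizes
powers = [power_for_h(n) for n in per_arm]
print(f"median power for h = 0.5: {np.median(powers):.0%}")
```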

2018 ◽  
Vol 6 (8) ◽  
pp. 232596711879151 ◽  
Author(s):  
Brandon J. Erickson ◽  
Peter N. Chalmers ◽  
Jon Newgren ◽  
Marissa Malaret ◽  
Michael O’Brien ◽  
...  

Background: The Kerlan-Jobe Orthopaedic Clinic (KJOC) shoulder and elbow outcome score is a functional assessment tool for the upper extremity of the overhead athlete that is currently validated only for in-person administration. Purpose/Hypothesis: The purpose of this study was to validate the KJOC score for administration over the phone. The hypothesis was that no difference would exist in KJOC scores for the same patient between in-person and phone administration. Study Design: Cohort study (diagnosis); Level of evidence, 2. Methods: Fifty patients, a sample size determined by an a priori power analysis, were randomized to complete the KJOC questionnaire either over the phone first (n = 25) or in person first (n = 25). One week after completing the initial questionnaire, patients completed it again via the opposite method. Results were compared per question and for overall score. Results: The mean ± SD interval between the first and second questionnaires was 8 ± 5 days. There were no significant differences in the overall KJOC score between the phone and paper groups (P = .139). The intraclass correlation coefficient comparing paper and phone scores was 0.802 (95% CI, 0.767-0.883; P < .001), with a Cronbach alpha of 0.89. On comparison of individual questions, there were significant differences for questions 1, 3, and 8 (P = .013, .023, and .042, respectively). Conclusion: The KJOC questionnaire can be administered over the phone with no significant difference in overall score as compared with that from in-person administration.
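For readers who want to reproduce this kind of test-retest agreement analysis, the sketch below computes an intraclass correlation coefficient and Cronbach's alpha with the pingouin library on simulated paired scores; the patient count, score range, and noise level are hypothetical, not the study's data.

```python
# A minimal sketch of an ICC / Cronbach's alpha agreement analysis on
# simulated paired scores (hypothetical data, not the study's).
import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(0)
true_score = rng.uniform(40, 100, size=50)            # 50 hypothetical patients
paper = np.clip(true_score + rng.normal(0, 5, 50), 0, 100)
phone = np.clip(true_score + rng.normal(0, 5, 50), 0, 100)

# long format: one row per (patient, administration method)
long = pd.DataFrame({
    "patient": np.tile(np.arange(50), 2),
    "method":  ["paper"] * 50 + ["phone"] * 50,
    "score":   np.concatenate([paper, phone]),
})

icc = pg.intraclass_corr(data=long, targets="patient",
                         raters="method", ratings="score")
print(icc[["Type", "ICC", "CI95%"]])

alpha, _ = pg.cronbach_alpha(data=pd.DataFrame({"paper": paper, "phone": phone}))
print(f"Cronbach's alpha: {alpha:.2f}")
```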


2016 ◽  
Vol 45 (9) ◽  
pp. 2164-2170 ◽  
Author(s):  
Moin Khan ◽  
Nathan Evaniew ◽  
Mark Gichuru ◽  
Anthony Habib ◽  
Olufemi R. Ayeni ◽  
...  

Background: High-quality, evidence-based orthopaedic care relies on the generation and translation of robust research evidence. The Fragility Index is a novel method for evaluating the robustness of statistically significant findings from randomized controlled trials (RCTs). It is defined as the minimum number of patients in 1 arm of a trial that would have to change status from a nonevent to an event to alter the results of the trial from statistically significant to nonsignificant. Purpose: To calculate the Fragility Index of statistically significant results from clinical trials in sports medicine and arthroscopic surgery to characterize the robustness of the RCTs in these fields. Methods: A search was conducted in Medline, EMBASE, and PubMed for RCTs related to sports medicine and arthroscopic surgery from January 1, 2005, to October 30, 2015. Two reviewers independently assessed titles and abstracts for study eligibility, performed data extraction, and assessed risk of bias. The Fragility Index was calculated using the Fisher exact test for all statistically significant dichotomous outcomes from parallel-group RCTs. Bivariate correlation was performed to evaluate associations between the Fragility Index and trial characteristics. Results: A total of 48 RCTs were included. The median sample size was 64 (interquartile range [IQR], 48.5-89.5), and the median total number of outcome events was 19 (IQR, 10-27). The median Fragility Index was 2 (IQR, 1-2.8), meaning that changing the outcome status of only 2 patients in the treatment arm from nonevent to event rendered the result statistically nonsignificant (P ≥ .05). Conclusion: Most statistically significant RCTs in sports medicine and arthroscopic surgery are not robust because their statistical significance can be reversed by changing the outcome status of only a few patients in 1 treatment group. Future work is required to determine whether routine reporting of the Fragility Index enhances clinicians’ ability to detect trial results that should be viewed cautiously.
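The Fragility Index as defined above can be computed directly. The sketch below, using hypothetical event counts, repeatedly converts a nonevent to an event in one arm (pass the arm with fewer events first, per the definition) and recomputes the Fisher exact test until significance is lost.

```python
# A minimal sketch of the Fragility Index calculation described above,
# using SciPy's Fisher exact test. Example counts are hypothetical, not
# data from any of the reviewed trials.
from scipy.stats import fisher_exact

def fragility_index(events_a, n_a, events_b, n_b, alpha=0.05):
    """Minimum number of nonevent-to-event flips in arm A needed to turn
    an initially significant Fisher exact result (p < alpha) nonsignificant."""
    _, p = fisher_exact([[events_a, n_a - events_a],
                         [events_b, n_b - events_b]])
    if p >= alpha:
        return 0                      # not significant to begin with
    flips = 0
    while events_a < n_a:
        events_a += 1                 # one patient: nonevent -> event in arm A
        flips += 1
        _, p = fisher_exact([[events_a, n_a - events_a],
                             [events_b, n_b - events_b]])
        if p >= alpha:
            return flips
    return None                       # significance never lost within arm A

# e.g. 5/40 vs 15/40 events: a significant result with a small Fragility Index
print(fragility_index(5, 40, 15, 40))
```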


2014 ◽  
Vol 67 (9) ◽  
pp. 781-786 ◽  
Author(s):  
Allison Osmond ◽  
Hector Li-Chang ◽  
Richard Kirsch ◽  
Dimitrios Divaris ◽  
Vincent Falck ◽  
...  

Aims: Following the introduction of colorectal cancer screening programmes throughout Canada, it became necessary to standardise the diagnosis of colorectal adenomas. Canadian guidelines for standardised reporting of adenomas were developed in 2011. The aims of the present study were (a) to assess interobserver variability in the classification of dysplasia and architecture in adenomas and (b) to determine whether interobserver variability could be improved by the adoption of criteria specified in the national guidelines. Methods: An a priori power analysis was used to determine an adequate number of cases and participants. Twelve pathologists independently classified 40 whole-slide images of adenomas according to architecture and dysplasia grade. Following a wash-out period, participants were provided with the national guidelines and asked to reclassify the study set. Results: At baseline, there was moderate interobserver agreement for architecture (K=0.4700; 95% CI 0.4427 to 0.4972) and dysplasia grade (K=0.5680; 95% CI 0.5299 to 0.6062). Following distribution of the guidelines, interobserver agreement in assessing architecture improved (K=0.5403; 95% CI 0.5133 to 0.5674). For dysplasia grade, overall interobserver agreement remained moderate but decreased significantly (K=0.4833; 95% CI 0.4452 to 0.5215). Half of the cases contained high-grade dysplasia (HGD). Two pathologists diagnosed HGD in ≥75% of cases. Conclusions: The improvement in interobserver agreement in classifying adenoma architecture suggests that national guidelines can be useful in disseminating knowledge; however, the variability in the diagnosis of HGD, even following guideline review, suggests the need for ongoing knowledge-transfer exercises.
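As an illustration of how such a multi-rater agreement statistic is computed, the sketch below applies Fleiss' kappa from statsmodels to simulated two-category ratings from 12 raters on 40 cases; the paper's exact kappa variant and data may differ, so treat this as a sketch under those assumptions.

```python
# A minimal sketch of a multi-rater agreement calculation like the one
# above, on simulated ratings: 12 raters grade 40 cases as low/high grade.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

rng = np.random.default_rng(1)
truth = rng.integers(0, 2, size=40)          # 0 = low-grade, 1 = HGD
# each of 12 raters agrees with the "true" grade 80% of the time
ratings = np.array([[g if rng.random() < 0.8 else 1 - g for g in truth]
                    for _ in range(12)]).T   # shape: (40 cases, 12 raters)

table, _ = aggregate_raters(ratings)         # per-case category counts
print(f"Fleiss' kappa: {fleiss_kappa(table):.3f}")
```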


2021 ◽  
Author(s):  
Daniel Lakens

An important step when designing a study is to justify the sample size that will be collected. The key aim of a sample size justification is to explain how the collected data is expected to provide valuable information given the inferential goals of the researcher. In this overview article, six approaches are discussed to justify the sample size in a quantitative empirical study: 1) collecting data from (almost) the entire population, 2) choosing a sample size based on resource constraints, 3) performing an a priori power analysis, 4) planning for a desired accuracy, 5) using heuristics, or 6) explicitly acknowledging the absence of a justification. An important question to consider when justifying sample sizes is which effect sizes are deemed interesting, and the extent to which the collected data informs inferences about these effect sizes. Depending on the sample size justification chosen, researchers could consider 1) what the smallest effect size of interest is, 2) which minimal effect size will be statistically significant, 3) which effect sizes they expect (and what they base these expectations on), 4) which effect sizes would be rejected based on a confidence interval around the effect size, 5) which ranges of effects a study has sufficient power to detect based on a sensitivity power analysis, and 6) which effect sizes are plausible in a specific research area. Researchers can use the guidelines presented in this article to improve their sample size justification and, hopefully, align the informational value of a study with their inferential goals.
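Approach 3) and the sensitivity analysis in point 5) are straightforward to carry out with standard software. The sketch below uses statsmodels with a hypothetical smallest effect size of interest (d = 0.4) and a hypothetical fixed sample (n = 64 per group); both numbers are illustrative choices, not values from the article.

```python
# A minimal sketch of an a priori power analysis and a sensitivity power
# analysis for a two-sample t-test; d = 0.4 and n = 64 are hypothetical.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# A priori: per-group n needed to detect d = 0.4 with 90% power at alpha = .05
n_per_group = analysis.solve_power(effect_size=0.4, power=0.90, alpha=0.05)
print(f"required n per group: {n_per_group:.1f}")    # round up in practice

# Sensitivity: effect size detectable with 90% power given n = 64 per group
d_detectable = analysis.solve_power(nobs1=64, power=0.90, alpha=0.05)
print(f"detectable effect size: d = {d_detectable:.2f}")
```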

