Evaluation of Second-Level Inference in fMRI Analysis

2016 ◽  
Vol 2016 ◽  
pp. 1-22 ◽  
Author(s):  
Sanne P. Roels ◽  
Tom Loeys ◽  
Beatrijs Moerkerke

We investigate the impact of decisions in the second-level (i.e., over subjects) inferential process in functional magnetic resonance imaging on (1) the balance between false positives and false negatives and (2) the data-analytical stability, both proxies for the reproducibility of results. Second-level analysis based on a mass univariate approach typically consists of three phases. First, one fits a general linear model to a test image that pools information across subjects; we evaluate models that take first-level (within-subject) variability into account and models that do not. Second, inference is carried out either under parametric assumptions or via permutation. Third, we evaluate three commonly used procedures to address the multiple testing problem: familywise error rate correction, False Discovery Rate (FDR) correction, and a two-step procedure with a minimal cluster size. Based on a simulation study and real data, we find that the two-step procedure with a minimal cluster size yields the most stable results, followed by the familywise error rate correction. FDR correction yields the most variable results, under both permutation-based and parametric inference. Modeling the subject-specific variability yields a better balance between false positives and false negatives when using parametric inference.
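A minimal, hypothetical sketch of the three thresholding approaches this abstract compares (familywise error rate, FDR, and a two-step rule with a minimal cluster size), applied to voxelwise p-values from a second-level test; the 1-D "image", alpha levels, and cluster size are illustrative choices, not the authors' settings or pipeline.

```python
import numpy as np
from scipy import ndimage, stats

rng = np.random.default_rng(0)
n_vox = 1000
z = rng.normal(size=n_vox)
z[400:430] += 3.0                      # a block of truly active voxels
p = stats.norm.sf(z)                   # one-sided voxelwise p-values

alpha = 0.05

# 1) FWER control via Bonferroni
fwer_mask = p < alpha / n_vox

# 2) FDR control via Benjamini-Hochberg
order = np.argsort(p)
thresh = alpha * np.arange(1, n_vox + 1) / n_vox
below = p[order] <= thresh
k_max = below.nonzero()[0].max() + 1 if below.any() else 0
fdr_mask = np.zeros(n_vox, dtype=bool)
fdr_mask[order[:k_max]] = True

# 3) Two-step rule: uncorrected threshold, then keep clusters of >= 10 voxels
primary = p < 0.001
labels, n_clusters = ndimage.label(primary)
sizes = ndimage.sum(primary, labels, index=np.arange(1, n_clusters + 1))
cluster_mask = np.isin(labels, 1 + np.flatnonzero(sizes >= 10))

for name, mask in [("FWER", fwer_mask), ("FDR", fdr_mask), ("cluster", cluster_mask)]:
    print(name, int(mask.sum()), "voxels declared active")
```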

Author(s):  
Damian Clarke ◽  
Joseph P. Romano ◽  
Michael Wolf

When considering multiple-hypothesis tests simultaneously, standard statistical techniques will lead to overrejection of null hypotheses unless the multiplicity of the testing framework is explicitly considered. In this article, we discuss the Romano–Wolf multiple-hypothesis correction and document its implementation in Stata. The Romano–Wolf correction (asymptotically) controls the familywise error rate, that is, the probability of rejecting at least one true null hypothesis among a family of hypotheses under test. This correction is considerably more powerful than earlier multiple-testing procedures, such as the Bonferroni and Holm corrections, given that it takes into account the dependence structure of the test statistics by resampling from the original data. We describe a command, rwolf, that implements this correction and provide several examples based on a wide range of models. We document and discuss the performance gains from using rwolf over other multiple-testing procedures that control the familywise error rate.
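A hypothetical, simplified Python sketch of a Romano–Wolf style bootstrap stepdown for several one-sample tests of zero means; the actual rwolf command runs inside Stata estimation commands and handles far more general models, so this only illustrates the resampling stepdown idea.

```python
import numpy as np

def romano_wolf_stepdown(X, alpha=0.05, n_boot=2000, seed=0):
    """X: (n, m) data; test H_j: mean of column j is 0. Returns a reject mask."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    t = np.sqrt(n) * X.mean(0) / X.std(0, ddof=1)          # studentized statistics
    Xc = X - X.mean(0)                                      # impose the null by centering
    boot_t = np.empty((n_boot, m))
    for b in range(n_boot):
        Xb = Xc[rng.integers(0, n, n)]
        boot_t[b] = np.sqrt(n) * Xb.mean(0) / Xb.std(0, ddof=1)

    reject = np.zeros(m, dtype=bool)
    while True:
        remaining = ~reject
        if not remaining.any():
            break
        # critical value from the max statistic over hypotheses not yet rejected
        crit = np.quantile(np.abs(boot_t[:, remaining]).max(axis=1), 1 - alpha)
        new = remaining & (np.abs(t) > crit)
        if not new.any():
            break
        reject |= new
    return reject

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
X[:, 0] += 0.4                                              # one true effect
print(romano_wolf_stepdown(X))
```

Because the critical value is the quantile of the maximum over the still-unrejected hypotheses, the procedure accounts for the dependence between the test statistics, which is the source of its power advantage over Bonferroni and Holm.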


2015 ◽  
Vol 14 (1) ◽  
pp. 1-19 ◽  
Author(s):  
Rosa J. Meijer ◽  
Thijmen J.P. Krebs ◽  
Jelle J. Goeman

Abstract: We present a multiple testing method for hypotheses that are ordered in space or time. Given such hypotheses, the elementary hypotheses as well as regions of consecutive hypotheses are of interest. These region hypotheses not only have intrinsic meaning; testing them also has the advantage that (potentially small) signals across a region are combined in one test. Because the expected number and length of potentially interesting regions are usually not available beforehand, we propose a method that tests all possible region hypotheses as well as all individual hypotheses in a single multiple testing procedure that controls the familywise error rate. We start by testing the global null hypothesis and, when it can be rejected, continue by further specifying the exact location or locations of the effect. The method is implemented in the
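A hypothetical, simplified sketch in the spirit of this abstract: start from the global null over all ordered hypotheses and, whenever a region is rejected, recurse into its two halves. Each region is tested with Fisher's combination at level alpha * |region| / m (a Meinshausen-style hierarchical weighting intended to keep familywise error control over this tree of regions); the authors' procedure instead tests all consecutive regions, so this is a related illustration, not their method.

```python
import numpy as np
from scipy import stats

def fisher_p(pvals):
    """Fisher combination p-value for a vector of independent p-values."""
    stat = -2.0 * np.log(pvals).sum()
    return stats.chi2.sf(stat, df=2 * len(pvals))

def test_regions(p, alpha=0.05):
    m = len(p)
    rejected = []

    def recurse(lo, hi):                      # region is p[lo:hi]
        size = hi - lo
        if fisher_p(p[lo:hi]) <= alpha * size / m:
            rejected.append((lo, hi))
            if size > 1:
                mid = lo + size // 2
                recurse(lo, mid)
                recurse(mid, hi)

    recurse(0, m)
    return rejected

rng = np.random.default_rng(0)
p = rng.uniform(size=20)
p[5:9] = rng.uniform(0, 0.005, size=4)        # a small region with signal
print(test_regions(p))
```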


Author(s):  
Gerald J. Kost

ABSTRACT Context. Coronavirus disease 2019 (COVID-19) test performance depends on predictive values in settings of increasing disease prevalence. Geospatially distributed diagnostics with minimal uncertainty facilitate efficient point-of-need strategies. Objectives. To use original mathematics to interpret COVID-19 test metrics; assess Food and Drug Administration Emergency Use Authorizations and Health Canada targets; compare predictive values for multiplex, antigen, polymerase chain reaction kit, point-of-care antibody, and home tests; enhance test performance; and improve decision-making. Design. PubMed and newsprint searches generated articles documenting prevalence. Mathematica and open-access software helped perform recursive calculations, graph multivariate relationships, and visualize performance by comparing predictive value geometric mean-squared patterns. Results. Tiered sensitivity/specificity comprise: T1) 90%, 95%; T2) 95%, 97.5%; and T3) 100%, ≥99%. For Tier 1, false negatives exceed true negatives at >90.5% prevalence and false positives exceed true positives at <5.3% prevalence. High-sensitivity/high-specificity tests reduce false negatives and false positives, yielding superior predictive values. Recursive testing improves predictive values. Visual logistics facilitate test comparisons. Antigen test quality falls off as prevalence increases. Multiplex severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)*Influenza A/B*respiratory syncytial virus (RSV) testing performs reasonably well compared with Tier 3. Tier 3 performance with a Tier 2 confidence band lower limit will generate excellent performance and reliability. Conclusions. The overriding principle is to select the best combined performance and reliability pattern for the prevalence bracket. Some public health professionals recommend repetitive testing to compensate for low sensitivity. More logically, improved COVID-19 assays with less uncertainty conserve resources. Multiplex differentiation of COVID-19 from Influenza A/B and RSV represents an effective strategy if seasonal flu surges next year.
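A small sketch that reproduces the Tier 1 crossover points quoted above (false negatives exceed true negatives above roughly 90.5% prevalence; false positives exceed true positives below roughly 5.3% prevalence) and prints predictive values for the three tiers. The tier definitions come from the abstract; the chosen evaluation prevalence and function names are illustrative.

```python
def confusion(prev, sens, spec):
    tp = prev * sens
    fn = prev * (1 - sens)
    tn = (1 - prev) * spec
    fp = (1 - prev) * (1 - spec)
    return tp, fp, tn, fn

def ppv_npv(prev, sens, spec):
    tp, fp, tn, fn = confusion(prev, sens, spec)
    return tp / (tp + fp), tn / (tn + fn)

tiers = {"T1": (0.90, 0.95), "T2": (0.95, 0.975), "T3": (1.00, 0.99)}

# Tier 1 crossovers: FN > TN  <=>  prev > spec / (spec + 1 - sens)
#                    FP > TP  <=>  prev < (1 - spec) / (1 - spec + sens)
sens, spec = tiers["T1"]
print("FN exceed TN above prevalence", spec / (spec + 1 - sens))        # ~0.905
print("FP exceed TP below prevalence", (1 - spec) / (1 - spec + sens))  # ~0.053

for name, (sens, spec) in tiers.items():
    ppv, npv = ppv_npv(0.05, sens, spec)      # 5% prevalence, illustrative
    print(name, f"PPV={ppv:.2%}", f"NPV={npv:.2%}")
```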


Biometrika ◽  
2020 ◽  
Vol 107 (3) ◽  
pp. 761-768 ◽  
Author(s):  
E Dobriban

Summary Multiple hypothesis testing problems arise naturally in science. This note introduces a new fast closed testing method for multiple testing which controls the familywise error rate. Controlling the familywise error rate is state-of-the-art in many important application areas and is preferred over false discovery rate control for many reasons, including that it leads to stronger reproducibility. The closure principle rejects an individual hypothesis if all global nulls of subsets containing it are rejected using some test statistics. It takes exponential time in the worst case. When the tests are symmetric and monotone, the proposed method is an exact algorithm for computing the closure, is quadratic in the number of tests, and is linear in the number of discoveries. Our framework generalizes most examples of closed testing, such as Holm’s method and the Bonferroni method. As a special case of the method, we propose the Simes and higher criticism fusion test, which is powerful both for detecting a few strong signals and for detecting many moderate signals.
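A hypothetical, naive illustration of the closure principle with Simes global tests: hypothesis j is rejected only if every subset containing it is rejected by its Simes test at level alpha. This brute-force version takes exponential time and is feasible only for a handful of tests; the paper's contribution is an exact shortcut that is quadratic in the number of tests when the local tests are symmetric and monotone.

```python
from itertools import combinations
import numpy as np

def simes_reject(pvals, alpha):
    """Simes global test: reject if any ordered p-value clears its step."""
    p = np.sort(np.asarray(pvals))
    k = len(p)
    return np.any(p <= alpha * np.arange(1, k + 1) / k)

def closed_testing(pvals, alpha=0.05):
    m = len(pvals)
    rejected = []
    for j in range(m):
        others = [i for i in range(m) if i != j]
        ok = True
        # every subset containing j must have its global null rejected
        for r in range(m):
            for extra in combinations(others, r):
                subset = (j,) + extra
                if not simes_reject([pvals[i] for i in subset], alpha):
                    ok = False
                    break
            if not ok:
                break
        if ok:
            rejected.append(j)
    return rejected

pvals = [0.001, 0.004, 0.03, 0.5, 0.8]
print(closed_testing(pvals))
```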


2020 ◽  
Author(s):  
Cathie Sudlow ◽  
Peter Diggle ◽  
Oliver Warlow ◽  
David Seymour ◽  
Ben Gordon ◽  
...  

Background: Calls are increasing for widespread SARS-CoV-2 infection testing of people from populations with a very low prevalence of infection. We quantified the impact of less-than-perfect diagnostic test accuracy on populations, and on individuals, in low prevalence settings, focusing on false positives and the role of confirmatory testing. Methods: We developed a simple, interactive tool to assess the impact of different combinations of test sensitivity, specificity and infection prevalence in a notional population of 100,000. We derived numbers of true positives, true negatives, false positives and false negatives, positive predictive value (PPV, the percentage of test positives that are true positives) and overall test accuracy for three testing strategies: (1) single test for all; (2) add repeat testing in test positives; (3) add further repeat testing in those with discrepant results. We also assessed the impact on test results for individuals having one, two or three tests under these three strategies. Results: With sensitivity of 80%, infection prevalence of 1 in 2,000, and specificity 99.9% on all tests, PPV in the tested population of 100,000 will be only 29% with one test, increasing to >99.5% (100% when rounded to the nearest %) with repeat testing in strategies 2 or 3. More realistically, if specificity is 95% for the first and 99.9% for subsequent tests, single-test PPV will be only 1%, increasing to 86% with repeat testing in strategy 2, or 79% with strategy 3 (albeit with 6 fewer false negatives than strategy 2). Whether for the whole population or for particular individuals, PPV increases as infection becomes more common in the population but falls to unacceptably low levels with lower test specificity. Conclusion: To avoid multiple unnecessary restrictions on whole populations, and on particular individuals, from widespread population testing for SARS-CoV-2, the crucial roles of extremely high test specificity and of confirmatory testing must be fully appreciated and incorporated into policy decisions.
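A sketch of the notional 100,000-person population described above, reproducing the single-test PPV of about 29% (sensitivity 80%, specificity 99.9%, prevalence 1 in 2,000) and the PPV after a confirmatory repeat test in test positives (strategy 2). Independence between repeat tests is assumed, as in the abstract's framing; the function and parameter names are ours, not the authors' tool.

```python
def one_round(infected, uninfected, sens, spec):
    """Return (true positives, false negatives, false positives, true negatives)
    after a single test applied to a group of infected and uninfected people."""
    tp = infected * sens
    fn = infected - tp
    fp = uninfected * (1 - spec)
    tn = uninfected - fp
    return tp, fn, fp, tn

pop, prev = 100_000, 1 / 2000
sens, spec = 0.80, 0.999

tp1, fn1, fp1, tn1 = one_round(pop * prev, pop * (1 - prev), sens, spec)
print("single-test PPV:", tp1 / (tp1 + fp1))               # ~0.29

# Strategy 2: repeat the test only in those who tested positive
tp2, _, fp2, _ = one_round(tp1, fp1, sens, spec)
print("PPV after confirmatory test:", tp2 / (tp2 + fp2))   # >0.995
```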


2004 ◽  
Vol 3 (1) ◽  
pp. 1-25 ◽  
Author(s):  
Mark J. van der Laan ◽  
Sandrine Dudoit ◽  
Katherine S. Pollard

This article shows that any single-step or stepwise multiple testing procedure (asymptotically) controlling the family-wise error rate (FWER) can be augmented into procedures that (asymptotically) control tail probabilities for the number of false positives and the proportion of false positives among the rejected hypotheses. Specifically, given any procedure that (asymptotically) controls the FWER at level alpha, we propose simple augmentation procedures that provide (asymptotic) level-alpha control of: (i) the generalized family-wise error rate, i.e., the tail probability, gFWER(k), that the number of Type I errors exceeds a user-supplied integer k, and (ii) the tail probability, TPPFP(q), that the proportion of Type I errors among the rejected hypotheses exceeds a user-supplied value q, 0 < q < 1.
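A hypothetical sketch of the augmentation idea: start from the rejections of an FWER-controlling procedure (Holm is used here as the base, an illustrative choice) and add the next most significant hypotheses, either k of them for gFWER(k) control or as many as keeps the added fraction at most q for TPPFP(q) control. Variable names are ours.

```python
import numpy as np

def holm_reject(p, alpha=0.05):
    """Holm step-down procedure controlling the FWER at level alpha."""
    m = len(p)
    order = np.argsort(p)
    reject = np.zeros(m, dtype=bool)
    for rank, idx in enumerate(order):
        if p[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break
    return reject

def augment(p, alpha=0.05, k=None, q=None):
    """Augment Holm's FWER rejections for gFWER(k) or TPPFP(q) control."""
    p = np.asarray(p)
    reject = holm_reject(p, alpha)
    r0 = int(reject.sum())
    if k is not None:
        extra = k                                   # gFWER(k): add k more rejections
    elif q is not None:
        extra = int(np.floor(q * r0 / (1 - q)))     # TPPFP(q): keep added share <= q
    else:
        return reject
    # add the `extra` smallest p-values among the not-yet-rejected hypotheses
    next_up = np.argsort(np.where(reject, np.inf, p))[:extra]
    reject[next_up] = True
    return reject

p = [1e-5, 3e-4, 0.002, 0.01, 0.04, 0.2, 0.6]
print(augment(p, k=2).sum(), "rejections with gFWER(2) augmentation")
print(augment(p, q=0.2).sum(), "rejections with TPPFP(0.2) augmentation")
```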


FACETS ◽  
2018 ◽  
Vol 3 (1) ◽  
pp. 563-583 ◽  
Author(s):  
Michael Evans ◽  
Jabed Tomal

The measurement of statistical evidence is of considerable current interest in fields where statistical criteria are used to determine knowledge. The most commonly used approach to measuring such evidence is through the use of p-values, even though these are known to possess a number of properties that lead to doubts concerning their validity as measures of evidence. It is less well known that there are alternatives with the desired properties of a measure of statistical evidence. The measure of evidence given by the relative belief ratio is employed in this paper. A relative belief multiple testing algorithm was developed to control for false positives and false negatives through bounds on the evidence determined by measures of bias. The relative belief multiple testing algorithm was shown to be consistent and to possess an optimal property when considering the testing of a hypothesis randomly chosen from the collection of considered hypotheses. The relative belief multiple testing algorithm was applied to the problem of inducing sparsity. Priors were chosen via elicitation, and sparsity was induced only when justified by the evidence and there was no dependence on any particular form of a prior for this purpose.
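A hypothetical sketch of the single-hypothesis ingredient behind this approach: the relative belief ratio for H0: mu = 0 in a conjugate normal model with known sigma and an N(0, tau^2) prior, computed as the posterior density at 0 divided by the prior density at 0, with values above 1 read as evidence in favour and below 1 as evidence against. The paper's algorithm extends this to many hypotheses with bias-based bounds; the model and parameter choices here are illustrative only.

```python
import numpy as np
from scipy import stats

def relative_belief_at_zero(x, sigma=1.0, tau=2.0):
    """Relative belief ratio for mu = 0 under an N(0, tau^2) prior."""
    n = len(x)
    post_var = 1.0 / (n / sigma**2 + 1.0 / tau**2)      # conjugate normal posterior
    post_mean = post_var * x.sum() / sigma**2
    prior_at_0 = stats.norm.pdf(0.0, loc=0.0, scale=tau)
    post_at_0 = stats.norm.pdf(0.0, loc=post_mean, scale=np.sqrt(post_var))
    return post_at_0 / prior_at_0

rng = np.random.default_rng(0)
print(relative_belief_at_zero(rng.normal(0.0, 1.0, size=50)))   # > 1: evidence for H0
print(relative_belief_at_zero(rng.normal(1.0, 1.0, size=50)))   # < 1: evidence against H0
```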


2019 ◽  
Vol 16 (2) ◽  
pp. 132-141 ◽  
Author(s):  
Alexandra Blenkinsop ◽  
Mahesh KB Parmar ◽  
Babak Choodari-Oskooei

Background The multi-arm multi-stage framework uses intermediate outcomes to assess lack-of-benefit of research arms at interim stages in randomised trials with time-to-event outcomes. However, the design lacks formal methods to evaluate early evidence of overwhelming efficacy on the definitive outcome measure. We explore the operating characteristics of this extension to the multi-arm multi-stage design and how to control the pairwise and familywise type I error rate. Using real examples and the updated nstage program, we demonstrate how such a design can be developed in practice. Methods We used the Dunnett approach for assessing treatment arms when conducting comprehensive simulation studies to evaluate the familywise error rate, with and without interim efficacy looks on the definitive outcome measure, at the same time as the planned lack-of-benefit interim analyses on the intermediate outcome measure. We studied the effect of the timing of interim analyses, allocation ratio, lack-of-benefit boundaries, efficacy rule, number of stages and research arms on the operating characteristics of the design when efficacy stopping boundaries are incorporated. Methods for controlling the familywise error rate with efficacy looks were also addressed. Results Incorporating Haybittle–Peto stopping boundaries on the definitive outcome at the interim analyses will not inflate the familywise error rate in a multi-arm design with two stages. However, this rule is conservative; in general, more liberal stopping boundaries can be used with minimal impact on the familywise error rate. Efficacy bounds in trials with three or more stages using an intermediate outcome may inflate the familywise error rate, but we show how to maintain strong control. Conclusion The multi-arm multi-stage design allows stopping for both lack-of-benefit on the intermediate outcome and efficacy on the definitive outcome at the interim stages. We provide guidelines on how to control the familywise error rate when efficacy boundaries are implemented in practice.
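A hypothetical Monte Carlo sketch of the point made about two-stage designs: adding a Haybittle–Peto efficacy look (one-sided p < 0.001 at the interim) to a final analysis at one-sided alpha = 0.025 barely changes the pairwise type I error. Statistics are simulated directly on the z-scale with information fraction 0.5 for a single pairwise comparison; all settings are illustrative and much simpler than the multi-arm designs studied in the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sim, t = 200_000, 0.5
z_interim = rng.normal(size=n_sim)
# final z shares information with the interim z: correlation sqrt(t)
z_final = np.sqrt(t) * z_interim + np.sqrt(1 - t) * rng.normal(size=n_sim)

crit_final = stats.norm.isf(0.025)            # ~1.96, one-sided final test
crit_interim = stats.norm.isf(0.001)          # ~3.09, Haybittle-Peto boundary

no_look = z_final > crit_final
with_look = (z_interim > crit_interim) | (z_final > crit_final)
print("type I error, final analysis only:", no_look.mean())
print("type I error, with efficacy look: ", with_look.mean())
```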


2020 ◽  
Vol 2020 (14) ◽  
pp. 378-1-378-7
Author(s):  
Tyler Nuanes ◽  
Matt Elsey ◽  
Radek Grzeszczuk ◽  
John Paul Shen

We present a high-quality sky segmentation model for depth refinement and investigate residual architecture performance to guide optimal shrinking of the network. We describe a model that runs in near real-time on a mobile device, present a new, high-quality dataset, and detail a unique weighting to trade off false positives and false negatives in binary classifiers. We show how the optimizations improve bokeh rendering by correcting stereo depth misprediction in sky regions. We detail techniques used to preserve edges, reject false positives, and ensure generalization to the diversity of sky scenes. Finally, we present a compact model and compare performance of four popular residual architectures (ShuffleNet, MobileNetV2, Resnet-101, and Resnet-34-like) at constant computational cost.
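A hypothetical sketch of the kind of class weighting alluded to above: a binary cross-entropy in which false positives (non-sky pixels predicted as sky) and false negatives (sky pixels missed) are penalised with different weights. The weights, names, and numbers are illustrative, not the paper's scheme.

```python
import numpy as np

def weighted_bce(pred, target, w_fp=4.0, w_fn=1.0, eps=1e-7):
    """pred, target: arrays in [0, 1]; target is 1 for sky pixels.
    w_fn scales the loss on sky pixels (misses), w_fp on non-sky pixels
    (false alarms), so raising w_fp makes the model more conservative
    about labelling pixels as sky."""
    pred = np.clip(pred, eps, 1 - eps)
    loss = -(w_fn * target * np.log(pred) +
             w_fp * (1 - target) * np.log(1 - pred))
    return loss.mean()

target = np.array([1.0, 1.0, 0.0, 0.0])
pred = np.array([0.9, 0.4, 0.2, 0.6])
print(weighted_bce(pred, target))
```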

