Measures of sensitivity based on a single hit rate and false alarm rate: The accuracy, precision, and robustness of d′, Az, and A′

2006 ◽  
Vol 68 (4) ◽  
pp. 643-654 ◽  
Author(s):  
Michael F. Verde ◽  
Neil A. Macmillan ◽  
Caren M. Rotello
2021 ◽  
Author(s):  
Simon Dadson ◽  
Eleanor Blyth ◽  
Douglas Clark ◽  
Helen Davies ◽  
Richard Ellis ◽  
...  

<p>Timely predictions of fluvial flooding are important for national and regional planning and real-time flood response. Several new computational techniques have emerged in the past decade for making rapid fluvial flood inundation predictions at time and space scales relevant to early warning, although their efficient use is often constrained by the trade-off between model complexity, topographic fidelity and scale. Here we apply a simplified approach to large-area fluvial flood inundation modelling which combines a solution to the inertial form of the shallow water equations at 1 km horizontal resolution, with two alternative, simplified representations of sub-grid floodplain topography. One of these uses a fitted sub-grid probability distribution, the other a quantile-based representation of the floodplain. We evaluate the model’s steady-state performance when used with flood depth estimates corresponding to the 0.01 Annual Exceedance Probability (AEP; ‘100-year’) flood and compare the results with published benchmark data for England. The quantile-based method accurately predicts flood inundation in 86% of locations, with a domain-wide hit rate of 95% and false alarm rate of 10%. These performance measures compare with a hit rate of 71%, and false alarm rate of 9% for the simpler, distribution-based method. We suggest that these approaches are suitable for rapid, wide-area flood forecasting and climate change impact assessment.</p>
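The hit rate and false alarm rate reported above are standard contingency-table scores for binary flood-extent maps. A minimal sketch, using made-up wet/dry cells rather than the paper's benchmark data:

```python
# Hit rate = fraction of observed-wet cells the model predicts wet;
# false alarm rate = fraction of observed-dry cells the model predicts wet.
def hit_false_alarm_rates(predicted, benchmark):
    """predicted, benchmark: equal-length lists of booleans, wet (True) / dry (False) per cell."""
    hits = sum(p and b for p, b in zip(predicted, benchmark))
    misses = sum((not p) and b for p, b in zip(predicted, benchmark))
    false_alarms = sum(p and (not b) for p, b in zip(predicted, benchmark))
    correct_rejections = sum((not p) and (not b) for p, b in zip(predicted, benchmark))
    hit_rate = hits / (hits + misses)
    false_alarm_rate = false_alarms / (false_alarms + correct_rejections)
    return hit_rate, false_alarm_rate

# Illustrative 8-cell domain: 4 observed-wet cells, 4 observed-dry cells.
benchmark = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False]
hr, far = hit_false_alarm_rates(predicted, benchmark)
```

Because the two rates are conditioned on different subsets of cells (observed wet versus observed dry), a model can trade one against the other, which is why both are reported here.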


2021 ◽  
Author(s):  
Timothy F. Brady ◽  
Maria Martinovna Robinson ◽  
Jamal Rodgers Williams ◽  
John Wixted

There is a crisis of measurement in memory research, with major implications for theory and practice. This crisis arises because of a critical complication present when measuring memory using the recognition memory task that dominates the study of working memory and long-term memory (“did you see this item? yes/no” or “did this item change? yes/no”). Such tasks give two measures of performance, the “hit rate” (how often you say you previously saw an item you actually did previously see) and the “false alarm rate” (how often you say you saw something you never saw). Yet what researchers want is one single, integrated measure of memory performance. Integrating the hit and false alarm rate into a single measure, however, requires solving a complex problem of counterfactual reasoning that depends on the (unknowable) distribution of underlying memory signals: when faced with two people differing in both hit rate and false alarm rate, the question of who had the better memory is really “who would have had more hits if they each had the same number of false alarms?”. As a result of this difficulty, different literatures in memory research (e.g., visual working memory, eyewitness identification, picture memory, etc.) have settled on a variety of distinct metrics to combine hit rates and false alarm rates (e.g., A′, corrected hit rate, percent correct, d′, diagnosticity ratios, K values, etc.). These metrics make different, contradictory assumptions about the distribution of latent memory signals, and all of their assumptions are frequently incorrect. Despite a large literature on how to properly measure memory performance, spanning decades, real-life decisions are often made using these metrics, even when they subsequently turn out to be wrong when memory is studied with better measures. We suggest that in order for the psychology and neuroscience of memory to become a cumulative, theory-driven science, more attention must be given to measurement issues.
We make a concrete suggestion: the default memory task should change from old/new (“did you see this item?”) to forced-choice (“which of these two items did you see?”). In situations where old/new variants are preferred (e.g., eyewitness identification; theoretical investigations of the nature of memory decisions), receiver operating characteristic (ROC) analysis should always be performed.
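The contradictory-assumptions point can be made concrete: two common single-point metrics, d′ (equal-variance Gaussian model) and A′ (Grier's nonparametric index), can rank the same two observers in opposite order. The hit/false-alarm pairs below are illustrative, not taken from the article:

```python
# d' and A' computed from a single (hit rate, false alarm rate) pair.
from statistics import NormalDist

def d_prime(h, f):
    """d' = z(hit rate) - z(false alarm rate), equal-variance Gaussian model."""
    z = NormalDist().inv_cdf
    return z(h) - z(f)

def a_prime(h, f):
    """Grier's A' for h >= f; 0.5 = chance, 1.0 = perfect."""
    return 0.5 + (h - f) * (1 + h - f) / (4 * h * (1 - f))

# Observer 1: hit rate 0.99, false alarm rate 0.59.
# Observer 2: hit rate 0.80, false alarm rate 0.20.
# d' ranks Observer 1 higher; A' ranks Observer 2 higher.
d1, d2 = d_prime(0.99, 0.59), d_prime(0.80, 0.20)
a1, a2 = a_prime(0.99, 0.59), a_prime(0.80, 0.20)
```

Which observer "has the better memory" thus depends entirely on which latent-signal distribution one assumes, which is the counterfactual problem described above.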


Author(s):  
Kuldeep Singh ◽  
Palvi Aggarwal ◽  
Prashanth Rajivan ◽  
Cleotilde Gonzalez

We studied people’s success at detecting phishing emails after they were trained under one of three phishing frequency conditions, in which the proportion of phishing emails during training varied: low frequency (25% phishing emails), medium frequency (50% phishing emails), and high frequency (75% phishing emails). Individual base susceptibility to phishing emails was measured in a pre-training phase in which 20% of the emails were phishing; this performance was then compared to a post-training phase in which participants aimed at detecting new, rare phishing emails (20% were phishing emails). Hit rates, false alarm rates, sensitivities, and response criteria were analyzed. Results revealed that participants receiving a higher frequency of phishing emails during training had a higher hit rate, but also a higher false alarm rate, at detecting phishing emails post-training compared to participants encountering lower frequency levels during training. These results have implications for designing new training protocols for improving detection of phishing emails.
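A joint rise in hit rate and false alarm rate, as reported here, is the signature of a criterion shift rather than a sensitivity change. Under the equal-variance Gaussian model the two are separated as d′ = z(H) − z(F) and c = −(z(H) + z(F)) / 2. A minimal sketch with made-up rates (not the study's data), where the high-frequency group shows a similar d′ but a more liberal criterion:

```python
# Separating sensitivity (d') from response criterion (c);
# negative c = liberal bias toward responding "phishing".
from statistics import NormalDist

def d_prime_and_criterion(h, f):
    """Return (d', c) for a hit rate h and false alarm rate f."""
    z = NormalDist().inv_cdf
    return z(h) - z(f), -(z(h) + z(f)) / 2

low_freq = d_prime_and_criterion(0.55, 0.10)   # hypothetical low-frequency group
high_freq = d_prime_and_criterion(0.80, 0.30)  # hypothetical high-frequency group
```

With these illustrative numbers the two groups have nearly identical d′, while c drops from clearly conservative to slightly liberal, matching the qualitative pattern described above.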


2021 ◽  
Author(s):  
Simon J. Dadson ◽  
Eleanor Blyth ◽  
Douglas Clark ◽  
Helen Davies ◽  
Richard Ellis ◽  
...  

Abstract. Timely predictions of fluvial flooding are important for national and regional planning and real-time flood response. Several new computational techniques have emerged in the past decade for making rapid fluvial flood inundation predictions at time and space scales relevant to early warning, although their efficient use is often constrained by the trade-off between model complexity, topographic fidelity and scale. Here we apply a simplified approach to large-area fluvial flood inundation modelling which combines a solution to the inertial form of the shallow water equations at 1 km horizontal resolution, with two alternative representations of sub-grid floodplain topography. One of these uses a fitted sub-grid probability distribution, the other a quantile-based representation of the floodplain. We evaluate the model's performance when used to simulate the 0.01 Annual Exceedance Probability (AEP; 100-year) flood and compare the results with published benchmark data for England. The quantile-based method accurately predicts flood inundation in 86 % of locations, with a domain-wide hit rate of 95 % and false alarm rate of 10 %. These performance measures compare with a hit rate of 71 %, and false alarm rate of 9 % for the simpler, but faster, distribution-based method. We suggest that these approaches are suitable for rapid, wide-area flood forecasting and climate change impact assessment.


Author(s):  
Jean MacMillan ◽  
Eileen B. Entin ◽  
Daniel Serfaty

In machine-aided target recognition, human operators work with an automatic target recognition (ATR) system to locate targets in cluttered and degraded imagery. The operator must integrate his or her own visual judgment concerning whether a target is present in the image with the ATR's judgment, which is typically expressed numerically. We conducted a series of experiments in which subjects attempted to locate target shapes among non-targets based only on visual images and based on both visual images and supplementary numeric information such as an ATR might provide. Image quality was controlled as an independent variable through the use of distortion rates that randomly altered pixel values to degrade the image. We found that subjects maintained a constant false alarm rate as image distortion increased, at the expense of a lower hit rate. This result was found consistently in experiments where the subjects' task was to distinguish single targets from a blank background, to distinguish single targets from single non-targets, and to locate multiple targets in a multiple-object display. We also found a bias toward over-reliance on image versus numeric information. As image distortion increased, subjects failed to make optimal use of supplementary numeric information and showed an unnecessary decrease in performance. The results suggest that operators may experience difficulty in working with an ATR that has a high false alarm rate, even if the ATR's hit rate is also high, and that numeric expressions of ATR judgment may be undervalued by operators in locating targets.


1982 ◽  
Vol 54 (3) ◽  
pp. 836-838 ◽  
Author(s):  
Lee M. Markowitz

Grier's (1971) computing formulas for sensitivity and bias given only one datum are extended to the situation in which the hit rate is less than the false alarm rate.
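Grier's above-chance formulas are A′ = 0.5 + (H − F)(1 + H − F) / (4H(1 − F)) and B″ = (H(1 − H) − F(1 − F)) / (H(1 − H) + F(1 − F)). A hedged sketch of A′ with a below-chance branch obtained by reflecting the standard formula about 0.5, the approach taken by extensions of this kind (the exact formulas in this article should be checked against the original):

```python
# Grier's nonparametric sensitivity A' and bias B'' from one (H, F) pair.
def a_prime(h, f):
    """A': 0.5 = chance, 1.0 = perfect; values below 0.5 = below-chance performance."""
    if h >= f:
        return 0.5 + (h - f) * (1 + h - f) / (4 * h * (1 - f))
    # Below-chance case (h < f): reflect the above-chance formula about 0.5.
    return 0.5 - (f - h) * (1 + f - h) / (4 * f * (1 - h))

def b_double_prime(h, f):
    """Grier's B'' for h >= f; 0 = neutral, positive = conservative bias."""
    return (h * (1 - h) - f * (1 - f)) / (h * (1 - h) + f * (1 - f))
```

With this reflection, A′(H, F) + A′(F, H) = 1, so a below-chance observer is scored symmetrically to an above-chance observer with the rates swapped.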


2015 ◽  
Vol 28 (2) ◽  
pp. 92-100 ◽  
Author(s):  
Zhenhe Zhou ◽  
Hongliang Zhou ◽  
Hongmei Zhu

Objective: The purpose of the present study was to test whether individuals with Internet addiction disorder (IAD) presented analogous characteristics of working memory, executive function, and impulsivity compared with pathological gambling (PG) patients. Methods: The subjects included 23 individuals with IAD, 23 PG patients, and 23 controls. All of the participants were measured with the digit span task, Wisconsin Card Sorting Test, go/no-go task, and Barratt Impulsiveness Scale-11 (BIS-11) under the same experimental conditions. Results: The results of this study showed that the false alarm rate, total response errors, perseverative errors, failures to maintain set, and BIS-11 scores of both the IAD and PG groups were significantly higher than those of the control group. In addition, the forward and backward digit span scores, percentage of conceptual-level responses, number of categories completed, and hit rate of the IAD and PG groups were significantly lower than those of the control group. Furthermore, the false alarm rate and BIS-11 scores of the IAD group were significantly higher than those of the PG patients, and the hit rate was significantly lower than that of the PG patients. Conclusions: Individuals with IAD and PG patients present deficits in working memory and executive function together with elevated impulsivity, and individuals with IAD are more impulsive than PG patients.


2019 ◽  
Author(s):  
Holly J Bowen ◽  
Michelle Marchesi ◽  
Elizabeth Kensinger

Reward-motivated memory has been studied extensively in psychology and neuroscience. Most studies follow the same type of paradigm: stimuli are cued at encoding with high or low reward values which indicate the amount the stimulus is worth if successfully remembered on a subsequent memory test, usually recognition. Each incorrect endorsement of a lure at retrieval is penalized with an arbitrary value between the high and low reward value, resulting in a single false alarm rate. Studies employing this type of paradigm have reported higher hit rates for high value items compared to low value items, but generally hit rate is the only measure of memory that is reported as a function of reward value. This leaves open the possibility that high reward items have a higher hit rate because participants are more willing to endorse those items as “old”, due to biases in the reward structure of these paradigms. Other measures, like discriminability and response bias, are overlooked when there is only a single false alarm rate, but we hypothesize that these other measures are also susceptible to motivational manipulations. To test whether reward motivation influences these other factors, we created a novel paradigm that associated rewards with categories (indoor vs. outdoor scenes), allowing for a separate false alarm rate as well as hit rate at each level of reward. We report results of three experiments that varied when reward cues were introduced and the penalties associated with false alarms for the categorized items. We replicated prior findings of higher hit rates for high compared to low reward items, but consistently across three experiments, when d’ was calculated, we found no difference in memory discriminability as a function of reward. Further, in two experiments we found that response bias was more conservative for low reward items: participants were more likely to endorse a “new” response to low compared to high reward items. 
This latter effect of reward on response bias did not occur when the amount of the false alarm penalty matched the possible reward. Our findings reveal that reward motivation influences not only memory but also decisional biases thought to be independent of memory processes. Both the reward value of the target items and the size of the false alarm penalty should be considered when designing experimental paradigms to study motivation-cognition interactions.


2013 ◽  
Vol 24 (10) ◽  
pp. 897-908 ◽  
Author(s):  
Robert G. Turner

Background: A test protocol is created when individual tests are combined. Protocol performance can be calculated prior to clinical use; however, the necessary information is seldom available. Thus, protocols are frequently used with limited information as to performance. The next best strategy is to base protocol design on available information combined with a thorough understanding of the factors that determine protocol performance. Unfortunately, there is limited information as to these factors and how they interact. Purpose: The objective of this article and the next article in this issue is to examine in detail the three factors that determine protocol performance: (1) protocol criterion, (2) test correlation, (3) test performance. This article examines protocol criterion and test correlation. The next article examines the impact of individual test performance and summarizes the results of this series. The ultimate goal is to provide guidance on the formulation of a protocol using available information and an understanding of the impact of these three factors on performance. Research Design: A mathematical model is used to calculate protocol performance for different protocol criteria and test correlations while assuming that all individual tests in the protocol have the same performance. The advantages and disadvantages of the different criteria are evaluated for different test correlations. Results: A loose criterion will produce the highest protocol hit and false alarm rates; however, the false alarm rate may be unacceptably high. A strict criterion will produce the smallest protocol hit and false alarm rates; however, the hit rate may be unacceptably low. Adding tests to a protocol increases the probability that the protocol false alarm rate will be too high with a loose criterion and that the protocol hit rate will be too low with a strict criterion. 
The intermediate criterion, about which little has been known, provides advantages not available with the other two criteria. This criterion is much more likely to produce acceptable protocol hit and false alarm rates. It also has the potential to simultaneously produce a protocol hit rate higher, and a false alarm rate lower, than the individual tests. The intermediate criteria produce better protocol performance than the loose and strict criteria for protocols with the same number of tests. For all criteria, best protocol performance is obtained when the tests are uncorrelated and decreases as test correlation increases. When there is some test correlation, adding tests to the protocol can decrease protocol performance for a loose or strict criterion. The ability of a protocol to manipulate hit and false alarm rates, or improve performance relative to that of the individual tests, is reduced with increasing test correlation. Conclusions: The three criteria, loose, strict, and intermediate, have definite advantages and disadvantages over a large range of test correlations. Some of the advantages and disadvantages of the loose and strict criteria are impacted by test correlation. The advantages of the intermediate criteria are relatively independent of test correlation. When three or more tests are used in a protocol, consideration should be given to using an intermediate criterion, particularly if there is some test correlation. Greater test correlation diminishes the advantages of adding tests to a protocol, particularly with a loose or strict criterion. At higher test correlations, fewer tests in the protocol may be appropriate.
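For uncorrelated tests that each have the same hit rate h and false alarm rate f (the equal-performance assumption used in this analysis), the three criteria reduce to binomial tail probabilities: loose = positive on at least 1 of n tests, strict = positive on all n, intermediate = positive on at least k of n. A minimal sketch with illustrative per-test rates:

```python
# Protocol hit/false-alarm rates for n independent, identical tests
# under a k-of-n positivity criterion.
from math import comb

def at_least_k_of_n(p, k, n):
    """P(at least k successes in n independent trials, each with probability p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def protocol_rates(h, f, n, k):
    """Protocol (hit rate, false alarm rate) for a k-of-n criterion.
    k = 1 is the loose criterion; k = n is the strict criterion."""
    return at_least_k_of_n(h, k, n), at_least_k_of_n(f, k, n)

# Three tests, each with h = 0.80 and f = 0.20 (illustrative values):
loose = protocol_rates(0.80, 0.20, 3, 1)         # highest hit and false alarm rates
strict = protocol_rates(0.80, 0.20, 3, 3)        # lowest hit and false alarm rates
intermediate = protocol_rates(0.80, 0.20, 3, 2)  # 2-of-3 criterion
```

With these numbers the 2-of-3 criterion yields a protocol hit rate above 0.80 and a false alarm rate below 0.20, illustrating the article's point that an intermediate criterion can simultaneously beat the individual tests on both measures, an outcome the loose and strict criteria cannot deliver.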

