Understanding Protocol Performance: Impact of Criterion and Test Correlation

2013 ◽  
Vol 24 (10) ◽  
pp. 897-908
Author(s):  
Robert G. Turner

Background: A test protocol is created when individual tests are combined. Protocol performance can be calculated prior to clinical use; however, the necessary information is seldom available. Thus, protocols are frequently used with limited information about their performance. The next best strategy is to base protocol design on available information combined with a thorough understanding of the factors that determine protocol performance. Unfortunately, there is limited information about these factors and how they interact. Purpose: The objective of this article and the next article in this issue is to examine in detail the three factors that determine protocol performance: (1) protocol criterion, (2) test correlation, and (3) test performance. This article examines protocol criterion and test correlation. The next article examines the impact of individual test performance and summarizes the results of this series. The ultimate goal is to provide guidance on the formulation of a protocol using available information and an understanding of the impact of these three factors on performance. Research Design: A mathematical model is used to calculate protocol performance for different protocol criteria and test correlations while assuming that all individual tests in the protocol have the same performance. The advantages and disadvantages of the different criteria are evaluated for different test correlations. Results: A loose criterion will produce the highest protocol hit and false alarm rates; however, the false alarm rate may be unacceptably high. A strict criterion will produce the lowest protocol hit and false alarm rates; however, the hit rate may be unacceptably low. Adding tests to a protocol increases the probability that the protocol false alarm rate will be too high with a loose criterion and that the protocol hit rate will be too low with a strict criterion. The intermediate criterion, about which little has been known, provides advantages not available with the other two criteria. This criterion is much more likely to produce acceptable protocol hit and false alarm rates. It also has the potential to simultaneously produce a protocol hit rate higher, and a false alarm rate lower, than those of the individual tests. Intermediate criteria produce better protocol performance than the loose and strict criteria for protocols with the same number of tests. For all criteria, protocol performance is best when the tests are uncorrelated and decreases as test correlation increases. When there is some test correlation, adding tests to the protocol can decrease protocol performance for a loose or strict criterion. The ability of a protocol to manipulate hit and false alarm rates, or to improve performance relative to that of the individual tests, is reduced with increasing test correlation. Conclusions: The three criteria, loose, strict, and intermediate, have definite advantages and disadvantages over a large range of test correlations. Some of the advantages and disadvantages of the loose and strict criteria are affected by test correlation. The advantages of the intermediate criteria are relatively independent of test correlation. When three or more tests are used in a protocol, consideration should be given to using an intermediate criterion, particularly if there is some test correlation. Greater test correlation diminishes the advantages of adding tests to a protocol, particularly with a loose or strict criterion. At higher test correlations, fewer tests in the protocol may be appropriate.
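The behavior of the three criteria described above is easy to illustrate in the special case of statistically independent tests that all share the same hit and false alarm rates. The following minimal sketch (illustrative numbers and function names; the article's model also handles correlated tests) treats the loose criterion as "at least 1 of n tests positive", the strict criterion as "all n positive", and an intermediate criterion as "at least k of n positive":

```python
# Protocol hit/false alarm rates for k-of-n criteria, assuming independent,
# identically performing tests. This is a sketch, not the author's model.
from math import comb

def protocol_rate(p: float, n: int, k: int) -> float:
    """Probability that at least k of n independent tests are positive,
    given that each test is positive with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

hit, fa, n = 0.80, 0.20, 3          # assumed per-test hit and false alarm rates
for name, k in [("loose (1 of n)", 1),
                ("intermediate (2 of n)", 2),
                ("strict (n of n)", n)]:
    print(f"{name:22s}  hit={protocol_rate(hit, n, k):.3f}  "
          f"false alarm={protocol_rate(fa, n, k):.3f}")
```

With these illustrative numbers the loose criterion yields hit = 0.992 but false alarm = 0.488, the strict criterion yields false alarm = 0.008 but hit = 0.512, and the 2-of-3 intermediate criterion yields hit = 0.896 with false alarm = 0.104, i.e., both a higher hit rate and a lower false alarm rate than the individual tests, consistent with the pattern described in the abstract.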

2013 ◽  
Vol 24 (10) ◽  
pp. 909-919
Author(s):  
Robert G. Turner

Background: This is the second of two articles that examine the factors that determine protocol performance. The objective of these articles is to provide a general understanding of protocol performance that can be used to estimate performance, establish limits on performance, decide if a protocol is justified, and ultimately select a protocol. The first article was concerned with protocol criterion and test correlation. It demonstrated the advantages and disadvantages of different criteria when all tests had the same performance. It also examined the impact of increasing test correlation on protocol performance and the characteristics of the different criteria. Purpose: To examine the impact on protocol performance when individual tests in a protocol have different performance. This is evaluated for different criteria and test correlations. The results of the two articles are combined and summarized. Research Design: A mathematical model is used to calculate protocol performance for different protocol criteria and test correlations when there are small to large variations in the performance of individual tests in the protocol. Results: The performance of the individual tests that make up a protocol has a significant impact on the performance of the protocol. As expected, the better the performance of the individual tests, the better the performance of the protocol. Many of the characteristics of the different criteria are relatively independent of the variation in the performance of the individual tests. However, increasing test variation degrades some of the criteria's advantages and causes a new disadvantage to appear. This negative impact grows as test variation increases and as more tests are added to the protocol. Conclusions: Best protocol performance is obtained when individual tests are uncorrelated and have the same performance. In general, the greater the variation in the performance of tests in the protocol, the more detrimental this variation is to protocol performance. Because this negative impact increases as more tests are added to the protocol, greater test variation argues for using fewer tests in the protocol.
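To make the "different test performance" case concrete, the sketch below (an assumption for illustration, again restricted to independent tests) computes k-of-n protocol rates when each test has its own hit and false alarm rate, using the standard Poisson-binomial recursion for the number of positive tests:

```python
# Sketch: k-of-n protocol performance with unequal, independent tests.
def at_least_k(probs, k):
    """P(at least k of the independent tests are positive); probs[i] is the
    probability that test i is positive."""
    dist = [1.0]                      # dist[j] = P(j positives so far)
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            new[j] += q * (1 - p)     # test i negative
            new[j + 1] += q * p       # test i positive
        dist = new
    return sum(dist[k:])

hits = [0.90, 0.80, 0.60]             # illustrative per-test hit rates (varied performance)
fas  = [0.25, 0.20, 0.10]             # illustrative per-test false alarm rates
for k in (1, 2, 3):
    print(f"k={k}: hit={at_least_k(hits, k):.3f}  false alarm={at_least_k(fas, k):.3f}")
```

Running this with wider spreads of per-test hit rates shows how a single poorly performing test drags down the strict criterion and inflates the loose criterion, which is the kind of effect the article quantifies.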


2021 ◽  
Author(s):  
Simon Dadson ◽  
Eleanor Blyth ◽  
Douglas Clark ◽  
Helen Davies ◽  
Richard Ellis ◽  
...  

Timely predictions of fluvial flooding are important for national and regional planning and real-time flood response. Several new computational techniques have emerged in the past decade for making rapid fluvial flood inundation predictions at time and space scales relevant to early warning, although their efficient use is often constrained by the trade-off between model complexity, topographic fidelity and scale. Here we apply a simplified approach to large-area fluvial flood inundation modelling which combines a solution to the inertial form of the shallow water equations at 1 km horizontal resolution, with two alternative, simplified representations of sub-grid floodplain topography. One of these uses a fitted sub-grid probability distribution, the other a quantile-based representation of the floodplain. We evaluate the model’s steady-state performance when used with flood depth estimates corresponding to the 0.01 Annual Exceedance Probability (AEP; ‘100-year’) flood and compare the results with published benchmark data for England. The quantile-based method accurately predicts flood inundation in 86% of locations, with a domain-wide hit rate of 95% and false alarm rate of 10%. These performance measures compare with a hit rate of 71%, and false alarm rate of 9% for the simpler, distribution-based method. We suggest that these approaches are suitable for rapid, wide-area flood forecasting and climate change impact assessment.
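The hit and false alarm figures quoted above come from comparing a predicted binary inundation map against a benchmark map cell by cell. A minimal sketch of that kind of scoring is shown below; the contingency-table definitions used here (hit rate over benchmark-wet cells, false alarms as a fraction of predicted-wet cells) are assumptions for illustration and may differ from the paper's exact formulas:

```python
# Sketch: score a binary flood-extent prediction against a benchmark map.
import numpy as np

def flood_scores(pred_wet: np.ndarray, bench_wet: np.ndarray):
    hits   = np.sum(pred_wet & bench_wet)      # wet in both maps
    misses = np.sum(~pred_wet & bench_wet)     # wet in benchmark only
    false  = np.sum(pred_wet & ~bench_wet)     # wet in prediction only
    hit_rate = hits / (hits + misses)
    false_alarm = false / (hits + false)
    return hit_rate, false_alarm

pred  = np.array([[1, 1, 0], [1, 0, 0]], dtype=bool)   # toy 1 km grid
bench = np.array([[1, 1, 1], [0, 0, 0]], dtype=bool)
print(flood_scores(pred, bench))               # (0.667, 0.333) for this toy case
```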


2006 ◽  
Vol 68 (4) ◽  
pp. 643-654 ◽  
Author(s):  
Michael F. Verde ◽  
Neil A. Macmillan ◽  
Caren M. Rotello

2021 ◽  
Author(s):  
Timothy F. Brady ◽  
Maria Martinovna Robinson ◽  
Jamal Rodgers Williams ◽  
John Wixted

There is a crisis of measurement in memory research, with major implications for theory and practice. This crisis arises because of a critical complication present when measuring memory using the recognition memory task that dominates the study of working memory and long-term memory (“did you see this item? yes/no” or “did this item change? yes/no”). Such tasks give two measures of performance, the “hit rate” (how often you say you previously saw an item you actually did previously see) and the “false alarm rate” (how often you say you saw something you never saw). Yet what researchers want is one single, integrated measure of memory performance. Integrating the hit and false alarm rate into a single measure, however, requires solving a complex problem of counterfactual reasoning that depends on the (unknowable) distribution of underlying memory signals: when faced with two people differing in both hit rate and false alarm rate, the question of who had the better memory is really “who would have had more hits if they each had the same number of false alarms?” As a result of this difficulty, different literatures in memory research (e.g., visual working memory, eyewitness identification, picture memory, etc.) have settled on a variety of distinct metrics to combine hit rates and false alarm rates (e.g., A’, corrected hit rate, percent correct, d’, diagnosticity ratios, K values, etc.). These metrics make different, contradictory assumptions about the distribution of latent memory signals, and all of their assumptions are frequently incorrect. Despite a decades-long literature on how to properly measure memory performance, real-life decisions are often made using these metrics, even when they later prove wrong once memory is studied with better measures. We suggest that in order for the psychology and neuroscience of memory to become a cumulative, theory-driven science, more attention must be given to measurement issues. We make a concrete suggestion: the default memory task should change from old/new (“did you see this item?”) to forced-choice (“which of these two items did you see?”). In situations where old/new variants are preferred (e.g., eyewitness identification; theoretical investigations of the nature of memory decisions), receiver operating characteristic (ROC) analysis should always be performed.
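The central claim, that the common metrics can disagree about who has the better memory, is easy to demonstrate numerically. The following sketch (with illustrative hit/false alarm rates and textbook formulas, not data from the paper) compares two hypothetical observers under several of the metrics named above:

```python
# Sketch: different summary metrics can rank the same two observers differently,
# because each assumes a different latent-signal model.
from scipy.stats import norm

def metrics(h, f):
    z = norm.ppf
    d_prime = z(h) - z(f)                                        # equal-variance Gaussian model
    a_prime = 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))  # Pollack & Norman A', for h >= f
    return {"d'": d_prime, "A'": a_prime,
            "H - F": h - f, "percent correct": (h + 1 - f) / 2}

obs_a = metrics(h=0.95, f=0.40)   # hypothetical liberal observer
obs_b = metrics(h=0.70, f=0.10)   # hypothetical conservative observer
for name in obs_a:
    better = "A" if obs_a[name] > obs_b[name] else "B"
    print(f"{name:16s} A={obs_a[name]:.3f}  B={obs_b[name]:.3f}  -> {better} looks better")
# d' favours observer A, while A', H - F, and percent correct all favour B.
```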


Author(s):  
Kuldeep Singh ◽  
Palvi Aggarwal ◽  
Prashanth Rajivan ◽  
Cleotilde Gonzalez

We studied people’s success at detecting phishing emails after they were trained under one of three phishing-frequency conditions, in which the proportion of phishing emails during training was low (25% phishing emails), medium (50% phishing emails), or high (75% phishing emails). Individual baseline susceptibility to phishing emails was measured in a pre-training phase in which 20% of the emails were phishing; this performance was then compared to a post-training phase in which participants attempted to detect new, rare phishing emails (again 20% phishing). Hit rates, false alarm rates, sensitivities, and response criteria were analyzed. Results revealed that participants trained with a higher frequency of phishing emails had a higher hit rate, but also a higher false alarm rate, at post-training than participants trained with lower frequencies. These results have implications for designing new training protocols to improve the detection of phishing emails.
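Sensitivity and response criterion are the standard signal-detection summaries of hit and false alarm rates. The sketch below (standard equal-variance formulas; the numbers are illustrative, not the study's data) shows how they would be computed for a pre- versus post-training comparison:

```python
# Sketch: sensitivity d' and response criterion c from hit and false alarm rates.
from scipy.stats import norm

def sdt(hit_rate: float, fa_rate: float):
    z = norm.ppf
    d_prime = z(hit_rate) - z(fa_rate)          # discriminability
    c = -0.5 * (z(hit_rate) + z(fa_rate))       # positive c = conservative responding
    return d_prime, c

# hypothetical rates for a group trained with a high phishing frequency
print(sdt(0.55, 0.20))   # pre-training
print(sdt(0.75, 0.35))   # post-training: more hits, but also more false alarms
```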


2012 ◽  
Vol 19 (4) ◽  
pp. 753-761 ◽  
Author(s):  
Yanlong Cao ◽  
Yuanfeng He ◽  
Huawen Zheng ◽  
Jiangxin Yang

In order to reduce the false alarm rate and missed detection rate of a Loose Parts Monitoring System (LPMS) for nuclear power plants, a new hybrid method that combines Linear Predictive Coding (LPC) and a Support Vector Machine (SVM) to discriminate loose-part signals is proposed. The alarm process is divided into two stages. The first stage detects the weak burst signal to reduce the missed detection rate: the signal is whitened to improve the SNR, and the weak burst can then be detected by checking the short-term Root Mean Square (RMS) of the whitened signal. The second stage identifies the detected burst signal to reduce the false alarm rate: taking the signal's LPC coefficients as its features, an SVM determines whether the signal was generated by the impact of a loose part. Experiments show that whitening the signal in the first stage can detect a loose-part burst signal even at very low SNR and thus significantly reduces the missed detection rate. In the second stage, loose-part burst signals can be distinguished from pulse disturbances using the SVM. Even when the SNR is −15 dB, the system still achieves a 100% recognition rate.
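The two-stage pipeline described above can be sketched as follows. All concrete choices here (window sizes, LPC order, the use of librosa and scikit-learn, whitening via an LPC prediction-error filter) are illustrative assumptions, not the authors' implementation:

```python
# Sketch of a two-stage loose-part alarm: whitened short-term RMS for burst
# detection, then an SVM on LPC coefficients to reject pulse disturbances.
import numpy as np
import librosa
from sklearn.svm import SVC

def short_term_rms(x, frame=256, hop=128):
    frames = librosa.util.frame(x, frame_length=frame, hop_length=hop)
    return np.sqrt(np.mean(frames**2, axis=0))

def detect_burst(x, threshold_sigmas=4.0):
    # crude whitening: run the signal through its LPC prediction-error filter
    a = librosa.lpc(x, order=8)
    white = np.convolve(x, a, mode="same")
    rms = short_term_rms(white)
    # frames whose whitened RMS stands well above the background level
    return np.where(rms > rms.mean() + threshold_sigmas * rms.std())[0]

def lpc_features(segments, order=12):
    # drop the leading 1.0 coefficient; keep the predictive coefficients
    return np.array([librosa.lpc(s, order=order)[1:] for s in segments])

def train_classifier(segments, labels):
    # labels: 1 = loose-part impact, 0 = pulse disturbance (labelled examples assumed)
    clf = SVC(kernel="rbf")
    clf.fit(lpc_features(segments), labels)
    return clf
```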


2021 ◽  
Author(s):  
Simon J. Dadson ◽  
Eleanor Blyth ◽  
Douglas Clark ◽  
Helen Davies ◽  
Richard Ellis ◽  
...  

Timely predictions of fluvial flooding are important for national and regional planning and real-time flood response. Several new computational techniques have emerged in the past decade for making rapid fluvial flood inundation predictions at time and space scales relevant to early warning, although their efficient use is often constrained by the trade-off between model complexity, topographic fidelity and scale. Here we apply a simplified approach to large-area fluvial flood inundation modelling which combines a solution to the inertial form of the shallow water equations at 1 km horizontal resolution, with two alternative representations of sub-grid floodplain topography. One of these uses a fitted sub-grid probability distribution, the other a quantile-based representation of the floodplain. We evaluate the model's performance when used to simulate the 0.01 Annual Exceedance Probability (AEP; 100-year) flood and compare the results with published benchmark data for England. The quantile-based method accurately predicts flood inundation in 86 % of locations, with a domain-wide hit rate of 95 % and false alarm rate of 10 %. These performance measures compare with a hit rate of 71 %, and false alarm rate of 9 % for the simpler, but faster, distribution-based method. We suggest that these approaches are suitable for rapid, wide-area flood forecasting and climate change impact assessment.
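One way a quantile-based sub-grid floodplain can be used is to map a cell-average water surface elevation onto a fraction of the cell inundated. The sketch below is an assumption about how such a representation might work, not the paper's implementation:

```python
# Sketch: fraction of a coarse cell inundated, from stored sub-grid elevation quantiles.
import numpy as np

def inundated_fraction(water_level: float, elev_quantiles: np.ndarray) -> float:
    """elev_quantiles: sub-grid elevations (m) at evenly spaced quantiles from 0 to 1."""
    probs = np.linspace(0.0, 1.0, len(elev_quantiles))
    # fraction of the cell whose fine-scale elevation lies below the water surface
    return float(np.interp(water_level, elev_quantiles, probs))

# e.g. 11 elevation quantiles for one 1 km cell, derived from a fine-resolution DEM
quantiles = np.array([10.0, 10.4, 10.9, 11.2, 11.6, 12.1, 12.8, 13.5, 14.3, 15.4, 17.0])
print(inundated_fraction(12.0, quantiles))   # about 0.48 of the cell is wet
```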


Author(s):  
Jean MacMillan ◽  
Eileen B. Entin ◽  
Daniel Serfaty

In machine-aided target recognition, human operators work with an automatic target recognition (ATR) system to locate targets in cluttered and degraded imagery. The operator must integrate his or her own visual judgment concerning whether a target is present in the image with the ATR's judgment, which is typically expressed numerically. We conducted a series of experiments in which subjects attempted to locate target shapes among non-targets based only on visual images, and based on both visual images and supplementary numeric information such as an ATR might provide. Image quality was controlled as an independent variable through the use of distortion rates that randomly altered pixel values to degrade the image. We found that subjects maintained a constant false alarm rate as image distortion increased, at the expense of a lower hit rate. This result was found consistently in experiments where the subjects' task was to distinguish single targets from a blank background, to distinguish single targets from single non-targets, and to locate multiple targets in a multiple-object display. We also found a bias toward over-reliance on image information relative to numeric information. As image distortion increased, subjects failed to make optimal use of supplementary numeric information and showed an unnecessary decrease in performance. The results suggest that operators may experience difficulty in working with an ATR that has a high false alarm rate, even if the ATR's hit rate is also high, and that numeric expressions of ATR judgment may be undervalued by operators in locating targets.
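The notion of "optimal use of supplementary numeric information" can be made concrete with a generic signal-detection account of evidence integration. The sketch below (illustrative means, sigmas, and threshold; not the authors' experimental model) shows the standard ideal-observer rule: if the visual and ATR evidence are independent, their log-likelihood ratios add, and the sum is compared against a decision threshold:

```python
# Sketch: combining operator (visual) evidence and ATR numeric evidence
# by summing log-likelihood ratios, assuming independent Gaussian evidence.
def log_likelihood_ratio(score: float, mean_target: float, mean_clutter: float,
                         sigma: float = 1.0) -> float:
    """LLR of 'target present' vs 'clutter only' for a Gaussian evidence score."""
    return ((score - mean_clutter) ** 2 - (score - mean_target) ** 2) / (2 * sigma ** 2)

def combined_decision(visual_score: float, atr_score: float, threshold: float = 0.0) -> bool:
    llr = (log_likelihood_ratio(visual_score, mean_target=1.5, mean_clutter=0.0) +
           log_likelihood_ratio(atr_score,    mean_target=1.0, mean_clutter=0.0))
    return llr > threshold        # True = report "target present"

# weak visual evidence plus a confident ATR score tips the combined decision
print(combined_decision(visual_score=0.4, atr_score=1.2))
```

Under-weighting the second term of this sum is one way to express the over-reliance on image information reported in the abstract.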

