Resisting the Manipulation of Performance Metrics: An Empirical Analysis of The Manipulation-Proof Performance Measure

Author(s):  
Stephen J. Brown ◽  
MaengSoo Kang ◽  
Francis Haeuck In ◽  
Gunhee Lee
2019 ◽  
Author(s):  
Guanglei Cui ◽  
Alan P. Graves ◽  
Eric S. Manas

Relative binding affinity prediction is a critical component of computer-aided drug design. A significant amount of effort has been dedicated to developing rapid and reliable in silico methods. However, robust assessment of their performance is still a complicated issue, as it requires a performance measure applicable in the prospective setting and, more importantly, a true null model that defines the expected performance of random prediction in an objective manner. Although many performance metrics, such as the correlation coefficient (r²), mean unsigned error (MUE), and root mean square error (RMSE), are frequently used in the literature, a true and non-trivial null model has yet to be identified. To address this problem, here we introduce an interval estimate as an additional measure, namely the prediction interval (PI), which can be estimated from the error distribution of the predictions. The benefits of using the interval estimate are 1) it provides the uncertainty range in the predicted activities, which is important in prospective applications; 2) a true null model with a well-defined PI can be established. We provide one such example, termed the Gaussian Random Affinity Model (GRAM), which is based on the empirical observation that the affinity change in a typical lead optimization effort tends to be normally distributed, N(0, s). Having an analytically defined PI that depends only on the variation in the activities, GRAM should in principle allow us to compare the performance of relative binding affinity prediction methods in a standard way, which is ultimately critical to measuring the progress made in algorithm development.
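As a rough illustration of the interval estimate described in this abstract, the sketch below computes a 95% prediction-interval half-width from prediction errors and compares it with a null value. It is not the authors' implementation. It assumes zero-mean, normally distributed errors, and it assumes that the null model draws predictions from N(0, s) independently of the true affinity changes, so that the null error has standard deviation s√2. The function names `empirical_pi_halfwidth` and `gram_pi_halfwidth` are hypothetical.

```python
import math

# Two-sided 95% quantile of the standard normal distribution.
Z95 = 1.959963984540054


def empirical_pi_halfwidth(predicted, observed, z=Z95):
    """Half-width of the prediction interval estimated from prediction errors.

    Assumption: errors are zero-mean and normal, so the RMSE is used as the
    estimate of the error standard deviation.
    """
    errors = [p - o for p, o in zip(predicted, observed)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return z * rmse


def gram_pi_halfwidth(s, z=Z95):
    """Half-width of the null-model interval under the stated assumption.

    Assumption: null predictions and true affinity changes are independent
    draws from N(0, s), so their difference has standard deviation s * sqrt(2).
    """
    return z * math.sqrt(2.0) * s


if __name__ == "__main__":
    predicted = [1.0, 2.0, 3.0]   # illustrative values only
    observed = [0.0, 2.0, 4.0]
    print(empirical_pi_halfwidth(predicted, observed))
    print(gram_pi_halfwidth(1.0))
```

Under these assumptions, a method whose empirical half-width is clearly narrower than the null half-width for the same spread of activities is doing better than random assignment.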


Author(s):  
Subrata Roy

The present study examines the performance of selected open-ended equity mutual fund schemes of UTI using multi-index measures as well as a conditional multi-index measure. The analysis shows that the multi-index measure captures the beta and alpha effects on a market-adjusted basis, and that its estimated coefficients are more representative than those of the single-index measure. When time-lagged multi-index measures are applied (lagged by 1 month, 2 months, a quarter, and a year), the estimated coefficients (alpha and beta), which are both market adjusted and time adjusted, look more representative than those of the multi-index measure without the lagged effect. Finally, when the time-lagged multi-index measure is extended in a conditional way (conditional on public information variables), the conditional lagged multi-index measure provides more representative results in all respects than all the other measures.


2014 ◽  
Vol 13 (6) ◽  
pp. 1261
Author(s):  
Francois Van Dyk ◽  
Gary Van Vuuren ◽  
Andre Heymans

The Sharpe ratio is widely used as a performance measure for traditional (i.e., long only) investment funds, but because it is based on mean-variance theory, it only considers the first two moments of a return distribution. It is, therefore, not suited for evaluating funds characterised by complex, asymmetric, highly skewed return distributions such as hedge funds. It is also susceptible to manipulation and estimation error. These drawbacks have demonstrated the need for new and additional fund performance metrics. The monthly returns of 184 international long/short (equity) hedge funds from four geographical investment mandates were examined over an 11-year period. This study contributes to recent research on alternative performance measures to the Sharpe ratio and specifically assesses whether a scaled version of the classic Sharpe ratio should augment the use of the Sharpe ratio when evaluating hedge fund risk and in the investment decision-making process. A scaled Treynor ratio is also compared to the traditional Treynor ratio. The classic and scaled versions of the Sharpe and Treynor ratios were estimated on a 36-month rolling basis to ascertain whether the scaled ratios do indeed provide useful information to investors beyond that provided by the classic, non-scaled ratios.
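The rolling estimation mentioned in this abstract can be sketched as follows for the classic Sharpe ratio only. The abstract does not define the scaled variants, so they are not reproduced here. The sketch assumes monthly returns, a constant per-period risk-free rate, and the sample standard deviation; the function names `sharpe_ratio` and `rolling_sharpe` are hypothetical.

```python
import statistics


def sharpe_ratio(returns, risk_free=0.0):
    """Classic (non-annualised) Sharpe ratio: mean excess return / its std dev.

    Assumption: risk_free is a constant per-period rate.
    """
    excess = [r - risk_free for r in returns]
    sd = statistics.stdev(excess)  # sample standard deviation
    if sd == 0:
        return float("nan")  # ratio undefined when returns do not vary
    return statistics.mean(excess) / sd


def rolling_sharpe(returns, window=36, risk_free=0.0):
    """Sharpe ratio over each consecutive window of `window` periods."""
    return [
        sharpe_ratio(returns[i:i + window], risk_free)
        for i in range(len(returns) - window + 1)
    ]


if __name__ == "__main__":
    # Illustrative monthly returns, not data from the study.
    monthly = [0.01, 0.03, 0.02, -0.01] * 10
    print(rolling_sharpe(monthly, window=36))
```

A series of 40 monthly returns yields five overlapping 36-month estimates, which is what makes changes in the ratio over time visible.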


2011 ◽  
Vol 3 (1) ◽  
pp. 78-128 ◽  
Author(s):  
Thomas Hellmann ◽  
Veikko Thiele

This paper develops a multitask model where employees make choices between their assigned standard tasks, for which the firm has a performance measure and provides incentives, and privately observed innovation opportunities that fall outside of the performance metrics, and require ex post bargaining. If innovations are highly firm specific, firms provide lower-powered incentives for standard tasks to encourage more innovation, yet in equilibrium employees undertake too few innovations. The opposite occurs if innovations are less firm specific. We also investigate the effectiveness of several possibilities to encourage innovation, such as tolerance for failure, stock-based compensation, and the allocation of intellectual property rights. (JEL D21, J33, M12, O31, O34)


Stroke ◽  
2021 ◽  
Vol 52 (Suppl_1) ◽  
Author(s):  
Tiffany O Sheehan ◽  
Nicolle W Davis ◽  
Yi Guo ◽  
Debra Lynch Kelly ◽  
Saun-joo Yoon ◽  
...  

Background: Implementation of evidence-based performance metrics drives standardized care and improves patient outcomes. Limited performance metrics have been developed for implementation in the aneurysmal subarachnoid hemorrhage (aSAH) population. Timely aneurysm repair following an aSAH is associated with rebleeding prevention and mortality. The purpose of this study was to evaluate time to aneurysm repair as a candidate performance metric by testing a model that includes hospital and patient characteristics as predictors of time to aneurysm repair and mortality, with time to aneurysm repair as a potential influence on these relationships in aSAH. Methods: A retrospective, cross-sectional analysis of patient discharge data from 2014 in the state of Florida was conducted. Data were derived from The Agency for Healthcare Research and Quality, HealthCare Utilization Project, State Inpatient Dataset, and the American Hospital Association Annual Survey. Patients with a primary ICD-9 diagnosis of aSAH and a principal procedure of clipping or coiling were included (n = 387). The study outcome was in-hospital mortality. Independent variables were level of stroke center, age, race, sex, and type of aneurysm repair. Hierarchical logistic regression was used to estimate the probability of in-hospital death. Results: Patients who underwent endovascular repair of an aneurysm were more likely to be treated in <24 hours compared to those undergoing aneurysm clipping (OR = 0.54, CI = 0.35-0.84, p = 0.01). Patients treated at a comprehensive stroke center (CSC) had a 72% reduction in odds of death compared to those treated at primary stroke centers (OR = 0.28, CI = 0.10-0.77, p = 0.01), controlling for disease severity and comorbidity. Time to aneurysm repair was not significantly associated with mortality and did not influence the relationship between hospital and patient characteristics and mortality.
Conclusions: Treatment at a certified CSC was the only significant predictor of surviving aSAH. Time to aneurysm repair did not influence the relationship between hospital and patient characteristics associated with mortality. Further research is needed to identify appropriate measures and to define what should be tracked for performance in the aSAH population.


2010 ◽  
Vol 25 (4) ◽  
pp. 1307-1314 ◽  
Author(s):  
Keith F. Brill ◽  
Matthew Pyle

Abstract Critical performance ratio (CPR) expressions for the eight conditional probabilities associated with the 2 × 2 contingency table of outcomes for binary (dichotomous “yes” or “no”) forecasts are derived. Two are shown to be useful in evaluating the effects of hedging as it approaches random change. The CPR quantifies how the probability of detection (POD) must change as frequency bias changes, so that a performance measure (or conditional probability) indicates an improved forecast for a given value of frequency bias. If yes forecasts were to be increased randomly, the probability of additional correct forecasts (hits) is given by the detection failure ratio (DFR). If the DFR for a performance measure is greater than the CPR, the forecast is likely to be improved by the random increase in yes forecasts. Thus, the DFR provides a benchmark for the CPR in the case of frequency bias inflation. If yes forecasts are decreased randomly, the probability of removing a hit is given by the frequency of hits (FOH). If the FOH for a performance measure is less than the CPR, the forecast is likely to be improved by the random decrease in yes forecasts. Therefore, the FOH serves as a benchmark for the CPR if the frequency bias is decreased. The closer the FOH (DFR) is to being less (greater) than or equal to the CPR, the more likely it may be to enhance the performance measure by decreasing (increasing) the frequency bias. It is shown that randomly increasing yes forecasts for a forecast that is itself better than a randomly generated forecast can improve the threat score but is not likely to improve the equitable threat score. The equitable threat score is recommended instead of the threat score whenever possible.
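The quantities named in this abstract follow from the four cells of the 2 × 2 contingency table. The sketch below uses their standard verification definitions, with hits a, false alarms b, misses c, and correct negatives d. The critical performance ratio expressions themselves are derived in the paper and are not reproduced here. The function name `contingency_scores` and the example counts are hypothetical.

```python
def _ratio(num, den):
    """Return num/den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def contingency_scores(hits, false_alarms, misses, correct_negatives):
    """Standard scores for binary forecasts from a 2 x 2 contingency table."""
    a, b, c, d = hits, false_alarms, misses, correct_negatives
    n = a + b + c + d
    # Hits expected from a random forecast with the same marginal totals.
    random_hits = _ratio((a + b) * (a + c), n)
    return {
        "pod": _ratio(a, a + c),        # probability of detection
        "bias": _ratio(a + b, a + c),   # frequency bias
        "foh": _ratio(a, a + b),        # frequency of hits
        "dfr": _ratio(c, c + d),        # detection failure ratio
        "ts": _ratio(a, a + b + c),     # threat score
        "ets": _ratio(a - random_hits, a + b + c - random_hits),
    }


if __name__ == "__main__":
    # Illustrative counts, not data from the paper.
    print(contingency_scores(30, 20, 10, 40))
```

The DFR is the fraction of "no" forecasts that were in fact events, which is why it gives the chance that a randomly added "yes" forecast becomes a hit. The FOH is the fraction of "yes" forecasts that were hits, which is the chance that a randomly removed "yes" forecast removes a hit.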


2009 ◽  
Vol 21 (1) ◽  
pp. 125-149 ◽  
Author(s):  
Shane S. Dikolli ◽  
Christian Hofmann ◽  
Susan L. Kulp

ABSTRACT: This study uses principal-agent analysis to investigate how the principal's use of performance measures in the agent's compensation contract is affected by (1) links between performance measures and (2) substitute and complementary characteristics of an agent's efforts. We show that the directional effect of changes in performance measure interrelations on linear incentive weights depends on how the agent's tasks interact with each other (i.e., substitute or complementary interactions). For example, increases in performance measure interrelations do not necessarily imply higher incentive weights on more sensitive and precise performance measures. If efforts are substitutes for each other, the costs of effort are relatively high and the principal induces lower levels of total effort by offering lower incentives. We also show that differences in the combination of performance measure interrelations and effort interactions affect profits in distinctly different ways. When efforts are substitutes for each other, increases in the sensitivities of profit to the other performance metrics (i.e., increased interrelations), and thus to effort, may actually lead to lower profits.


2019 ◽  
Vol 8 (3) ◽  
pp. 1723-1731 ◽  

Multi-parameter tuning and parameter optimization in Information Retrieval have been a large area of research and development, especially for BM25F scoring functions, which have 2F+1 parameters for documents with F fields. The scoring and ranking function conventionally uses multiple input parameters to improve the quality of results, even at the cost of long computation time. Searching and ranking documents in the medical literature demands high recall rates, which are difficult to achieve with multiple input parameters. The performance of BM25F depends upon the choice of these parameters. Particle Swarm Optimization (PSO) searches the solution space directly and discovers an optimal solution without relying on a gradient; hence it can straightforwardly optimize Mean Average Precision (MAP), a non-differentiable function. In this paper, the use of PSO to tune multiple parameters is proposed to address the gaps in the BM25F scoring function. The advantage of the proposed technique, which directly optimizes MAP, is also discussed. Experimental results for the quantitative performance metrics MAP and Mean Reciprocal Rank are compared between the proposed PSO-optimized BM25F and recent ranking algorithms. The results demonstrate that the proposed PSO-optimized BM25F outperforms the standard ranking methods on the OHSUMED data set.
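The two evaluation metrics named in this abstract can be sketched with their standard definitions. Each query is represented by a list of binary relevance flags in ranked order. This is an illustration only, not the paper's evaluation code; it assumes every relevant document appears in the ranked list unless `num_relevant` is supplied, and the function names are hypothetical.

```python
def average_precision(ranked_relevance, num_relevant=None):
    """Average precision for one query from ranked binary relevance flags.

    Assumption: if num_relevant is not given, all relevant documents are
    taken to be present in the ranked list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    denom = num_relevant if num_relevant is not None else hits
    return precision_sum / denom if denom else 0.0


def reciprocal_rank(ranked_relevance):
    """1 / rank of the first relevant document, or 0.0 if there is none."""
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            return 1.0 / rank
    return 0.0


def mean_average_precision(queries):
    """MAP: mean of per-query average precision."""
    return sum(average_precision(q) for q in queries) / len(queries)


def mean_reciprocal_rank(queries):
    """MRR: mean of per-query reciprocal rank."""
    return sum(reciprocal_rank(q) for q in queries) / len(queries)


if __name__ == "__main__":
    runs = [[1, 0, 1], [0, 1]]  # illustrative rankings for two queries
    print(mean_average_precision(runs), mean_reciprocal_rank(runs))
```

Because these scores change only when the ranking order changes, they are piecewise constant in the scoring parameters, which is the reason a gradient-free search is a natural fit for optimizing them.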


Author(s):  
Robin Mathews ◽  
Gregg C Fonarow ◽  
Shuang Li ◽  
Eric D Peterson ◽  
John S Rumsfeld ◽  
...  

Background: ACTION Registry-GWTG (ARG) is a clinical registry and Quality Improvement (QI) program co-sponsored by the American College of Cardiology and the American Heart Association, designed to measure and improve the treatment and outcomes of patients with acute myocardial infarction (AMI). However, it is unknown whether hospital participation in ARG is associated with better performance on publicly reported AMI quality metrics and 30-day outcomes. Methods: Using Hospital Compare, we matched hospitals participating in ARG from 2007-2010 to non-ARG participating hospitals based on teaching status, hospital size, percutaneous coronary intervention capability, and composite adherence to AMI performance measures in 2007. We then used linear mixed modeling to compare 2010 performance measure adherence, 30-day mortality, and all-cause readmission among ARG and non-ARG hospitals. As secondary analyses, we repeated the matching process without using baseline adherence to AMI measures and also stratified the hospitals according to duration of ARG participation and level of baseline performance. Results: We successfully matched 502 hospitals participating in ARG to 502 non-ARG hospitals. Adherence to AMI process measures was very high overall with minimal differences between ARG and non-ARG hospitals for most performance measures. In pairwise mixed modeling, ARG hospitals were more likely to achieve primary PCI within 90 minutes, though the absolute difference was small (Table). Overall, 30-day mortality and readmission rates were similar among ARG and non-ARG centers. Results were consistent whether hospitals were matched based on baseline performance or if centers were stratified by duration of ARG participation. Conclusion: Hospitals across the U.S. report very high achievement rates for nearly all current performance measures for AMI care. These data suggest ongoing local and national QI efforts are having success within and beyond the ARG program. 
More sensitive process and outcome performance metrics with longer term outcomes may be needed to differentiate quality of care and drive the next phase of improvement in outcomes after acute coronary syndrome.

