Incorporating Misclassification Error in Skill Assessment

2005 ◽  
Vol 133 (11) ◽  
pp. 3382-3392 ◽  
Author(s):  
William Briggs ◽  
Matt Pocernich ◽  
David Ruppert

Abstract It is desirable to account for misclassification error of meteorological observations so that the true skill of the forecast can be assessed. Errors in observations can occur, among other places, in pilot reports of icing and in tornado spotting. Not accounting for misclassification error gives a misleading picture of the forecast’s true performance. An extension to the climate skill score test developed in Briggs and Ruppert is presented to account for possible misclassification error of the meteorological observation. This extension supposes a statistical misclassification-error model where “gold standard” data, or expert opinion, is available to characterize the misclassification-error characteristics of the observation. These model parameters are then inserted into the Briggs and Ruppert skill score for which a statistical test of significance can be performed.
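To make the idea of a misclassification-error model concrete, the sketch below corrects an observed event frequency for imperfect observations. It is not the Briggs and Ruppert skill score extension itself. It is a generic correction (often called the Rogan-Gladen estimator) that assumes the observation's sensitivity and specificity have already been estimated from gold-standard data; the function name and parameters are illustrative assumptions.

```python
def corrected_event_rate(observed_rate, sensitivity, specificity):
    """Estimate the true event frequency from a misclassified observed frequency.

    Assumes observed_rate = sensitivity * p + (1 - specificity) * (1 - p),
    where p is the true event frequency. Hypothetical helper, not taken
    from the paper.
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0.0:
        raise ValueError("observations must be better than chance "
                         "(sensitivity + specificity > 1)")
    rate = (observed_rate + specificity - 1.0) / denom
    # Sampling noise can push the estimate outside [0, 1]; clip it back.
    return min(1.0, max(0.0, rate))


if __name__ == "__main__":
    # Observed icing frequency of 24% with an imperfect observer.
    print(corrected_event_rate(0.24, sensitivity=0.8, specificity=0.9))
```

Applying the same correction separately to the event frequency conditional on each forecast category would give a corrected contingency table, provided the misclassification does not depend on the forecast.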

2016 ◽  
Author(s):  
Wouter Greuell ◽  
Wietse H. P. Franssen ◽  
Hester Biemans ◽  
Ronald W. A. Hutjes

Abstract. Seasonal predictions can be exploited among others to optimize hydropower energy generation, navigability of rivers and irrigation management to decrease crop yield losses. This paper is the first of two papers dealing with a model-based system built to produce seasonal hydrological forecasts (WUSHP: Wageningen University Seamless Hydrological Prediction system), applied here to Europe. The present paper presents the development and the skill evaluation of the system. In WUSHP hydrology is simulated by running the Variable Infiltration Capacity (VIC) hydrological model with forcing from bias-corrected output of ECMWF's Seasonal Forecasting System 4. The system is probabilistic. For the assessment of skill, we performed hindcast simulations (1981–2010) and a reference simulation, in which VIC was forced by gridded meteorological observations, to generate initial hydrological conditions for the hindcasts and discharge output for skill assessment (pseudo-observations). Skill is analysed with monthly temporal resolution for the entire annual cycle. Using the pseudo-observations and taking the correlation coefficient as metric, hot spots of significant skill in runoff were identified in Fennoscandia (from January to October), the southern part of the Mediterranean (from June to August), Poland, North Germany, Romania and Bulgaria (mainly from November to January) and West France (from December to May). The spatial pattern of skill is fading with increasing lead time but some skill is left at the end of the hindcasts (7 months). On average across the domain, skill in discharge is slightly higher than skill in runoff. This can be explained by the delay between runoff and discharge and the general tendency of decreasing skill with lead time. Theoretical skill as determined with the pseudo-observations was compared to actual skill as determined with real discharge observations from 747 stations. 
Actual skill is mostly and often substantially less than theoretical skill, which is consistent with a conceptual analysis of the two types of verification. Qualitatively, results are hardly sensitive to the different skill metrics considered in this study (correlation coefficient, ROC area and Ranked Probability Skill Score) but ROC areas tend to be slightly larger for the Below Normal than for the Above Normal tercile.


2020 ◽  
Vol 5 ◽  
pp. 229
Author(s):  
Mark Mummé ◽  
Andy Boyd ◽  
Jean Golding ◽  
John Macleod

This data note describes the linked antenatal and delivery records of the mothers and index children of the Avon Longitudinal Study of Parents and Children (ALSPAC) birth cohort study. These records were extracted from the computerised maternity record system ‘STORK’ used by the two largest NHS trusts in the study catchment area. The STORK database was designed to be populated by midwives and other health professionals during a woman’s pregnancy and shortly after the baby’s birth. These early computer records were initiated in the early 1990s, shortly before the start of enrolment to ALSPAC. At this time the use of electronic medical record systems such as ‘STORK’ was very new; the accuracy of the records has been questioned and little contemporary detailed documentation is available. Small-sample spot checks on the accuracy of the information in ‘STORK’ suggest extensive missingness and differences from gold-standard fieldworker-abstracted information in some variables, yet high levels of completeness and agreement with gold-standard data in others. Software code was created using STATA (StataCorp LLC) to transform the original CSV (comma-separated values) files into a cohesive and consistent format, which was reviewed for data completeness to assess its potential use in future research. The cleaned ‘STORK’ records provide health, social and maternity data from the very earliest period of the ALSPAC study in an easily accessible format, which is particularly useful when other sources of data are missing.


2021 ◽  
Vol 893 (1) ◽  
pp. 012047
Author(s):  
R Rahmat ◽  
A M Setiawan ◽  
Supari

Abstract Indonesian climate is strongly affected by the El Niño-Southern Oscillation (ENSO), one of its main climate drivers. ENSO prediction for the upcoming months or year is crucial for the government in designing strategic policy. Besides producing its own ENSO prediction, BMKG also regularly releases the ENSO status and predictions collected from other climate centers, such as the Japan Meteorological Agency (JMA) and the National Oceanic and Atmospheric Administration (NOAA). However, the skill of these products is not yet well known. The aim of this study is to conduct a simple assessment of the skill of the JMA Ensemble Prediction System (EPS) and NOAA Climate Forecast System version 2 (CFSv2) ENSO predictions using the World Meteorological Organization (WMO) Standard Verification System for Long Range Forecast (SVS-LRF) method. The two sets of ENSO predictions were also compared with each other using Student's t-test. The ENSO prediction data were obtained from the ENSO JMA and ENSO NCEP forecast archive files, while observed Nino 3.4 values were calculated from the Centennial in situ Observation-Based Estimates (COBE) Sea Surface Temperature Anomaly (SSTA). The ENSO predictions issued by both JMA and NCEP have good skill at lead times of 1 to 3 months, indicated by a high correlation coefficient and positive values of the Mean Square Skill Score (MSSS). However, the skill of both predictions is significantly reduced for the May-August target months. Careful interpretation is needed for ENSO predictions for this period.
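As an illustration of the Mean Square Skill Score mentioned above, the sketch below uses the common definition MSSS = 1 - MSE_forecast / MSE_climatology. It takes the sample mean of the observations as the climatological reference, which is a simplifying assumption; the full SVS-LRF procedure is more detailed, and the function name is hypothetical.

```python
def msss(forecasts, observations):
    """Mean Square Skill Score against a climatological (sample-mean) reference.

    Positive values mean the forecast beats climatology; 1 is a perfect forecast.
    """
    n = len(observations)
    if n == 0 or len(forecasts) != n:
        raise ValueError("forecasts and observations must be non-empty and equal in length")
    climatology = sum(observations) / n
    mse_forecast = sum((f - o) ** 2 for f, o in zip(forecasts, observations)) / n
    mse_climatology = sum((climatology - o) ** 2 for o in observations) / n
    if mse_climatology == 0.0:
        raise ValueError("observations have zero variance; MSSS is undefined")
    return 1.0 - mse_forecast / mse_climatology


if __name__ == "__main__":
    # Illustrative Nino 3.4 anomalies (made-up numbers, not data from the study).
    observed = [0.5, 1.0, -0.5, -1.0]
    predicted = [0.4, 0.8, -0.3, -0.9]
    print(msss(predicted, observed))
```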


2011 ◽  
Vol 64 (11) ◽  
pp. 1230-1241 ◽  
Author(s):  
Santiago G. Moreno ◽  
Alex J. Sutton ◽  
A.E. Ades ◽  
Nicola J. Cooper ◽  
Keith R. Abrams

2018 ◽  
Vol 2018 ◽  
pp. 1-10
Author(s):  
Siyu Ji ◽  
Chenglin Wen

A neural network is a data-driven algorithm; building the network model requires a large amount of training data, so a significant amount of time is spent training the model parameters. However, the system's mode is updated from time to time, and prediction with the original model parameters then causes the model output to deviate greatly from the true value. Traditional methods such as gradient descent and least squares are all centralized, making it difficult to update model parameters adaptively as the system changes. First, in order to update the network parameters adaptively, this paper introduces an evaluation function and gives a new method for evaluating the parameters of the function. Without changing the other parameters of the model, the new method updates some parameters in real time to maintain the accuracy of the model. Then, based on the evaluation function, the Mean Impact Value (MIV) algorithm is used to calculate the feature weights, and the weighted data are fed into the established fault diagnosis model. Finally, the validity of the algorithm is verified by simulation on the standard UCI Combined Cycle Power Plant (UCI-ccpp) data set.


2008 ◽  
Vol 23 (5) ◽  
pp. 1022-1031 ◽  
Author(s):  
Marion P. Mittermaier

Abstract Skill is defined as actual forecast performance relative to the performance of a reference forecast. It is shown that the choice of reference (e.g., random or persistence) can affect the perceived performance of the forecast system. Two scores, the equitable threat score (ETS) and the odds ratio benefit skill score (ORBSS), were chosen to show the impact of using a persistence forecast, first using some simple hypothetical scenarios and second for actual forecasts from the Met Office Unified Model (UM) of precipitation, total cloud cover, and visibility during 2006. Overall persistence offers a sterner test of true forecast added value and accuracy, but using a more realistic reference may come at a cost. Using persistence introduces an additional degree of freedom to the skill assessment, which may be rather variable for “weather parameters.” Ultimately, the aim of any forecasting system should be to achieve a substantive separation between the inherent skill of the reference (which represents basic predictability) and the actual forecast.
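The equitable threat score named above can be sketched from a 2x2 contingency table. The code uses the standard textbook definition, in which the reference is the number of hits expected from a random forecast; it does not reproduce the persistence-based variant studied in the paper, and the function and argument names are assumptions.

```python
def equitable_threat_score(hits, false_alarms, misses, correct_negatives):
    """ETS = (hits - hits_random) / (hits + false_alarms + misses - hits_random).

    hits_random is the number of hits expected by chance given the
    forecast and observed event frequencies.
    """
    n = hits + false_alarms + misses + correct_negatives
    if n == 0:
        raise ValueError("contingency table is empty")
    hits_random = (hits + false_alarms) * (hits + misses) / n
    denom = hits + false_alarms + misses - hits_random
    if denom == 0:
        raise ValueError("ETS is undefined for this table")
    return (hits - hits_random) / denom


if __name__ == "__main__":
    # Made-up counts for illustration only.
    print(equitable_threat_score(hits=30, false_alarms=10, misses=20, correct_negatives=140))
```

A score of 0 means the forecast is no better than the random reference, and 1 means a perfect forecast. Swapping in a different reference changes only the expected-hits term.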


2004 ◽  
Vol 9 (4) ◽  
pp. 137-144 ◽  
Author(s):  
Dejan Tomaževič ◽  
Boštjan Likar ◽  
Franjo Pernuš

2011 ◽  
Vol 38 (3) ◽  
pp. 1491-1502 ◽  
Author(s):  
Christelle Gendrin ◽  
Primož Markelj ◽  
Supriyanto Ardjo Pawiro ◽  
Jakob Spoerk ◽  
Christoph Bloch ◽  
...  

2021 ◽  
Author(s):  
Qi Jia ◽  
Dezheng Zhang ◽  
Haifeng Xu ◽  
Yonghong Xie

BACKGROUND Traditional Chinese medicine (TCM) clinical records contain patients' symptoms, diagnoses, and the subsequent treatment given by doctors. These records are important resources for research and analysis of TCM diagnosis knowledge. However, most TCM clinical records are unstructured text. Therefore, a method to automatically extract medical entities from TCM clinical records is indispensable. OBJECTIVE Training a medical entity extraction model requires a large annotated corpus. The cost of annotating a corpus is very high, and there is a lack of gold-standard data sets for supervised learning methods. Therefore, we utilized distantly supervised named entity recognition (NER) to respond to the challenge. METHODS We propose a span-level distantly supervised NER approach to extract TCM medical entities. It utilizes a pretrained language model and a simple multilayer neural network as the classifier to detect and classify entities. We also designed a negative sampling strategy for the span-level model. The strategy randomly selects negative samples in every epoch and periodically filters out possible false-negative samples, reducing their harmful influence. RESULTS We compare our method with other baseline methods on a gold-standard data set to illustrate its effectiveness. The F1 score of our method is 77.34, and it remarkably outperforms the other baselines. CONCLUSIONS We developed a distantly supervised NER approach to extract medical entities from TCM clinical records. We evaluated our approach on a TCM clinical record data set. Our experimental results indicate that the proposed approach achieves better performance than other baselines.
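The negative sampling strategy described in METHODS can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: spans are (start, end) token offsets with an exclusive end, `excluded` stands in for spans that a periodic filter has flagged as possible false negatives, and all names are hypothetical.

```python
import random


def enumerate_spans(n_tokens, max_len):
    """All (start, end) spans of at most max_len tokens; end is exclusive."""
    return [(i, j)
            for i in range(n_tokens)
            for j in range(i + 1, min(i + max_len, n_tokens) + 1)]


def sample_negatives(n_tokens, positive_spans, max_len, k, rng, excluded=frozenset()):
    """Draw up to k spans that are neither distantly labeled nor filtered out."""
    candidates = [s for s in enumerate_spans(n_tokens, max_len)
                  if s not in positive_spans and s not in excluded]
    return rng.sample(candidates, min(k, len(candidates)))


if __name__ == "__main__":
    rng = random.Random(0)
    positives = {(0, 2)}   # span matched by the distant-supervision dictionary
    suspected = {(3, 5)}   # span flagged as a possible false negative
    for epoch in range(3):
        # A fresh random subset of negatives is drawn in every epoch.
        print(epoch, sample_negatives(6, positives, 3, 4, rng, suspected))
```

Resampling every epoch limits how often any single unlabeled entity is presented to the classifier as a negative example.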


1991 ◽  
Vol 28 (4) ◽  
pp. 417-428 ◽  
Author(s):  
Pradeep K. Chintagunta ◽  
Dipak C. Jain ◽  
Naufel J. Vilcassim

In analyzing panel data, the issue of heterogeneity across households is an important consideration. If heterogeneity is present but is ignored in the analysis, it will result in biased and inconsistent estimates of the effects of marketing mix variables on brand choice. The authors propose the use of a random effects specification to account for heterogeneity in brand preferences across households in a logit framework. The model parameters are estimated by both parametric and semiparametric approaches. The authors also compare their results with those obtained from logit models in which observed past choice behavior is used to capture such heterogeneity. The different models are estimated with the IRI saltine crackers dataset. A formal statistical test of the model specifications shows that the semiparametric specification is the most preferred in terms of the overall fit of the model to the data. In addition, that specification predicts best when the models are validated in a holdout sample of households.

