On Agreement Tables with Constant Kappa Values

2014 ◽  
Vol 2014 ◽  
pp. 1-5 ◽  
Author(s):  
Matthijs J. Warrens

Kappa coefficients are standard tools for summarizing the information in cross-classifications of two categorical variables with identical categories, here called agreement tables. When two categories are combined the kappa value usually either increases or decreases. There is a class of agreement tables for which the value of Cohen’s kappa remains constant when two categories are combined. It is shown that for this class of tables all special cases of symmetric kappa coincide and that the value of symmetric kappa is not affected by any partitioning of the categories.
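
The invariance described above can be checked numerically. The following is a minimal sketch, not the paper's own derivation: the helper names `cohen_kappa` and `merge_categories` and the 3 × 3 table are illustrative assumptions. The table is built so that every off-diagonal cell equals a fixed fraction (here 0.5) of the product of its row and column proportions; it is one family for which the invariance can be verified by hand, and it is not claimed to be the full class characterized in the paper.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square agreement table of counts."""
    n = sum(sum(row) for row in table)
    k = len(table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_prop = [sum(row) / n for row in table]
    col_prop = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    expected = sum(r * c for r, c in zip(row_prop, col_prop))
    return (observed - expected) / (1 - expected)


def merge_categories(table, a, b):
    """Return a new table in which category b is folded into category a."""
    k = len(table)
    keep = [i for i in range(k) if i != b]
    groups = [[i, b] if i == a else [i] for i in keep]
    return [
        [sum(table[i][j] for i in gi for j in gj) for gj in groups]
        for gi in groups
    ]


# Hypothetical table: marginals (0.5, 0.3, 0.2) for both raters,
# off-diagonal cells = 0.5 * row proportion * column proportion * 1000.
table = [
    [375, 75, 50],
    [75, 195, 30],
    [50, 30, 120],
]

print(cohen_kappa(table))
print(cohen_kappa(merge_categories(table, 0, 1)))
print(cohen_kappa(merge_categories(table, 1, 2)))
```

For this constructed table all three calls give 0.5, up to floating-point rounding, whichever pair of categories is combined.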

2020 ◽  
pp. oemed-2020-106658
Author(s):  
Mahée Gilbert-Ouimet ◽  
Xavier Trudel ◽  
Karine Aubé ◽  
Ruth Ndjaboue ◽  
Caroline S Duchaine ◽  
...  

Objectives: This study assesses the validity of a self-reported mental health problem (MHP) diagnosis as the reason for a work absence of 5 days or more compared with a physician-certified MHP diagnosis related to the same work absence. The potential modifying effect of absence duration on validity is also examined. Methods: A total of 709 participants (1031 sickness absence episodes) were selected and interviewed. Total per cent agreement, Cohen’s kappa, sensitivity and specificity values were calculated using the physician-certified MHP diagnosis related to a given work absence as the reference standard. Stratified analyses of total agreement, sensitivity and specificity values were also examined by duration of work absence (5–20 workdays, >20 workdays). Results: Total agreement value for self-reported MHP was 90%. Cohen’s kappa value was substantial (0.74). Sensitivity was 77% and specificity was 95%. Absences of more than 20 workdays had a better sensitivity than absences of shorter duration. A high specificity was observed for both short and longer absence episodes. Conclusion: This study showed high specificity and good sensitivity of self-reported MHP diagnosis compared with physician-certified MHP diagnosis for the same work absence. Absences of longer durations had a better sensitivity.


2021 ◽  
Vol 80 (Suppl 1) ◽  
pp. 983.2-983
Author(s):  
B. Drude ◽  
Ø. Maugesten ◽  
S. G. Werner ◽  
G. R. Burmester ◽  
J. Berger ◽  
...  

Background: Fluorescence Optical Imaging (FOI) utilises the fluorophore indocyanine green (ICG) to reflect enhanced microcirculation in hand and finger joints due to inflammation. Objectives: We wanted to assess the interreader reliability of FOI enhancement in patients with hand osteoarthritis (OA) and psoriatic arthritis (PsA). Furthermore, predefined typical morphologic patterns were included to determine the ability of FOI to discriminate between both diagnoses. Methods: An atlas with example images of grade 0-3 in different joint groups and typical morphologic patterns (‘streaky signals’[1], ‘green/blue nail sign’[2], ‘Werner sign’[3,4], and ‘Bishop’s crozier sign’) of PsA and hand OA was created. Two readers scored all joints in both hands (30 in total) of 20 cases with hand OA and PsA. The cases were randomly mixed and both readers were blinded to diagnosis. Each joint was rated on a semiquantitative scale from 0 to 3 in five different images (PrimaVista Mode (PVM), phase 1, 2 (first and middle image), and 3) during the FOI sequence according to the scoring method FOIAS (fluorescence optical imaging activity score)[1,3]. Interreader reliability on scoring joint enhancement was calculated using linear weighted Cohen’s kappa (κ). Agreement on diagnosis (hand OA vs. PsA) and different morphologic patterns was assessed by calculating (regular) Cohen’s kappa. Results: Overall agreement on scoring joint enhancement (all phases) was substantial (κ = 0.75), with greatest consensus in phase 2 first (κ = 0.75) and lowest agreement in phase 1 (κ = 0.46). Reliability varied in different joint groups (wrist, MCP, (P)IP, DIP), with almost perfect overall agreement on PIP joint affection (κ = 0.81), substantial agreement on wrist (κ = 0.69) and DIP joint affection (κ = 0.63), and moderate agreement on MCP joint affection (κ = 0.49) across all phases.
Consensus on morphologic patterns showed overall fair agreement (κ = 0.37) with a similar kappa value on the ability to discriminate between both diagnoses (κ = 0.3). Conclusion: Joint enhancement in FOI can be reliably assessed using a predefined scoring method. The ability of FOI to differentiate between hand OA and PsA seems to be limited. Clearer definition and more training might be needed to better agree on morphologic patterns in FOI. References: [1] Glimm AM, Werner SG, Burmester GR, et al. Ann Rheum Dis. 2016 Mar;75(3):566-570. [2] Wiemann O, Werner SG, Langer HE, et al. J Dtsch Dermatol Ges. 2019 Feb;17(2):138-148. [3] Werner SG, Langer HE, Ohrndorf S, et al. Ann Rheum Dis. 2012 Apr;71(4):504-510. [4] Zeidler H 2019. Fluoreszenzoptische Bildgebung. In: Zeidler H, Michel BA. Differenzialdiagnose rheumatischer Erkrankungen 5. Aufl. Springer, Heidelberg, S. 88-89. Disclosure of Interests: Benedict Drude: None declared, Øystein Maugesten: None declared, Stephanie Gabriele Werner: None declared, Gerd Rüdiger Burmester: None declared, Jörn Berger Employee of: Xiralite GmbH, Ida K. Haugen: None declared, Sarah Ohrndorf: None declared


2019 ◽  
Vol 79 (3) ◽  
pp. 558-576 ◽  
Author(s):  
Alexandra De Raadt ◽  
Matthijs J. Warrens ◽  
Roel J. Bosker ◽  
Henk A. L. Kiers

Cohen’s kappa coefficient is commonly used for assessing agreement between classifications of two raters on a nominal scale. Three variants of Cohen’s kappa that can handle missing data are presented. Data are considered missing if one or both ratings of a unit are missing. We study how well the variants estimate the kappa value for complete data under two missing data mechanisms—namely, missingness completely at random and a form of missingness not at random. The kappa coefficient considered in Gwet (Handbook of Inter-rater Reliability, 4th ed.) and the kappa coefficient based on listwise deletion of units with missing ratings were found to have virtually no bias and mean squared error if missingness is completely at random, and small bias and mean squared error if missingness is not at random. Furthermore, the kappa coefficient that treats missing ratings as a regular category appears to be rather heavily biased and has a substantial mean squared error in many of the simulations. Because it performs well and is easy to compute, we recommend using the kappa coefficient that is based on listwise deletion of missing ratings if it can be assumed that missingness is completely at random or not at random.
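
Two of the three variants named above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' simulation code: the function names, the use of `None` to mark a missing rating, and the six-unit example are all hypothetical, and the Gwet variant is omitted because its definition is not given in the abstract.

```python
from collections import Counter


def kappa_from_pairs(pairs):
    """Cohen's kappa from a list of (rater1, rater2) label pairs."""
    n = len(pairs)
    observed = sum(a == b for a, b in pairs) / n
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    # Counter returns 0 for labels that one rater never used.
    expected = sum(left[c] * right[c] for c in left) / n**2
    return (observed - expected) / (1 - expected)


def kappa_listwise(r1, r2):
    """Drop every unit with at least one missing rating, then compute kappa."""
    pairs = [(a, b) for a, b in zip(r1, r2) if a is not None and b is not None]
    return kappa_from_pairs(pairs)


def kappa_missing_as_category(r1, r2):
    """Treat a missing rating as one extra regular category."""
    def fill(x):
        return "missing" if x is None else x
    return kappa_from_pairs([(fill(a), fill(b)) for a, b in zip(r1, r2)])


# Hypothetical ratings; None marks a missing rating.
rater1 = ["a", "a", "b", "b", None, "a"]
rater2 = ["a", "b", "b", "b", "a", None]

print(kappa_listwise(rater1, rater2))
print(kappa_missing_as_category(rater1, rater2))
```

On this toy data the two variants already diverge (0.5 versus 5/23), which shows why the choice of variant matters; it says nothing about bias, which the paper assesses by simulation.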


2014 ◽  
Vol 2014 ◽  
pp. 1-8 ◽  
Author(s):  
Renata Baronaite ◽  
Merete Engelhart ◽  
Troels Mørk Hansen ◽  
Gorm Thamsborg ◽  
Hanne Slott Jensen ◽  
...  

Anti-nuclear antibodies (ANA) have traditionally been evaluated using indirect fluorescence assays (IFA) with HEp-2 cells. Quantitative immunoassays (EIA) have replaced the use of HEp-2 cells in some laboratories. Here, we evaluated ANA in 400 consecutive and unselected routinely referred patients using IFA and automated EIA techniques. The IFA results generated by two independent laboratories were compared with the EIA results from antibodies against double-stranded DNA (dsDNA), from ANA screening, and from tests of the seven included subantigens. The final IFA and EIA results for 386 unique patients were compared. The majority of the results were the same between the two methods (n=325, 84%); however, 8% (n=30) yielded equivocal results (equivocal-negative and equivocal-positive) and 8% (n=31) yielded divergent results (positive-negative). The results showed fairly good agreement, with a Cohen’s kappa value of 0.30 (95% confidence interval (CI) = 0.14–0.46), which decreased to 0.23 (95% CI = 0.06–0.40) when the results for dsDNA were omitted. The EIA method was less reliable for assessing nuclear and speckled reactivity patterns, whereas the IFA method presented difficulties detecting dsDNA and Ro activity. The automated EIA method performed similarly to the conventional IFA method using HEp-2 cells; thus, automated EIA may be used as a screening test.


2021 ◽  
Vol 13 (10) ◽  
pp. 4711-4726
Author(s):  
Xiaohua Hao ◽  
Guanghui Huang ◽  
Tao Che ◽  
Wenzheng Ji ◽  
Xingliang Sun ◽  
...  

Abstract. A long-term Advanced Very High Resolution Radiometer (AVHRR) snow cover extent (SCE) product from 1981 until 2019 over China has been generated by the snow research team in the Northwest Institute of Eco-Environment and Resources (NIEER), Chinese Academy of Sciences. The NIEER AVHRR SCE product has a spatial resolution of 5 km and a daily temporal resolution, and it is a completely gap-free product, which is produced through a series of processes such as quality control, cloud detection, snow discrimination, and gap-filling (GF). A comprehensive validation with reference to ground snow-depth measurements during snow seasons in China revealed that the overall accuracy was 87.4 %, the producer's accuracy was 81.0 %, the user's accuracy was 81.3 %, and the Cohen's kappa (CK) value was 0.717. Another validation with reference to higher-resolution snow maps derived from Landsat-5 Thematic Mapper (TM) images demonstrated an overall accuracy of 87.3 %, a producer's accuracy of 86.7 %, a user's accuracy of 95.7 %, and a Cohen's kappa value of 0.695. These accuracies were significantly higher than those of currently existing AVHRR products. For example, compared with the well-known JASMES AVHRR product, the overall accuracy increased approximately 15 %, the omission error dropped from 60.8 % to 19.7 %, the commission error dropped from 31.9 % to 21.3 %, and the CK value increased by more than 114 %. The new AVHRR product is already available at https://doi.org/10.11888/Snow.tpdc.271381 (Hao et al., 2021).
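
The four validation measures quoted above can all be read off a 2 × 2 confusion matrix. The sketch below uses the common remote-sensing definitions (producer's accuracy = 1 − omission error, user's accuracy = 1 − commission error) as an assumption; the paper's exact definitions may differ. The function name and the pixel counts are hypothetical and are not the study's data.

```python
def accuracy_metrics(tp, fn, fp, tn):
    """Accuracy measures for a binary snow / no-snow map.

    tp: reference snow, mapped snow      fn: reference snow, mapped no-snow
    fp: reference no-snow, mapped snow   tn: reference no-snow, mapped no-snow
    """
    n = tp + fn + fp + tn
    overall = (tp + tn) / n
    producers = tp / (tp + fn)  # share of reference snow that was mapped as snow
    users = tp / (tp + fp)      # share of mapped snow that really is snow
    # Chance agreement from the reference and map marginal totals.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n**2
    kappa = (overall - expected) / (1 - expected)
    return {
        "overall": overall,
        "producers": producers,
        "users": users,
        "kappa": kappa,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = accuracy_metrics(tp=80, fn=20, fp=10, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts the overall accuracy is 0.85 while kappa is 0.70, because half of the agreement would be expected by chance given the marginal totals.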


Blood ◽  
2019 ◽  
Vol 134 (Supplement_1) ◽  
pp. 3090-3090
Author(s):  
Andrew Spencer ◽  
Tiffany Khong ◽  
Flora Yuen ◽  
Hannah Victoria Giles ◽  
Malgorzata Gorniak ◽  
...  

Introduction: The achievement of minimal residual disease (MRD) negativity is being increasingly recognised as the optimal measure of therapeutic response for both newly diagnosed and relapsed and/or refractory multiple myeloma (MM) patients. Bone marrow (BM) evaluation with either Next Generation Sequencing (NGS) or Next Generation Flow-cytometry (NGF) affords a high level of sensitivity and the attainment of MRD negativity (< 1 in 10⁻⁵ MM cells) with either approach is a powerful predictor of superior progression free survival (PFS). Both, however, are limited by the requirement for invasive bone marrow biopsy and the technical limitations imposed by variability in sample quality. Moreover, we and others have demonstrated the presence of significant spatial heterogeneity in MM that increases in the context of disease progression. Against this background we have evaluated a blood-based strategy for disease burden evaluation, Quantitative ImmunoPrecipitation Mass Spectrometry (QIP MS) and Free Light Chain Mass Spectrometry (FLC MS), in a uniformly treated cohort of functional high-risk MM patients also undergoing sequential NGF (EuroFlow platform) MRD evaluation. Methods: Newly diagnosed MM patients failing (< partial remission [PR] as best response) front-line bortezomib-based induction therapy were enrolled onto the Australasian Leukaemia and Lymphoma Group (ALLG) MM17 trial (ACTRN12615000934549) evaluating an intensive salvage approach utilising a combination of carfilzomib, thalidomide and dexamethasone (KTd) as re-induction (KTd x 6 cycles) and as post autologous stem cell transplantation (ASCT) consolidation (KTd x 2 cycles followed by Td x 10 cycles). NGF MRD status was determined pre-ASCT, post-ASCT and post-KTd consolidation utilising the standardised 8-colour EuroFlow platform. Matched serum samples from the 3 time-points were evaluated in parallel with QIP and FLC MS.
Briefly, polyclonal antibodies (anti-IgG, -IgA, -IgM, -total κ, -total λ, free κ and free λ) covalently attached to paramagnetic microparticles were incubated with serum, washed and treated to simultaneously elute and reduce patient immunoglobulins. Light chain mass spectra were generated on a MALDI-TOF-MS system. Concordance between NGF and MS was assessed via the derivation of Cohen's kappa values. Results: Fifty patients were enrolled onto the ALLG MM17 trial. QIP and/or FLC MS identified the serum monoclonal paraprotein (PP) at baseline in all cases (100% sensitivity). Serum samples for MS with matched BM for NGF were available on 33 patients pre-ASCT, 32 post-ASCT and 26 post-KTd consolidation (91 matched samples in total). Sequential MS demonstrated serological complete remission (disappearance of MS baseline detectable monoclonal intact immunoglobulin [PP] and/or FLC) (CRMS) in 11%, 47% and 53% of patients pre-ASCT, post-ASCT and post-KTd, respectively. NGF MRD negativity at the same time points was 39%, 52% and 71% (the latter equivalent to a 50% MRD negativity rate within the original n=50 intention-to-treat population). The Cohen's kappa values for the 3 time-points were 0.21, 0.18 and 0.35, indicating fair to moderate concordance, with the best concordance at the post-KTd consolidation time-point and with a Cohen's kappa value for the entire cohort (n=91) of 0.30. The sequential MS demonstrated that 12 patients had discordant disappearance of baseline PP and free light chains (FLC) prior to achieving CRMS. In 11 the FLC disappeared before the PP and in 1 the PP prior to the FLC. The former was thought to be due to either the FLC falling below the sensitivity of the technique following successful therapy or the presence of 2 sub-clones with differential drug sensitivity, whereas the latter was likely secondary to the persistence of a FLC expressing sub-clone.
Post-KTd MS demonstrated good concordance with serological response (Cohen's kappa value = 0.61) but with 18% of patients demonstrating sCR/CR despite persisting MS detectable PP and/or FLC. Conclusion: These preliminary data confirm the utility of QIP MS and FLC MS for the sequential monitoring of tumour burden in HR MM. Concordance with standard monitoring was good with MS detectable disease in some patients with serological sCR/CR consistent with the higher sensitivity of MS. Concordance with NGF was only fair to moderate mandating the future comparison of larger sample sets to better understand the relationship between the 2 methodologies. Disclosures: Spencer: Takeda: Consultancy, Honoraria, Research Funding, Speakers Bureau; Secura Bio: Consultancy, Honoraria; Servier: Consultancy, Honoraria; Celgene: Consultancy, Honoraria, Research Funding, Speakers Bureau; Janssen: Consultancy, Honoraria, Research Funding, Speakers Bureau; Amgen: Consultancy, Honoraria, Research Funding; Abbvie: Consultancy, Honoraria; Specialised Therapeutics Australia: Consultancy, Honoraria. Khong: Novartis Oncology: Research Funding. Quach: Janssen: Membership on an entity's Board of Directors or advisory committees; Amgen: Membership on an entity's Board of Directors or advisory committees, Research Funding; Karyopharm: Membership on an entity's Board of Directors or advisory committees; Sanofi: Research Funding; GSK: Membership on an entity's Board of Directors or advisory committees; Celgene: Membership on an entity's Board of Directors or advisory committees, Research Funding; Takeda: Membership on an entity's Board of Directors or advisory committees. Kalff: Amgen: Honoraria; Celgene: Honoraria; Pfizer: Honoraria.
Reynolds: Novartis Australia: Honoraria; Alfred Health: Employment, Other: Biostatistician for trials funded by the Australian government and Abbvie, Amgen, Celgene, GSK, Janssen-Cilag, Merck, Novartis, Takeda, but sponsored by Alfred Health.; AUSTRALASIAN LEUKAEMIA & LYMPHOMA GROUP (ALLG): Consultancy; Novartis AG: Equity Ownership.


2014 ◽  
Vol 2014 ◽  
pp. 1-6 ◽  
Author(s):  
Matthijs J. Warrens

Cohen’s kappa is a standard tool for the analysis of agreement in a 2 × 2 reliability study. Researchers are frequently only interested in the kappa-value of a sample. Various authors have observed that if two pairs of raters have the same amount of observed agreement, the pair whose marginal distributions are more similar to each other may have a lower kappa-value than the pair with more divergent marginal distributions. Here we present exact formulations of some of these properties. The results provide a better understanding of the 2 × 2 kappa for situations where it is used as a sample statistic.
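
For reference, the 2 × 2 kappa can be written out explicitly, which makes the dependence on the marginal distributions visible. The notation is an assumption introduced here, not taken from the paper: $a, b, c, d$ are the cell proportions of the 2 × 2 table (with $a + b + c + d = 1$ and $a$, $d$ the agreement cells), $p_1 = a + b$ and $p_2 = c + d$ are the first rater's marginals, and $q_1 = a + c$ and $q_2 = b + d$ are the second rater's.

```latex
\begin{align}
P_o &= a + d, \qquad P_e = p_1 q_1 + p_2 q_2, \\
\kappa &= \frac{P_o - P_e}{1 - P_e}
        = \frac{2\,(ad - bc)}{p_1 q_2 + p_2 q_1}.
\end{align}
```

The second equality follows from $a - p_1 q_1 = d - p_2 q_2 = ad - bc$ and $1 - P_e = p_1 q_2 + p_2 q_1$. With $P_o$ held fixed, $\kappa = 1 - (1 - P_o)/(p_1 q_2 + p_2 q_1)$, so two pairs of raters with the same observed agreement can still differ in kappa through their marginals alone.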


2018 ◽  
Vol 8 (4) ◽  
pp. 54-57
Author(s):  
Sabina Poudel ◽  
Minu Dhungana ◽  
Rajani Karki ◽  
Prabhat Shrestha

Introduction: Lateral throat form (LTF) is a critical area that has to be recorded properly to obtain proper retention and stability in complete dentures, especially in geriatric patients with resorbed ridges. The popular method for determining LTF is Neil’s method, which depends on the forces applied by the floor of the mouth when the tongue protrudes. Since the perception of these forces differs among operators, there is a high chance of error in the classification. A customized instrument was therefore fabricated to prevent this inter-observer variation. The aim of the study was to compare the inter-observer accuracy between Neil’s method of classification and classification done by the customized gauge. Methods and methodology: A total of 30 edentulous patients were included. Two observers measured the LTF depth with the customized tool and also by Neil’s method. Cohen’s kappa test was used to evaluate the agreement between the two operators in the two different classifications. Result: The agreement between the two observers was evaluated by means of Cohen’s kappa value. There was good agreement between observers in the proposed classification done by the customized tool, with kappa value 0.658, and fair inter-observer agreement with Neil’s method, with kappa value 0.0492. Conclusion: The method of measuring the depth of LTF with the fabricated instrument was more accurate and reliable than Neil’s method.


Author(s):  
Miriam Athmann ◽  
Roya Bornhütter ◽  
Nicolaas Busscher ◽  
Paul Doesburg ◽  
Uwe Geier ◽  
...  

Abstract: In the image forming methods, copper chloride crystallization (CCCryst), capillary dynamolysis (CapDyn), and circular chromatography (CChrom), characteristic patterns emerge in response to different food extracts. These patterns reflect the resistance to decomposition as an aspect of resilience and are therefore used in product quality assessment complementary to chemical analyses. In the presented study, rocket lettuce from a field trial with different radiation intensities, nitrogen supply, biodynamic, organic and mineral fertilization, and with or without horn silica application was investigated with all three image forming methods. The main objective was to compare two different evaluation approaches, differing in the type of image forming method leading the evaluation, the number of factors analyzed, and the deployed perceptual strategy: Firstly, image evaluation of samples from all four experimental factors simultaneously by two individual evaluators was based mainly on analyzing structural features in CapDyn (analytical perception). Secondly, a panel of eight evaluators applied a Gestalt evaluation imbued with a kinesthetic engagement of CCCryst patterns from either fertilization treatments or horn silica treatments, followed by a confirmatory analysis of individual structural features. With the analytical approach, samples from different radiation intensities and N supply levels were identified correctly in two out of two sample sets with groups of five samples per treatment each (Cohen’s kappa, p = 0.0079), and the two organic fertilizer treatments were differentiated from the mineral fertilizer treatment in eight out of eight sample sets with groups of three manure and two minerally fertilized samples each (Cohen’s kappa, p = 0.0048).
With the panel approach based on Gestalt evaluation, biodynamic fertilization was differentiated from organic and mineral fertilization in two out of two exams with 16 comparisons each (Friedman test, p < 0.001), and samples with horn silica application were successfully identified in two out of two exams with 32 comparisons each (Friedman test, p < 0.001). Further research will show which properties of the food decisive for resistance to decomposition are reflected by analytical and Gestalt criteria, respectively, in CCCryst and CapDyn images.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Alexandre Maciel-Guerra ◽  
Necati Esener ◽  
Katharina Giebel ◽  
Daniel Lea ◽  
Martin J. Green ◽  
...  

Abstract: Streptococcus uberis is one of the leading pathogens causing mastitis worldwide. Identification of S. uberis strains that fail to respond to treatment with antibiotics is essential for better decision making and treatment selection. We demonstrate that the combination of supervised machine learning and matrix-assisted laser desorption ionization/time of flight (MALDI-TOF) mass spectrometry can discriminate strains of S. uberis causing clinical mastitis that are likely to be responsive or unresponsive to treatment. Diagnostics prediction systems trained on 90 individuals from 26 different farms achieved up to 86.2% and 71.5% in terms of accuracy and Cohen’s kappa. The performance was further increased by adding metadata (parity, somatic cell count of previous lactation and count of positive mastitis cases) to encoded MALDI-TOF spectra, which increased accuracy and Cohen’s kappa to 92.2% and 84.1% respectively. A computational framework integrating protein–protein networks and structural protein information to the machine learning results unveiled the molecular determinants underlying the responsive and unresponsive phenotypes.

