A risk stratification approach for improved interpretation of diagnostic accuracy statistics

Mapping Intimacies ◽

10.1101/080366 ◽

2016 ◽

Author(s):

Hormuzd A. Katki ◽

Mark Schiffman

Keyword(s):

Public Health ◽

Risk Stratification ◽

Diagnostic Accuracy ◽

Area Under The Curve ◽

Population Based ◽

Screening Tests ◽

Predictive Values ◽

The USA ◽

Relative Gains ◽

Youden’s Index

Diagnostic accuracy statistics, including predictive values, risk-differences, Youden’s index and Area Under the Curve (AUC), assess the promise of novel biomarkers proposed as diagnostic tests. We reinterpret these statistics in light of risk-stratification (how well a biomarker separates those at higher risk from those at lower risk) to better understand their implications for public-health programs. We introduce an intuitively simple statistic, Mean Risk Stratification (MRS): the average change in risk (pre-test vs. post-test) revealed for tested individuals. High MRS implies better risk separation achieved by testing. MRS demonstrates that conventional predictive values can mislead because they do not account for disease prevalence or test-positivity rates. Little risk-stratification is possible for rare diseases, demonstrating a “high-bar” to justify population-based screening. Importantly, we demonstrate that the risk-difference, Youden’s index, and AUC measure only multiplicative relative gains in risk-stratification: AUC=0.6 achieves only 20% of maximum risk-stratification (AUC=0.9 achieves 80%). However, large relative gains in risk-stratification might not imply large absolute gains if disease is rare or if the test is rarely positive. We illustrate MRS by our experience comparing the performance of cervical cancer screening tests in China vs. the USA. The test with the worst AUC=0.72 in China (visual inspection with acetic acid) provides twice the risk-stratification of the test with best AUC=0.83 in the USA (human papillomavirus and Pap cotesting) because China has three times more cervical precancer/cancer. MRS could be routinely calculated to better understand the clinical/public-health implications of standard diagnostic accuracy statistics.
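The definition of MRS given in the abstract (the average pre-test vs. post-test change in risk across tested individuals) can be sketched for a binary test as follows. This is a minimal illustration, not the authors' code: the function names and the example inputs are hypothetical, and reading "20% at AUC=0.6, 80% at AUC=0.9" as the linear rescaling 2·AUC − 1 is an assumption that happens to match both quoted figures.

```python
def mean_risk_stratification(sensitivity, specificity, prevalence):
    """Average absolute change in risk (pre-test vs. post-test) for a binary test.

    Hypothetical helper written from the verbal definition in the abstract.
    """
    # Probability of a positive and of a negative test result.
    p_pos = prevalence * sensitivity + (1 - prevalence) * (1 - specificity)
    p_neg = 1 - p_pos

    # Post-test risks: risk of disease given a positive result (PPV) and
    # given a negative result (complement of NPV). Guard against empty groups.
    risk_if_pos = prevalence * sensitivity / p_pos if p_pos > 0 else prevalence
    risk_if_neg = prevalence * (1 - sensitivity) / p_neg if p_neg > 0 else prevalence

    # Weight each change in risk by how often that test result occurs.
    return (p_pos * abs(risk_if_pos - prevalence)
            + p_neg * abs(prevalence - risk_if_neg))


def closed_form_mrs(sensitivity, specificity, prevalence):
    """Algebraically equivalent form: 2 * J * prevalence * (1 - prevalence)."""
    youden_j = sensitivity + specificity - 1
    return 2 * youden_j * prevalence * (1 - prevalence)


def relative_stratification_from_auc(auc):
    """Assumed rescaling of AUC to a 0..1 'share of maximum' scale."""
    return 2 * auc - 1


if __name__ == "__main__":
    # Illustrative inputs only; these are not values from the paper.
    for prevalence in (0.10, 0.001):
        mrs = mean_risk_stratification(0.80, 0.90, prevalence)
        print(f"prevalence={prevalence}: MRS={mrs:.6f}")
    for auc in (0.6, 0.9):
        print(f"AUC={auc}: relative gain={relative_stratification_from_auc(auc):.0%}")
```

Under these assumptions, the same test yields a much smaller MRS when the disease is rare, which is the abstract's point about the high bar for population-based screening: the Youden's index is unchanged, but the prevalence factor shrinks the absolute gain.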


Diagnostic Accuracy and Detection Rate of Glaucoma Screening with Optic Disk Photos, Optical Coherence Tomography Images, and Telemedicine

Journal of Clinical Medicine ◽

10.3390/jcm11010216 ◽

2021 ◽

Vol 11 (1) ◽

pp. 216

Author(s):

Alfonso Anton ◽

Karen Nolivos ◽

Marta Pazos ◽

Gianluca Fatti ◽

Miriam Eleonora Ayala ◽

...

Keyword(s):

Optical Coherence Tomography ◽

Diagnostic Accuracy ◽

Detection Rate ◽

Screening Program ◽

Rate Sensitivity ◽

High Specificity ◽

Population Based ◽

Optic Disk ◽

Optical Coherence ◽

Predictive Values

Purpose: The aim of this study was to evaluate the diagnostic accuracy of optical coherence tomography (OCT) and retinography in the detection of glaucoma through a telemedicine program. Methods: A population-based sample of 4113 persons was randomly selected. The screening examination included a fundus photograph and OCT images. Images were evaluated on a deferred basis. All participants were then invited to a complete glaucoma examination, including gonioscopy, visual field, and dilated fundus examination. The detection rate, sensitivity, specificity, and positive and negative predictive values were calculated. Results: We screened 1006 persons. Of these, 201 (19.9%) were classified as glaucoma suspects; 20.4% were identified only by retinographs, 11.9% only by OCT images, and 46.3% by both. On ophthalmic examination at the hospital (n = 481), confirmed glaucoma was found in 58 (12.1%), probable glaucoma in 76 (15.8%), and ocular hypertension in 10 (2.1%), and no evidence of glaucoma was found in 337 (70.0%). The detection rate for confirmed or probable glaucoma was 9.2%. Sensitivity ranged from 69.4% to 86.2% and specificity from 82.1% to 97.4%, depending on the definition applied. Conclusions: The combination of OCT images and fundus photographs yielded a detection rate of 9.2% in a population-based screening program with moderate sensitivity, high specificity, and predictive values of 84–96%.


Comparative Diagnostic Accuracy of the ACE-III, MIS, MMSE, MoCA, and RUDAS for Screening of Alzheimer Disease

Dementia and Geriatric Cognitive Disorders ◽

10.1159/000469658 ◽

2017 ◽

Vol 43 (5-6) ◽

pp. 237-246 ◽

Cited By ~ 31

Author(s):

Jordi A. Matías-Guiu ◽

María Valles-Salgado ◽

Teresa Rognoni ◽

Frank Hamre-Gil ◽

Teresa Moreno-Ramos ◽

...

Keyword(s):

Alzheimer Disease ◽

Diagnostic Accuracy ◽

Area Under The Curve ◽

Cross Sectional Study ◽

Screening Tests ◽

Cross Sectional ◽

Diagnostic Capacity ◽

State Examination ◽

Diagnostic Properties ◽

Addenbrooke’s Cognitive Examination

Background: Our aim was to evaluate and compare the diagnostic properties of 5 screening tests for the diagnosis of mild Alzheimer disease (AD). Methods: We conducted a prospective and cross-sectional study of 92 patients with mild AD and of 68 healthy controls from our Department of Neurology. The diagnostic properties of the following tests were compared: Mini-Mental State Examination (MMSE), Addenbrooke's Cognitive Examination III (ACE-III), Memory Impairment Screen (MIS), Montreal Cognitive Assessment (MoCA), and Rowland Universal Dementia Assessment Scale (RUDAS). Results: All tests yielded high diagnostic accuracy, with the ACE-III achieving the best diagnostic properties. The area under the curve was 0.897 for the ACE-III, 0.889 for the RUDAS, 0.874 for the MMSE, 0.866 for the MIS, and 0.856 for the MoCA. The Mini-ACE score from the ACE-III showed the highest diagnostic capacity (area under the curve 0.939). Memory scores of the ACE-III and of the RUDAS showed a better diagnostic accuracy than those of the MMSE and of the MoCA. All tests, especially the ACE-III, conveyed a higher diagnostic accuracy in patients with full primary education than in the less educated group. Implementing normative data improved the diagnostic accuracy of the ACE-III but not that of the other tests. Conclusions: The ACE-III achieved the highest diagnostic accuracy. This better discrimination was more evident in the more educated group.


Accuracy of a 7-Item Patient-Reported Stand-Alone Tool for Periodontitis Screening

Journal of Clinical Medicine ◽

10.3390/jcm10020287 ◽

2021 ◽

Vol 10 (2) ◽

pp. 287

Author(s):

Caroline Sekundo ◽

Tobias Bölk ◽

Olivier Kalmus ◽

Stefan Listl

Keyword(s):

Diagnostic Accuracy ◽

Area Under The Curve ◽

German Society ◽

Screening Tools ◽

University Hospital ◽

Predictive Values ◽

Periodontal Inflammation ◽

Specificity And Sensitivity ◽

Patient Reported ◽

Sensitivity Specificity

Periodontitis is interrelated with various other chronic diseases. Recent evidence suggests that treatment of periodontitis improves glycemic control in diabetes patients and reduces the costs of diabetes treatment. So far, however, screening for periodontitis in non-dental settings has been complicated by a lack of easily applicable and reliable screening tools which can be applied by non-dental professionals. The purpose of this study was to assess the diagnostic accuracy of a short seven-item tool developed by the German Society for Periodontology (DG PARO) to screen for periodontitis by means of patient-reported information. A total of 88 adult patients filled in the patient-reported Periodontitis Risk Score (pPRS; range: 0 points = lowest periodontitis risk; 20 points = very high periodontitis risk) questionnaire before dental check-up at Heidelberg University Hospital. Subsequent clinical assessments according to Periodontal Screening and Recording (PSR®) were compared with pPRS scores. The diagnostic accuracy of pPRS at different cutoff values was assessed according to sensitivity, specificity, positive, and negative predictive values, as well as Receiver-Operator-Characteristic curves, Area Under the Curve (AUC), and logistic regression analysis. According to combined specificity and sensitivity (AUC = 0.86; 95%-CI: 0.76–0.95), the diagnostic accuracy of the pPRS for detecting periodontal inflammation (PSR® ≥ 3) was highest for a pPRS cutoff distinguishing between pPRS scores < 7 vs. ≥ 7. Patients with pPRS scores ≥ 7 had a 36.09 (95%-CI: 9.82–132.61) times higher chance of having a PSR® ≥ 3 than patients with scores < 7. In conclusion, the pPRS may be considered an appropriately accurate stand-alone tool for the screening for periodontitis.


Diagnostic Accuracy of Upper Cervical Spine Instability Tests: A Systematic Review

Physical Therapy ◽

10.2522/ptj.20130186 ◽

2013 ◽

Vol 93 (12) ◽

pp. 1686-1695 ◽

Cited By ~ 27

Author(s):

Nathan Hutting ◽

Gwendolijne G.M. Scholten-Peeters ◽

Veerle Vijverman ◽

Martin D.M. Keesenberg ◽

Arianne P. Verhagen

Keyword(s):

Systematic Review ◽

Cervical Spine ◽

Diagnostic Accuracy ◽

Data Extraction ◽

Screening Tests ◽

Upper Cervical Spine ◽

Likelihood Ratios ◽

Cervical Spine Instability ◽

Predictive Values ◽

Upper Cervical

Background: Patients with neck pain, headache, torticollis, or neurological signs should be screened carefully for upper cervical spine instability, as these conditions are “red flags” for applying physical therapy interventions. However, little is known about the diagnostic accuracy of upper cervical spine instability tests. Purpose: The purpose of this study was to evaluate the diagnostic accuracy of upper cervical spine instability screening tests in patients or people who are healthy. Data Sources: PubMed, CINAHL, EMBASE, and RECAL Legacy databases were searched from their inception through October 2012. Study Selection: Studies were included that assessed the diagnostic accuracy of upper cervical instability screening tests in patients or people who are healthy and in which sensitivity and specificity were reported or could be calculated using a 2 × 2 table. Data Extraction and Quality Assessment: Two reviewers independently performed data extraction and the methodological quality assessment using the QUADAS-2. Data Synthesis: Depending on heterogeneity, statistical pooling was performed. All diagnostic parameters (sensitivity, specificity, predictive values, and likelihood ratios) were recalculated, if possible. Results: Five studies were included in this systematic review. Statistical pooling was not possible due to clinical and statistical heterogeneity. Specificity of 7 tests was sufficient, but sensitivity varied. Predictive values were variable. Likelihood ratios also were variable, and, in most cases, the confidence intervals were large. Limitations: The included studies suffered from several biases. None of the studies evaluated upper cervical spine instability tests in patients receiving primary care. Conclusions: The membranes tests had the best diagnostic accuracy, but their applicability as a test for diagnosing upper cervical spine instability in primary care has yet to be confirmed.


A Preliminary Study on the Ability of the Trypsin-Like Peptidase Activity Assay Kit to Detect Periodontitis

Dentistry Journal ◽

10.3390/dj8030098 ◽

2020 ◽

Vol 8 (3) ◽

pp. 98

Author(s):

Masanori Iwasaki ◽

Michihiko Usui ◽

Wataru Ariyoshi ◽

Keisuke Nakashima ◽

Yoshie Nagai-Yoshioka ◽

...

Keyword(s):

Diagnostic Accuracy ◽

Operating Characteristic ◽

Area Under The Curve ◽

Population Based ◽

Case Definition ◽

Peptidase Activity ◽

Activity Assay ◽

Severe Periodontitis ◽

Study Population ◽

Preliminary Study

This study aimed to explore whether the Trypsin-Like Peptidase Activity Assay Kit (TLP-AA-Kit), which measures the activity of N-benzoyl-dl-arginine peptidase (trypsin-like peptidase), can be used as a reliable tool for periodontitis detection in population-based surveillance. In total, 105 individuals underwent a full-mouth periodontal examination and provided tongue swabs as specimens for further analyses. The results of the TLP-AA-Kit were scored between 1 and 5; higher scores indicated higher trypsin concentrations. Receiver operating characteristic analyses were used to evaluate the predictive validity of the TLP-AA-Kit, where the periodontitis case definition provided by the Centers for Disease Control/American Academy of Periodontology served as the reference. Severe and moderate periodontitis were identified in 4.8% and 16.2% of the study population, respectively. The TLP-AA-Kit showed high diagnostic accuracy for severe periodontitis, with an area under the curve of 0.93 (95% confidence interval = 0.88–0.99). However, the diagnostic accuracy of the TLP-AA-Kit for moderate/severe periodontitis was not reliable. While further studies are necessary to validate our results, the results provided herein highlight the potential of the TLP-AA-Kit as a useful tool for the detection of periodontitis, particularly in severe cases, for population-based surveillance.


Circulating miRNA as Biomarkers for Colorectal Cancer Diagnosis and Liver Metastasis

Diagnostics ◽

10.3390/diagnostics11020341 ◽

2021 ◽

Vol 11 (2) ◽

pp. 341

Author(s):

Farah J. Nassar ◽

Zahraa S. Msheik ◽

Maha M. Itani ◽

Remie El Helou ◽

Ruba Hadla ◽

...

Keyword(s):

Colorectal Cancer ◽

Liver Metastasis ◽

Diagnostic Accuracy ◽

Area Under The Curve ◽

Stage IV ◽

Screening Tests ◽

Operating Characteristics ◽

Diagnostic Biomarkers ◽

Circulating miRNA ◽

Non Invasive

Colorectal cancer (CRC) is the second leading cause of cancer deaths worldwide. Stage IV CRC patients have a poor prognosis, with a five-year survival rate of 14%. Liver metastasis is the main cause of mortality in CRC patients. Since current screening tests have several drawbacks, effective, stable, non-invasive biomarkers such as microRNA (miRNA) are needed. We aimed to investigate the expression of miRNA (miR-21, miR-19a, miR-23a, miR-29a, miR-145, miR-203, miR-155, miR-210, miR-31, and miR-345) in the plasma of 62 Lebanese Stage IV CRC patients and 44 healthy subjects using RT-qPCR, and to evaluate their potential for diagnosis of advanced CRC and its liver metastasis using the Receiver Operating Characteristics (ROC) curve. miR-21, miR-145, miR-203, miR-155, miR-210, miR-31, and miR-345 were significantly upregulated in the plasma of surgery-naïve CRC patients when compared to healthy individuals. We identified two panels of miRNA: one for diagnosis of Stage IV CRC (miR-21 and miR-210), with an area under the curve (AUC) of 0.731 and diagnostic accuracy of 69%, and one for diagnosis of liver metastasis (miR-210 and miR-203), with an AUC of 0.833 and diagnostic accuracy of 72%. Panels of specific circulating miRNA, which require further validation, could be potential non-invasive diagnostic biomarkers for CRC and liver metastasis.


Chlamydia trachomatis screening in resource-limited countries – Comparison of diagnostic accuracy of 3 different assays

The Journal of Infection in Developing Countries ◽

10.3855/jidc.10442 ◽

2018 ◽

Vol 12 (09) ◽

pp. 733-740

Author(s):

Jelena Zivadin Tosic-Pajic ◽

Predrag Sazdanovic ◽

Marija Sorak ◽

Jelena Cukic ◽

Aleksandra Arsovic ◽

...

Keyword(s):

Chlamydia Trachomatis ◽

Diagnostic Accuracy ◽

Statistical Significance ◽

Acute Infection ◽

False Negative ◽

Screening Tests ◽

RT-PCR ◽

Serum IgG ◽

Significant Difference ◽

Youden’s Index

Introduction: Commercially available assays were evaluated in order to determine the diagnostic accuracy of Chlamydia trachomatis-specific tests for screening. Methods: The study included 225 sexually active men and women who were tested for genital chlamydial infection at the Institute of Public Health Kragujevac. Three screening tests were used: direct immunofluorescence (DIF) and a rapid lateral immunochromatographic test (RT) for qualitative detection of chlamydial antigens, and an immunoenzyme (ELISA) test for detection of serum levels of anti-chlamydial IgA and IgG antibodies. The diagnostic efficiency of these tests was determined in relation to results obtained by the RT-PCR method. Results: The statistical significance of differences between the results obtained by RT-PCR as the gold standard and those of DIF, RT and ELISA was analyzed using the chi-square (χ²) test. Statistical analysis showed a significant difference between RT-PCR and the analyzed screening tests: DIF (χ² = 303; p < 0.001), RT (χ² = 4.19; p = 0.041), serum IgA (χ² = 4.19; p = 0.041) and serum IgG (χ² = 67; p < 0.001), which indicates poor agreement between these tests. Large numbers of false positive (FP) and false negative (FN) results were observed for all tested assays. According to Youden’s index, serum IgG and DIF testing demonstrated the most balanced sensitivity-specificity rate. The RT assay exhibited the highest expanded Youden’s index, as well as the best overall diagnostic accuracy. Conclusions: None of the evaluated screening tests can be recommended as an individual method for the diagnosis of acute infection. We suppose that RT-PCR is unlikely to be a cost-effective screening strategy within the Serbian health system.


Comparison among Different Screening Tests for Diagnosis of Adolescent Hypertension

ISRN Hypertension ◽

10.5402/2013/107915 ◽

2013 ◽

Vol 2013 ◽

pp. 1-3 ◽

Cited By ~ 3

Author(s):

Silvia Totaro ◽

Franco Rabbia ◽

Ivana Rabbone ◽

Michele Covella ◽

Elena Berra ◽

...

Keyword(s):

Diagnostic Accuracy ◽

Population Based ◽

Ease Of Use ◽

Screening Tests ◽

Screening Methods ◽

Specific Training ◽

Cross Sectional ◽

Population Based Study ◽

Hypertension Diagnosis ◽

Childhood Hypertension

The diagnosis of childhood hypertension based upon percentile tables proposed by the international guidelines is complex and often a cause of underdiagnosis, particularly among physicians who have not had specific training in the field of adolescent hypertension. The use of a simple and accurate screening test may improve hypertension diagnosis in adolescents. The aim of our study is to compare the different screening methods currently used in the literature to improve the diagnosis of childhood hypertension. We conducted a cross-sectional population-based study of 1412 Caucasian adolescents among students of public junior high schools in Turin, Italy. In this population we defined hypertensive status with four different screening tests: BPHR, Somu's equations, Ardissino, and Kaelber methods. Finally, we compared the diagnostic accuracy of the 4 screening tests with the gold standard. Our analysis identifies BPHR as the test that combines ease of use and diagnostic accuracy.


Diagnostic Accuracy of Transient Ischemic Attack from Physician Claims

Canadian Journal of Neurological Sciences / Journal Canadien des Sciences Neurologiques ◽

10.1017/cjn.2016.454 ◽

2017 ◽

Vol 44 (4) ◽

pp. 397-403

Author(s):

Jodi D. Edwards ◽

Mieke Koehoorn ◽

Lara A. Boyd ◽

Boris Sobolev ◽

Adrian R. Levy

Keyword(s):

Diagnostic Accuracy ◽

Transient Ischemic Attack ◽

Hospital Admissions ◽

Population Based ◽

Discharge Data ◽

Predictive Values ◽

Physician Visits ◽

Physician Billing ◽

Ischemic Attack ◽

The Impact

Background: Hospitalization data underestimate the occurrence of transient ischemic attack (TIA). As TIA is frequently diagnosed in primary care, methodologies for the accurate ascertainment of a TIA from physician claims data are required for surveillance and health systems planning in this population. The present study evaluated the diagnostic accuracy of multiple algorithms for TIA from a longitudinal population-based physician billing database. Methods: Population-based administrative data from the province of British Columbia were used to identify the base population (1992–2007; N = 102,492). Using discharge records for hospital admissions for acute ischemic stroke with a recent (<90 days) TIA as the reference standard, we performed receiver-operating characteristic analyses to calculate sensitivity, specificity, positive and negative predictive values and overall accuracy, and to compare area under the curve for each physician billing algorithm. To evaluate the impact of different case definitions on population-based TIA burden, we also estimated the annual TIA occurrence associated with each algorithm. Results: Physician billing algorithms showed low to moderate sensitivity, with the algorithm for two consecutive physician visits within 90 days showing the highest sensitivity at 37.7% (95% CI = 37.4–38.1). All algorithms demonstrated high specificity and moderate to high overall accuracy, resulting in low positive predictive values (≤5%), low discriminability (0.53–0.57) and high false positive rates (1 – specificity). Population-based estimates of TIA occurrence were comparable to prior studies and declined over time. Conclusions: Physician billing data have insufficient sensitivity to identify TIAs but may be used in combination with hospital discharge data to improve the accuracy of estimating the population-based occurrence of TIAs.


The Performance of a Calcaneal Quantitative Ultrasound Device, CM-200, in Stratifying Osteoporosis Risk among Malaysian Population Aged 40 Years and Above

Diagnostics ◽

10.3390/diagnostics10040178 ◽

2020 ◽

Vol 10 (4) ◽

pp. 178 ◽

Cited By ~ 2

Author(s):

Shaanthana Subramaniam ◽

Chin-Yi Chan ◽

Ima Nirwana Soelaiman ◽

Norazlina Mohamed ◽

Norliza Muhammad ◽

...

Keyword(s):

Risk Stratification ◽

Quantitative Ultrasound ◽

Bone Health ◽

Area Under The Curve ◽

ROC Curves ◽

T Score ◽

Calcaneal Quantitative Ultrasound ◽

Sensitivity Specificity ◽

Osteoporosis Risk ◽

Youden’s Index

Background: Calcaneal quantitative ultrasound (QUS) is widely used in osteoporosis screening, but the cut-off values for risk stratification remain unclear. This study validates the performance of a calcaneal QUS device (CM-200) using dual-energy X-ray absorptiometry (DXA) as the reference and establishes a new set of cut-off values for CM-200 in identifying subjects with osteoporosis. Methods: The bone health status of Malaysians aged ≥40 years was assessed using CM-200 and DXA. Sensitivity, specificity, area under the curve (AUC) and the optimal cut-off values for risk stratification of CM-200 were determined using receiver operating characteristic (ROC) curves and Youden’s index (J). Results: From the data of 786 subjects, CM-200 (QUS T-score <−1) showed a sensitivity of 82.1% (95% CI: 77.9–85.7%), specificity of 51.5% (95% CI: 46.5–56.6%) and AUC of 0.668 (95% CI: 0.630–0.706) in identifying subjects with suboptimal bone health (DXA T-score <−1) (p < 0.001). At QUS T-score ≤−2.5, CM-200 was ineffective in identifying subjects with osteoporosis (DXA T-score ≤−2.5) (sensitivity 14.4% (95% CI: 8.1–23.0%); specificity 96.1% (95% CI: 94.4–97.4%); AUC 0.553 (95% CI: 0.488–0.617); p > 0.05). Modified cut-off values for the QUS T-score improved the performance of CM-200 in identifying subjects with osteopenia (sensitivity 67.7% (95% CI: 62.8–72.3%); specificity 72.8% (95% CI: 68.1–77.2%); J = 0.405; AUC 0.702 (95% CI: 0.666–0.739); p < 0.001) and osteoporosis (sensitivity 79.4% (95% CI: 70.0–86.9%); specificity 61.8% (95% CI: 58.1–65.5%); J = 0.412; AUC 0.706 (95% CI: 0.654–0.758); p < 0.001). Conclusion: The modified cut-off values significantly improved the performance of CM-200 in identifying individuals with osteoporosis. Since these values are device-specific, optimization is necessary for accurate detection of individuals at risk for osteoporosis using QUS.
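Youden's index is conventionally defined as J = sensitivity + specificity − 1, so the J values reported above can be checked directly against the stated sensitivities and specificities. The short sketch below does only that arithmetic; the function name and the labels are illustrative, and the inputs are the rounded percentages quoted in the abstract.

```python
def youden_index(sensitivity, specificity):
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


# (label, sensitivity, specificity, J as reported in the abstract)
reported = [
    ("osteopenia, modified cut-off", 0.677, 0.728, 0.405),
    ("osteoporosis, modified cut-off", 0.794, 0.618, 0.412),
]

if __name__ == "__main__":
    for label, sens, spec, j_reported in reported:
        j = youden_index(sens, spec)
        print(f"{label}: computed J = {j:.3f}, reported J = {j_reported:.3f}")
```

Both reported values agree with the definition to three decimal places.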
