Evaluation of Erythema Severity in Dermatoscopic Images of Canine Skin: Erythema Index Assessment and Image Sampling Reliability

Blaž Cugmas; Daira Viškere; Eva Štruc; Thierry Olivry

doi:10.3390/s21041285

Sensors ◽

10.3390/s21041285 ◽

2021 ◽

Vol 21 (4) ◽

pp. 1285

Author(s):

Blaž Cugmas ◽

Daira Viškere ◽

Eva Štruc ◽

Thierry Olivry

Keyword(s):

Correlation Coefficient ◽

Sampling Methods ◽

Intraclass Correlation ◽

Skin Lesions ◽

The Other ◽

Rater Reliability ◽

Image Sampling ◽

Spearman Coefficient ◽

Skin Erythema ◽

Erythematous Skin

The regular monitoring of erythema, one of the most important skin lesions in atopic (allergic) dogs, is essential for successful anti-allergic therapy. Smartphone-based dermatoscopy offers a convenient way to acquire quality images of erythematous skin. However, the image sampling used to evaluate erythema severity is still done manually, which introduces variability into the results. In this study, we investigated the correlation between the most popular erythema indices (EIs) and dermatologists’ erythema perception, and we measured the intra- and inter-rater variability of the currently used manual image-sampling methods (ISMs). We showed that the EI_BRG, based on all three RGB (red, green, and blue) channels, performed the best, with an average Spearman coefficient of 0.75 and a typical absolute disagreement of less than 14% with the erythema assessed by clinicians. The two image-sampling methods, based on selecting either specific pixels or small skin areas, performed similarly well: both achieved high intra- and inter-rater reliability, with the intraclass correlation coefficient (ICC) and Krippendorff’s alpha well above 0.90. These results indicate that smartphone-based dermatoscopy could be a convenient and precise way to evaluate skin erythema severity. However, better-outlined or even automated ISMs are likely to improve the intra- and inter-rater reliability in severely erythematous cases.
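
The Spearman coefficient mentioned in this abstract is a rank correlation: both variables are converted to ranks (tied values share an average rank) and the Pearson correlation of the ranks is taken. The Python sketch below illustrates that calculation only. The function names and any sample data a reader might pass in are hypothetical; the abstract does not describe the authors' software or give the formula of any erythema index.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(index_values, clinician_scores):
    """Spearman coefficient: Pearson correlation computed on the ranks.

    Both argument names are illustrative assumptions, standing for an
    erythema index per image and the clinicians' severity score per image.
    """
    return pearson(average_ranks(index_values), average_ranks(clinician_scores))
```

Because only ranks enter the calculation, any strictly increasing relationship between index and score yields a coefficient of 1, which is why the measure suits ordinal clinical grades.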

Reliability and Validity of Three Instruments for Measuring Implant Stability Quotient

10.20944/preprints202007.0457.v1 ◽

2020 ◽

Author(s):

Monica Blazquez-Hinarejos ◽

Constanza Saka-Herrán ◽

Victor Diez-Alonso ◽

Jose Lopez-Lopez ◽

Raúl Ayuso‐Montero ◽

...

Keyword(s):

Correlation Coefficient ◽

Pearson Correlation ◽

Intraclass Correlation ◽

Reliability And Validity ◽

The Other ◽

Resonance Frequency Analysis ◽

Implant Stability ◽

Good Reliability ◽

Rater Reliability ◽

Implant Stability Quotient

Background: Currently, resonance frequency analysis (RFA) is the most widespread method for measuring implant stability. The implant stability quotient (ISQ) is the measure obtained by the different RFA devices; however, the inter- and intra-rater reliability and validity of some devices remain unknown. Methods: Thirty implants were placed in 3 different pig mandibles. ISQ was measured in the axial and parallel directions with Osstell® Beacon, Penguin® and MegaISQ® by 2 different operators, and one operator performed a test-retest. The intraclass correlation coefficient was calculated to assess the intra- and inter-rater reliability. The Pearson correlation coefficient was used to assess the validity. Results: The highest inter- and intra-rater reliability was obtained by Penguin® when measuring axially. The highest ISQ values were obtained using Penguin® in an axial measurement; the lowest, using the MegaISQ® in an axial measurement. The highest correlation values with the other devices were obtained by MegaISQ® measuring axially. Conclusion: Penguin® had good inter- and intra-rater reliability for measuring ISQ. Osstell® had good validity for measuring ISQ in both the axial and parallel directions, and MegaISQ® had the best validity for measuring ISQ axially.

Intra-rater reliability of transversus abdominis measurement by a novice examiner: Comparison of “freehand” to “probe force device” method of real-time ultrasound imaging

Ultrasound ◽

10.1177/1742271x19831720 ◽

2019 ◽

Vol 27 (3) ◽

pp. 156-166 ◽

Cited By ~ 1

Author(s):

Vanessa L Kennedy ◽

Carol A Flavell ◽

Kenji Doma

Keyword(s):

Measurement Error ◽

Real Time ◽

Correlation Coefficient ◽

Intraclass Correlation Coefficient ◽

Coefficient Of Variation ◽

Repeated Measures ◽

Intraclass Correlation ◽

Transversus Abdominis ◽

Rater Reliability ◽

Transverse Abdominis

A “free hand” real-time-ultrasound method is commonly applied to measure transversus abdominis. Potentially, this increases transversus abdominis measurement error due to uncontrolled variability in probe-to-skin force, inclination, and roll, particularly for novice examiners. This single-group repeated-measures reliability study compared the intra-rater reliability of transversus abdominis thickness and activation measurement by a novice examiner between free hand and a standardized probe force device method. The examiner captured ultrasound videos of transversus abdominis in a single session in healthy participants (n = 33). Free hand ultrasound featured uncontrolled probe force, inclination, and roll, while probe force device method ultrasound standardized these parameters. Images of transversus abdominis at rest and contracted were measured and transversus abdominis activation calculated. Intraclass correlation coefficient, coefficient of variation, standard error of measurement, and worthwhile differences were calculated. The probe force device method resulted in greater reliability (intraclass correlation coefficient = 0.75–0.96) and lower measurement error (coefficient of variation = 8.89–28.7%) compared to free hand (intraclass correlation coefficient = 0.63–0.93; coefficient of variation = 6.52–29.4%). Reliability was good for all measurements except free hand TrA-C, which was moderate. TrA-C had the lowest reliability, followed by contracted thickness of the transverse abdominis, with resting thickness of the transverse abdominis being highest. Worthwhile differences were lower using a probe force device method versus free hand for resting thickness of the transverse abdominis and contracted thickness of the transverse abdominis and similar for TrA-C. Standardization using probe force device method ultrasound to measure transversus abdominis improved intra-rater reliability in a novice examiner. Use of a probe force device method is recommended to improve reliability through reduced sources of measurement error. Probe force device method intra- and inter-rater reliability in examiners of varying experience, in clinical populations, and to visualize other structures merits exploration.

Validity and reliability of assessing diaphragmatic mobility by area on X-rays of healthy subjects

Jornal Brasileiro de Pneumologia ◽

10.1590/s1806-37562016000000131 ◽

2018 ◽

Vol 44 (3) ◽

pp. 220-226 ◽

Cited By ~ 1

Author(s):

Aline Pedrini ◽

Márcia Aparecida Gonçalves ◽

Bruna Estima Leal ◽

Michelle Gonçalves de Souza Tavares ◽

Wellington Pereira Yamaguti ◽

...

Keyword(s):

Correlation Coefficient ◽

Intraclass Correlation ◽

X Rays ◽

Altman Analysis ◽

Validity And Reliability ◽

Anthropometric Parameters ◽

Rater Reliability ◽

Bland Altman Analysis ◽

Left Hemidiaphragm ◽

The Right

Objective: To investigate the concurrent validity, as well as the intra- and inter-rater reliability, of assessing diaphragmatic mobility by area (DMarea) on chest X-rays of healthy adults. Methods: We evaluated anthropometric parameters, pulmonary function, and diaphragmatic mobility in 43 participants. Two observers (rater A and rater B) determined diaphragmatic mobility at two time points. We used Pearson’s correlation coefficient to evaluate the correlation between DMarea and the assessment of diaphragmatic mobility by distance (DMdist). To evaluate intra- and inter-rater reliability, we used the intraclass correlation coefficient (ICC [2,1]), 95% CI, and Bland-Altman analysis. Results: A significant correlation was found between the DMarea and DMdist methods (r = 0.743; p < 0.0001). For DMarea, the intra-rater reliability was found to be quite high for the right hemidiaphragm (RHD), with ICC (2,1) = 0.92 (95% CI: 0.86-0.95) for rater A and ICC (2,1) = 0.90 (95% CI: 0.84-0.94) for rater B, and for the left hemidiaphragm (LHD), with ICC (2,1) = 0.96 (95% CI: 0.93-0.97) for rater A and ICC (2,1) = 0.91 (95% CI: 0.81-0.95) for rater B (p < 0.0001 for all). Also for DMarea, the inter-rater reliability was found to be quite high for the first and second evaluations of the RHD, with ICC (2,1) = 0.99 (95% CI: 0.98-0.99) and ICC (2,1) = 0.95 (95% CI: 0.86-0.97), respectively, and of the LHD, with ICC (2,1) = 0.99 (95% CI: 0.98-0.99) and ICC (2,1) = 0.94 (95% CI: 0.87-0.97), respectively (p < 0.0001 for both). The Bland-Altman analysis showed good agreement between the mobility of the RHD and that of the LHD. Conclusions: The DMarea method proved to be a valid, reliable measure of diaphragmatic mobility.
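
For readers unfamiliar with the ICC (2,1) statistic used in this abstract, the Python sketch below shows one common way to compute it: a two-way random-effects, absolute-agreement, single-rater ICC built from the mean squares of a subjects-by-raters table. It is a minimal illustration under that assumption, not the authors' analysis code; the function name `icc_2_1` and the input layout are hypothetical, and confidence intervals are not computed.

```python
def icc_2_1(ratings):
    """ICC(2,1) from a table with one row per subject and one column per rater.

    Assumes a complete table (no missing ratings) with at least two
    subjects and two raters.
    """
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for the two-way layout without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols                    # residual

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))

    # Rater variance stays in the denominator, so a systematic offset
    # between raters lowers the coefficient (absolute agreement).
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

With this form, two raters who differ by a constant offset no longer score 1.0, which is the practical difference from a consistency-type ICC or a plain Pearson correlation.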

Learning how to Differ: Agreement and Reliability Statistics in Psychiatry

The Canadian Journal of Psychiatry ◽

10.1177/070674379504000202 ◽

1995 ◽

Vol 40 (2) ◽

pp. 60-66 ◽

Cited By ~ 46

Author(s):

David L. Streiner

Keyword(s):

Correlation Coefficient ◽

Intraclass Correlation Coefficient ◽

Intraclass Correlation ◽

Weighted Kappa ◽

The Other ◽

Cohen's Kappa ◽

The Subject

Whenever two or more raters evaluate a patient or student, it may be necessary to determine the degree to which they assign the same label or rating to the subject. The major problem in deciding which statistic to use is the plethora of different techniques which are available. This paper reviews some of the more commonly used techniques, such as Raw Agreement, Cohen's kappa and weighted kappa, and shows that, in most circumstances, they can all be replaced by the intraclass correlation coefficient (ICC). This paper also shows how the ICC can be used in situations where the other statistics cannot be used and how to select the best subset of raters.
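
To make two of the statistics named in this abstract concrete, the Python sketch below computes raw agreement and unweighted Cohen's kappa for two raters assigning categorical labels. It is an illustrative sketch only: the function names are hypothetical, weighted kappa and the ICC are not covered, and nothing here is taken from the paper itself.

```python
from collections import Counter


def raw_agreement(rater1, rater2):
    """Proportion of subjects given the same label by both raters."""
    return sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)


def cohens_kappa(rater1, rater2):
    """Unweighted Cohen's kappa: agreement corrected for chance.

    Chance agreement is estimated from each rater's marginal label
    frequencies. Undefined (division by zero) when chance agreement is 1,
    i.e. when both raters use a single identical label throughout.
    """
    n = len(rater1)
    observed = raw_agreement(rater1, rater2)
    counts1 = Counter(rater1)
    counts2 = Counter(rater2)
    expected = sum(counts1[label] * counts2[label] for label in counts1) / (n * n)
    return (observed - expected) / (1 - expected)
```

Raw agreement ignores how often two raters would coincide by chance, whereas kappa subtracts that expected proportion, so kappa is generally the lower of the two whenever the raters' label frequencies overlap.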

Validity and reliability of smartphone-based application for chronic ankle instability

International Journal of Therapy and Rehabilitation ◽

10.12968/ijtr.2021.0007 ◽

2021 ◽

Vol 28 (9) ◽

pp. 1-10

Author(s):

Taelim Yoon ◽

Jihyun Lee

Keyword(s):

Medical Device ◽

Correlation Coefficient ◽

Intraclass Correlation Coefficient ◽

Ankle Instability ◽

Intraclass Correlation ◽

Validity And Reliability ◽

Rater Reliability ◽

Eyes Closed ◽

Eyes Open ◽

Cumberland Ankle Instability Tool

Background/aims Ankle instability is one of the most common injuries that can occur during everyday life, sports and exercise. Recently, smartphone accelerometers have been used to measure single leg balance associated with ankle instability, because they are easy to use, inexpensive and can be used in small spaces. Thus, the purpose of this study was to introduce and investigate the intra- and inter-rater reliability of the smartphone accelerometer when assessing ankle instability. Methods A total of 26 individuals who had ankle instability were recruited. The single leg stance balance was measured using a smartphone accelerometer (Accelerometer application) and a force platform (I-Balance) for 5 seconds with their eyes open or their eyes closed. Results In the eyes open position, intra-rater reliability of the smartphone accelerometer was excellent for both raters (intraclass correlation coefficient: 0.87–0.90); and the inter-rater reliability was moderate (intraclass correlation coefficient: 0.71). In the eyes closed position, the intra-rater reliability of the smartphone accelerometer was excellent for both raters (intraclass correlation coefficient: 0.90–0.93); the inter-rater reliability was good (intraclass correlation coefficient: 0.82). Additionally, there were fair positive correlations between the smartphone accelerometer and the Cumberland Ankle Instability Tool, and between the smartphone accelerometer and I-Balance (r=0.33, 0.30 respectively). Conclusions The present study demonstrated excellent intra-rater reliabilities of two raters and moderate to good inter-rater reliabilities. The smartphone accelerometer offers several important advantages as a potential portable medical device to assess ankle instability accurately. Although the correlations were positive, the relationships between the smartphone accelerometer and the Cumberland Ankle Instability Tool, and between the smartphone accelerometer and I-Balance, were only fair. Future studies should investigate the validity of the smartphone accelerometer as a portable medical device for determining ankle instability.

Validation of the pediatric Radboud dysarthria assessment

Journal of Pediatric Rehabilitation Medicine ◽

10.3233/prm-190671 ◽

2021 ◽

pp. 1-12

Author(s):

Marieke Ruessink ◽

Lenie van den Engel-Hoek ◽

Marjo van Gerven ◽

Bea Spek ◽

Bert de Swart ◽

...

Keyword(s):

Correlation Coefficient ◽

Intraclass Correlation ◽

Self Care ◽

Reliability And Validity ◽

Correlation Coefficients ◽

Activity Level ◽

Clinical Use ◽

Spearman Correlation ◽

Rater Reliability ◽

Video Recordings

PURPOSE: The Radboud Dysarthria Assessment (RDA) was published in 2014. Adaptation into a pediatric version (p-RDA) was required because of relevant differences between children and adults. The purpose of this study was to assess the feasibility of the p-RDA and to test intra-rater and inter-rater reliability as well as the validity of the two severity scales (function and activity level). METHODS: Video recordings were made of 35 participants with (suspected) dysarthria (age 4 to 17 years) while being assessed using the p-RDA. Intra-rater reliability was assessed by one, and inter-rater reliability by two experiments using the Intraclass Correlation Coefficient (ICC). Validity of the severity scales was tested by correlating the consensus scores with the independently rated scores on four communication scales, three mobility scales, and one self-care scale using Spearman correlation coefficients (r_s). RESULTS: The assessment was applicable for 89% of the tested sample, with good intra-rater and inter-rater reliability (ICC = 0.88–0.98 and 0.83–0.93). The p-RDA severity scales (function and activity level) correlated from substantially to strongly with the communication scales (r_s = 0.69–0.82 and 0.77–0.92) and self-care scale (r_s = 0.76–0.71) and correlated substantially with the mobility scales (r_s = 0.49–0.60). CONCLUSION: The feasibility, reliability and validity of the p-RDA are sufficient for clinical use.

Validity and Reliability of the Persian Version of Language Screening Test (LAST) for Patients in the Acute Phase of Stroke

Function and Disability Journal ◽

10.32598/fdj.3.13 ◽

2020 ◽

Vol 3 (1) ◽

pp. 91-100

Author(s):

Seyyede Zohreh Mousavi ◽

Reyhaneh Jafari ◽

Saman Maroufizadeh ◽

Mohammad Moez Shahramnia ◽

...

Keyword(s):

Correlation Coefficient ◽

Acute Phase ◽

Screening Test ◽

Intraclass Correlation ◽

Chronic Phase ◽

Weighted Kappa ◽

Screening Tests ◽

Rater Reliability ◽

Persian Version ◽

Language Screening

Background & Objectives: Aphasia is one of the most common consequences of a stroke; thus, screening tests for early diagnosis of the problem are necessary when dealing with aphasia patients. One of these screening tests is the Language Screening Test (LAST). The purpose of this study was to translate, validate, and utilize this test in the Persian language for patients after stroke. Methods: The original version of LAST was translated into Persian and then administered to 100 patients in the acute phase by two examiners at the patient’s bedside in order to check the inter-rater reliability. To assess the agreement between the two forms (a and b) of the LAST, the Concordance Correlation Coefficient (CCC), weighted Kappa, and Intraclass Correlation Coefficient (ICC) were used. Also, the Persian version of LAST and the Western Aphasia Battery (WAB) were administered in the chronic phase by two independent examiners with blind scoring. Results: Inter-rater reliability between Rater 1 and Rater 2 on the LAST-a and LAST-b scores was very good for both phases. The CCCs for LAST-a and LAST-b, respectively, were 0.874 and 0.865 for the acute phase and 0.923 and 0.927 for the chronic phase. The weighted Kappas for LAST-a and LAST-b, respectively, were 0.750 and 0.740 for the acute phase, and 0.822 and 0.846 for the chronic phase. Conclusion: The obtained results showed that LAST is a very simple, fast, and valid test and can be used as a reliable tool in stroke patients. Its lack of cultural and language dependency is an advantage of using this test.

User testing of the psychometric properties of pictorial-based disability assessment Longshi Scale by healthcare professionals and non-professionals: a Chinese study in Shenzhen

Clinical Rehabilitation ◽

10.1177/0269215519846543 ◽

2019 ◽

Vol 33 (9) ◽

pp. 1479-1491 ◽

Cited By ~ 2

Author(s):

Yulong Wang ◽

Shanshan Guo ◽

Jiejiao Zheng ◽

Qing Mei Wang ◽

Yuling Zhang ◽

...

Keyword(s):

Correlation Coefficient ◽

Intraclass Correlation Coefficient ◽

Barthel Index ◽

Public Hospital ◽

Healthcare Professionals ◽

Intraclass Correlation ◽

Random Effect ◽

Second Phase ◽

Spearman Correlation ◽

Rater Reliability

Objective: The aim of this study was to validate a novel pictorial-based Longshi Scale for evaluating a patient’s disability by healthcare professionals and non-professionals. Design: Prospective study. Setting: Rehabilitation departments from a grade A, class 3 public hospital, a grade B, class 2 public hospital, and a private hospital and seven community rehabilitation centers. Subjects: A total of 618 patients and 251 patients with functional disabilities were recruited in a two-phase study, respectively. Main measures: Outcome measure: pictorial scale of activities of daily living (ADLs, Longshi Scale). Reference measure: Barthel Index. The Spearman correlation coefficient was used to analyze the validity of Longshi Scale against Barthel Index. Results: In phase 1 study, from March 2016 to August 2016, the results demonstrated that the Longshi Scale was both reliable and valid (intraclass correlation coefficient based on two-way random effect (ICC(2,1)) = 0.877–0.974 for intra-rater reliability; ICC(2,1) = 0.928–0.979; κ = 0.679–1.000 for inter-rater reliability; intraclass correlation coefficient based on one-way random effect (ICC(1,1)) = 0.921–0.984 for test–retest reliability and Spearman correlation coefficient = 0.836–0.899). In the second phase, in March 2018, results further demonstrated that the Longshi Scale had good inter-rater and intra-rater reliability among healthcare professionals and non-professionals including therapists, interns, and personal care aides (ICC(1,1) = 0.822–0.882 on Day 1; ICC(1,1) = 0.842–0.899 on Day 7 for inter-rater reliability). In addition, the Longshi Scale decreased assessment time significantly, compared with the Barthel Index assessment (P < 0.01). Conclusion: The Longshi Scale could potentially provide an efficient way for healthcare professionals and non-professionals who may have minimal training to assess the ADLs of functionally disabled patients.

Development and reliability of the Korean version of the Feeding Abilities Assessment

Hong Kong Journal of Occupational Therapy ◽

10.1177/1569186119850694 ◽

2019 ◽

Vol 32 (1) ◽

pp. 69-74

Author(s):

Seul Gi Koo ◽

Hae Yean Park ◽

Jongbae Kim ◽

Areum Han

Keyword(s):

Correlation Coefficient ◽

Internal Consistency ◽

Content Validity ◽

Assessment Tool ◽

High Reliability ◽

Intraclass Correlation ◽

Rater Reliability ◽

Retest Reliability ◽

Korean Version ◽

Test Retest Reliability

Objective The purpose of this study is to introduce a standardised assessment tool by verifying the reliability of the translated Korean version of the Feeding Abilities Assessment (K-FAA), which was developed to suit Korean culture. Methods The research subjects were 65 patients with dementia living in nursing homes. The K-FAA was completed by verifying the suitability of translation and reverse translation. The validity of the K-FAA was established through content validity, while its reliability was analysed based on internal consistency reliability for the items, test–retest reliability and inter-rater reliability. Results The content validity index determined, based on the assessment of professors, occupational therapists, and nurses, was more than .70. Cronbach’s α was more than .929, showing good internal consistency. A test–retest reliability of .884 was derived using Pearson’s correlation coefficient (p < .01), and an inter-rater reliability of .800 was derived using the kappa coefficients; intraclass correlation coefficient was .897, which also indicated good reliability. Conclusion The K-FAA was modified to fit the Korean domestic situation, and this assessment had high reliability. Therefore, K-FAA can evaluate the feeding ability of patients with dementia. Future studies should focus on providing evidence-based data to maintain or supplement the feeding ability of patients with dementia in Korea.

Reliability and concurrent validity of the iPhone® Compass application to measure thoracic rotation range of motion (ROM) in healthy participants

PeerJ ◽

10.7717/peerj.4431 ◽

2018 ◽

Vol 6 ◽

pp. e4431 ◽

Cited By ~ 11

Author(s):

James Furness ◽

Ben Schram ◽

Alistair J. Cox ◽

Sarah L. Anderson ◽

Justin Keogh

Keyword(s):

Correlation Coefficient ◽

Thoracic Spine ◽

Concurrent Validity ◽

Measurement Techniques ◽

Intraclass Correlation ◽

Smart Phone ◽

The Body ◽

Rater Reliability ◽

Healthy Participants

Background Several water-based sports (swimming, surfing and stand up paddle boarding) require adequate thoracic mobility (specifically rotation) in order to perform the appropriate activity requirements. The measurement of thoracic spine rotation is problematic for clinicians due to a lack of convenient and reliable measurement techniques. More recently, smartphones have been used to quantify movement in various joints in the body; however, there appears to be a paucity of research using smartphones to assess thoracic spine movement. Therefore, the aim of this study is to determine the reliability (intra- and inter-rater) and validity of the iPhone® app (Compass) when assessing thoracic spine rotation ROM in healthy individuals. Methods A total of thirty participants were recruited for this study. Thoracic spine rotation ROM was measured using both the current clinical gold standard, a universal goniometer (UG), and the Smart Phone Compass app. Intra-rater and inter-rater reliability was determined with an intraclass correlation coefficient (ICC) and associated 95% confidence intervals (CI). Validation of the Compass app in comparison to the UG was measured using Pearson’s correlation coefficient, and levels of agreement were identified with Bland–Altman plots and 95% limits of agreement. Results The UG and Compass app measurements both had excellent reproducibility for intra-rater (ICC 0.94–0.98) and inter-rater reliability (ICC 0.72–0.89). However, the Compass app measurements had higher intra-rater reliability (ICC = 0.96–0.98; 95% CI [0.93–0.99] vs. ICC = 0.94–0.98; 95% CI [0.88–0.99]) and inter-rater reliability (ICC = 0.87–0.89; 95% CI [0.74–0.95] vs. ICC = 0.72–0.82; 95% CI [0.21–0.94]). A strong and significant correlation was found between the UG and the Compass app, demonstrating good concurrent validity (r = 0.835, p < 0.001). Levels of agreement between the two devices were 24.8° (LoA –9.5°, +15.3°). The UG was found to consistently measure higher values than the Compass app (mean difference 2.8°, P < 0.001). Conclusion This study reveals that the iPhone® app (Compass) is a reliable tool for measuring thoracic spine rotation which produces greater reproducibility of measurements both within and between raters than a UG. As a significant positive correlation exists between the Compass app and UG, this supports the use of either device in clinical practice as a reliable and valid tool to measure thoracic rotation. Considering the levels of agreement are clinically unacceptable, the devices should not be used interchangeably for initial and follow up measurements.
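
The Bland–Altman figures quoted in this abstract can be read as follows: the mean difference (2.8°) is the bias between devices, the limits of agreement (–9.5°, +15.3°) bracket the bias, and the reported 24.8° is the width of that interval, 15.3 − (−9.5). The Python sketch below shows the usual calculation of bias and 95% limits of agreement as bias ± 1.96 standard deviations of the paired differences. It assumes that convention; the function name and argument order are hypothetical, and the study's raw data are not reproduced here.

```python
from statistics import mean, stdev


def bland_altman(device_a, device_b):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    bias is the mean of the paired differences (device_a - device_b);
    the limits are bias -/+ 1.96 sample standard deviations of those
    differences, the conventional 95% limits of agreement.
    """
    diffs = [a - b for a, b in zip(device_a, device_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

A high correlation and wide limits of agreement can coexist, as in this abstract: correlation measures whether the devices rank participants similarly, while the limits describe how far individual readings may differ, which is what matters when swapping devices between visits.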
