Validity, Reliability, and Ability to Identify Fall Status of the Berg Balance Scale, BESTest, Mini-BESTest, and Brief-BESTest in Patients With COPD

2016 ◽  
Vol 96 (11) ◽  
pp. 1807-1815 ◽  
Author(s):  
Cristina Jácome ◽  
Joana Cruz ◽  
Ana Oliveira ◽  
Alda Marques

Abstract Background The Berg Balance Scale (BBS), Balance Evaluation Systems Test (BESTest), Mini-BESTest, and Brief-BESTest are useful in the assessment of balance. Their psychometric properties, however, have not been tested in patients with chronic obstructive pulmonary disease (COPD). Objective This study aimed to compare the validity, reliability, and ability to identify fall status of the BBS, BESTest, Mini-BESTest, and Brief-BESTest in patients with COPD. Design A cross-sectional study was conducted. Methods Forty-six patients (24 men, 22 women; mean age=75.9 years, SD=7.1) were included. Participants were asked to report their falls during the previous 12 months and to fill in the Activities-specific Balance Confidence (ABC) Scale. The BBS and the BESTest were administered. Mini-BESTest and Brief-BESTest scores were computed from the participants' BESTest performance. Validity was assessed by correlating the balance tests with each other and with the ABC Scale. Interrater reliability (2 raters), intrarater reliability (48–72 hours), and minimal detectable changes (MDCs) were established. Receiver operating characteristic curves were used to assess the ability of each balance test to differentiate between participants with and without a history of falls. Results Balance test scores were significantly correlated with each other (Spearman correlation rho=.73–.90) and with the ABC Scale (rho=.53–.75). Balance tests presented high interrater reliability (intraclass correlation coefficient [ICC]=.85–.97) and intrarater reliability (ICC=.52–.88) and acceptable MDCs (MDC=3.3–6.3 points). Although all balance tests were able to identify fall status (area under the curve=0.74–0.84), the BBS (sensitivity=73%, specificity=77%) and the Brief-BESTest (sensitivity=81%, specificity=73%) had the highest ability to identify fall status. Limitations Findings are generalizable mainly to older patients with moderate COPD.
Conclusions The 4 balance tests are valid, reliable, and valuable in identifying fall status in patients with COPD. The Brief-BESTest presented slightly higher interrater reliability and ability to differentiate participants' fall status.
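The sensitivity, specificity, and area-under-the-curve figures above come from standard ROC analysis of balance scores against fall history. A minimal sketch of those quantities, using hypothetical BBS scores (the study's raw data are not reproduced here), with lower scores taken to indicate fallers:

```python
def sens_spec(scores_fallers, scores_nonfallers, cutoff):
    """Sensitivity and specificity when scores <= cutoff flag a faller."""
    tp = sum(s <= cutoff for s in scores_fallers)
    tn = sum(s > cutoff for s in scores_nonfallers)
    return tp / len(scores_fallers), tn / len(scores_nonfallers)

def auc(scores_fallers, scores_nonfallers):
    """AUC as the Mann-Whitney probability that a randomly chosen
    faller scores lower than a randomly chosen non-faller."""
    pairs = [(f, n) for f in scores_fallers for n in scores_nonfallers]
    wins = sum(1.0 if f < n else 0.5 if f == n else 0.0 for f, n in pairs)
    return wins / len(pairs)

fallers = [38, 41, 44, 45, 47]        # hypothetical BBS scores
nonfallers = [46, 49, 50, 52, 53, 54]

se, sp = sens_spec(fallers, nonfallers, cutoff=45)
print(f"sensitivity={se:.2f} specificity={sp:.2f} "
      f"AUC={auc(fallers, nonfallers):.2f}")
# → sensitivity=0.80 specificity=1.00 AUC=0.97
```

In practice the cutoff is chosen by sweeping all candidate thresholds on the ROC curve and picking the point that best balances sensitivity and specificity.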

2014 ◽  
Vol 94 (3) ◽  
pp. 371-378 ◽  
Author(s):  
Christopher K. Wong

Background People with lower limb amputations frequently have impaired balance ability. The Berg Balance Scale (BBS) has excellent psychometric properties for people with neurologic disorders and elderly people dwelling in the community. A Rasch analysis demonstrated the validity of the BBS for people with lower limb amputations of all ability strata, but rater reliability has not been tested. Objective The study objective was to determine the interrater reliability and intrarater reliability of BBS scores and the differences in scores assigned by testers with various levels of experience when assessing people with lower limb amputations. Design This reliability study of video-recorded single-session BBS assessments had a cross-sectional design. Methods From a larger study of people with lower limb amputations, 5 consecutively recruited participants using prostheses were video recorded during an in-person BBS assessment. Sixteen testers independently rated the video-recorded assessments. Testers were 3 physical therapists, 1 occupational therapist, 3 third-year and 4 second-year doctor of physical therapy (DPT) students, and 5 first-year DPT students without clinical training. Rater reliability was calculated using intraclass correlation coefficients (ICC [2,k]). Differences in scores assigned by testers with various levels of experience were determined by use of an analysis of variance with Tukey post hoc tests. Results The average age of the participants was 53.0 years (SD=15.7). Amputations had occurred at the ankle disarticulation, transtibial, and transfemoral levels because of vascular, trauma, and medical etiologies an average of 8.2 years earlier (SD=7.9). Berg Balance Scale scores spanned all ability strata. Interrater reliability (ICC [2,k]=.99) and intrarater reliability of scores determined in person and through video-recorded assessments by the same testers (ICC [2,k]=.99) were excellent. 
For participants with the lowest levels of ability, licensed professionals assigned lower scores than did DPT students without clinical training. Limitations Intrarater reliability calculations were based on 2 testers. Conclusions Berg Balance Scale scores assigned to people using prostheses by testers with various levels of clinical experience had excellent interrater reliability and intrarater reliability.
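The ICC(2,k) statistic used above is derived from a two-way random-effects ANOVA over a subjects-by-raters score matrix (the Shrout-Fleiss formulation). A minimal sketch with hypothetical BBS totals, not the study's data:

```python
def icc_2k(data):
    """ICC(2,k): average-measures, two-way random effects, absolute agreement.
    data is a list of rows (one per subject); columns are raters."""
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    ssr = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ssc = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    sst = sum((x - grand) ** 2 for row in data for x in row)
    msr = ssr / (n - 1)
    msc = ssc / (k - 1)
    mse = (sst - ssr - ssc) / ((n - 1) * (k - 1))        # residual
    return (msr - mse) / (msr + (msc - mse) / n)

# Hypothetical BBS totals: 5 subjects scored by 3 raters
data = [[44, 45, 44], [30, 31, 30], [52, 52, 53], [18, 20, 19], [38, 38, 37]]
print(round(icc_2k(data), 3))
# → 0.999
```

With near-identical ratings across raters, the between-subjects variance dwarfs the residual, which is why values like the .99 reported above arise.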


2013 ◽  
Vol 93 (8) ◽  
pp. 1102-1115 ◽  
Author(s):  
Charlotte S.L. Tsang ◽  
Lin-Rong Liao ◽  
Raymond C.K. Chung ◽  
Marco Y.C. Pang

Background The Mini-Balance Evaluation Systems Test (Mini-BESTest) is a new balance assessment, but its psychometric properties have not been specifically tested in individuals with stroke. Objectives The purpose of this study was to examine the reliability and validity of the Mini-BESTest and its accuracy in categorizing people with stroke based on fall history. Design An observational measurement study with a test-retest design was conducted. Methods One hundred six people with chronic stroke were recruited. Intrarater reliability was evaluated by repeating the Mini-BESTest within 10 days by the same rater. The Mini-BESTest was administered by 2 independent raters to establish interrater reliability. Validity was assessed by correlating Mini-BESTest scores with scores of other balance measures (Berg Balance Scale, one-leg standing, Functional Reach Test, and Timed “Up & Go” Test) in the stroke group and by comparing Mini-BESTest scores between the stroke group and 48 control participants, and between fallers (≥1 fall in the previous 12 months, n=25) and nonfallers (n=81) in the stroke group. Results The Mini-BESTest had excellent internal consistency (Cronbach alpha=.89–.94), intrarater reliability (intraclass correlation coefficient [3,1]=.97), and interrater reliability (intraclass correlation coefficient [2,1]=.96). The minimal detectable change at the 95% confidence level was 3.0 points. The Mini-BESTest was strongly correlated with the other balance measures. Significant differences in Mini-BESTest total scores were found between the stroke and control groups and between fallers and nonfallers in the stroke group. In terms of floor and ceiling effects, the Mini-BESTest was significantly less skewed than the other balance measures, except for one-leg standing on the nonparetic side.
The Berg Balance Scale showed significantly better ability to identify fallers (positive likelihood ratio=2.6) than the Mini-BESTest (positive likelihood ratio=1.8). Limitations The results are generalizable only to people with mild to moderate chronic stroke. Conclusions The Mini-BESTest is a reliable and valid tool for evaluating balance in people with chronic stroke.
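The minimal detectable change quoted above follows from the standard error of measurement, SEM = SD × √(1 − ICC), with MDC95 = 1.96 × √2 × SEM. A sketch of that arithmetic, pairing the reported intrarater ICC of .97 with a hypothetical baseline SD of 6.2 points (the study's SD is not reproduced here):

```python
import math

def mdc95(sd_baseline, icc):
    """Minimal detectable change at the 95% confidence level."""
    sem = sd_baseline * math.sqrt(1 - icc)  # standard error of measurement
    return 1.96 * math.sqrt(2) * sem        # sqrt(2): two measurements differ

print(round(mdc95(6.2, 0.97), 1))
# → 3.0
```

An individual change smaller than this value cannot be distinguished from measurement error with 95% confidence.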


2017 ◽  
Vol 5 (1) ◽  
pp. 59-68 ◽  
Author(s):  
Pauli Olavi Rintala ◽  
Arja Kaarina Sääkslahti ◽  
Susanna Iivonen

This study examined the intrarater and interrater reliability of the Test of Gross Motor Development—3rd Edition (TGMD-3). Participants were 60 Finnish children aged between 3 and 9 years, divided into three separate samples of 20. Two samples of 20 were used to examine the intrarater reliability of two different assessors, and the third sample of 20 was used to establish interrater reliability. Children’s TGMD-3 performances were video-recorded and later assessed; reliability was examined using an intraclass correlation coefficient, a kappa statistic, and a percent agreement calculation. The intrarater reliability of the locomotor subtest, ball skills subtest, and gross motor total score ranged from 0.69 to 0.77, and percent agreement ranged from 87% to 91%. The interrater reliability of the locomotor subtest, ball skills subtest, and gross motor total score ranged from 0.56 to 0.64, with a percent agreement of 83% for each of the locomotor, ball skills, and total scores. The hop, horizontal jump, and two-hand strike assessments showed the greatest differences between assessors. These results indicate acceptable reliability of the TGMD-3 for analyzing children’s gross motor skills.
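The percent agreement and kappa statistics used above can be sketched as follows, with hypothetical pass/fail criterion scores from two assessors (Cohen's kappa is shown for illustration; the abstract does not specify which kappa variant was used):

```python
from collections import Counter

def percent_agreement(a, b):
    """Proportion of items on which two raters assign the same score."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(a)
    po = percent_agreement(a, b)                       # observed agreement
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[c] * cb[c] for c in ca.keys() | cb.keys()) / (n * n)
    return (po - pe) / (1 - pe)

# Hypothetical pass (1) / fail (0) ratings of one criterion by two assessors
r1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
r2 = [1, 1, 0, 1, 1, 1, 1, 0, 1, 0]
print(round(percent_agreement(r1, r2), 2), round(cohens_kappa(r1, r2), 2))
# → 0.8 0.52
```

The example shows why both measures are reported together: raw agreement of 80% shrinks to a kappa of about 0.52 once chance agreement is removed.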


Author(s):  
Emily Q Zhang ◽  
Vivian SY Leung ◽  
Daniel SJ Pang

Rodent grimace scales facilitate the assessment of ongoing pain. Reported rater training with these scales varies considerably and may contribute to the observed variability in interrater reliability. This study evaluated the effect of training on interrater reliability with the Rat Grimace Scale (RGS). Two training sets (42 and 150 images) were prepared from acute pain models. Four trainee raters progressed through 2 rounds of training, scoring 42 images (set 1) followed by 150 images (set 2a). After each round, trainees reviewed the RGS and any problematic images with an experienced rater. The 150 images were then rescored (set 2b). Four years later, trainees rescored the 150 images (set 2c). A second group of raters (no-training group) scored the same image sets without review with the experienced rater. Interrater and intrarater reliability were evaluated using the intraclass correlation coefficient (ICC), and ICC values were compared using the Feldt test. In the trainee group, interrater reliability increased from moderate to very good between sets 1 and 2b and also increased between sets 2a and 2b. The action units with the highest and lowest ICCs at set 2b were orbital tightening and whiskers, respectively. In comparison with an experienced rater, the ICC for all trainees improved, ranging from 0.88 to 0.91 at set 2b. Four years later, very good interrater reliability was retained, and intrarater reliability was good or very good. The interrater reliability of the no-training group was moderate and did not improve from set 1 to set 2b. Training improved interrater reliability, with an associated reduction in the width of the 95% CI. In addition, training improved agreement with an experienced rater, and this performance was retained.


Dermatology ◽  
2019 ◽  
Vol 236 (1) ◽  
pp. 8-14 ◽  
Author(s):  
Katarzyna Włodarek ◽  
Aleksandra Stefaniak ◽  
Łukasz Matusiak ◽  
Jacek C. Szepietowski

A wide variety of assessment tools have been proposed for hidradenitis suppurativa (HS), but none meets the criteria for an ideal score. Because there is no gold standard scoring system, the choice of measurement instrument depends on the purpose of use and even on the physician’s experience with HS. The aim of this study was to assess the intrarater and interrater reliability of 6 scoring systems commonly used for grading the severity of HS: the Hurley Staging System, the Refined Hurley Staging, the International Hidradenitis Suppurativa Severity Score System (IHS4), the Hidradenitis Suppurativa Severity Index (HSSI), the Sartorius Hidradenitis Suppurativa Score, and the Hidradenitis Suppurativa Physician’s Global Assessment Scale (HS-PGA). On the scoring day, 9 patients with HS underwent a physical examination and disease severity assessment by a group of 16 dermatology residents using all evaluated instruments. Intrarater reliability was then calculated using the intraclass correlation coefficient (ICC), and interrater variability was evaluated using the coefficient of variation (CV). In all 6 scorings the ICCs were >0.75, indicating high intrarater reliability for all of the scales. The study also demonstrated moderate agreement between raters for most of the evaluated instruments. The most reproducible methods, according to the CVs, were the Hurley staging, the IHS4, and the HSSI. None of the 6 evaluated scoring systems showed a significant advantage over the others when comparing ICCs, and all of the instruments appear to be very reliable. Interrater reliability was usually good, but the most repeatable results between raters were obtained for the simplest scales: the Hurley staging, the IHS4, and the HSSI.
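Interrater variability here is summarized with the coefficient of variation, the rater-to-rater standard deviation expressed as a percentage of the mean score. A minimal sketch with hypothetical IHS4 scores for one patient:

```python
import statistics

def cv_percent(scores):
    """Coefficient of variation of one patient's scores across raters."""
    return statistics.stdev(scores) / statistics.mean(scores) * 100

# Hypothetical IHS4 scores assigned to one patient by 5 residents
scores = [12, 13, 12, 14, 12]
print(round(cv_percent(scores), 1))
# → 7.1
```

A lower CV means the raters' scores cluster more tightly around the mean, which is why simpler scales tend to produce the most reproducible results.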


2020 ◽  
Vol 2020 ◽  
pp. 1-8
Author(s):  
Jiali Lou ◽  
Yongliang Jiang ◽  
Hantong Hu ◽  
Xiaoyu Li ◽  
Yajun Zhang ◽  
...  

The objective of this study was to determine the intrarater and interrater reliability of infrared image analysis of forearm acupoints before and after moxibustion. Infrared images of acupoints on the forearms of 20 volunteers (M/F, 10/10) were collected before and after moxibustion by infrared thermography (IRT). Two trained raters analyzed the infrared images in two sessions held at a one-week interval. The intraclass correlation coefficient (ICC) was calculated to determine the intrarater and interrater reliability. For intrarater reliability, ICC values were between 0.758 and 0.994 (substantial to excellent); for interrater reliability, ICC values ranged from 0.707 to 0.964 (moderate to excellent). Given these levels of concordance, IRT could be a reliable tool for monitoring the temperature changes of forearm acupoints induced by moxibustion.


2002 ◽  
Vol 96 (5) ◽  
pp. 1129-1139 ◽  
Author(s):  
Jason Slagle ◽  
Matthew B. Weinger ◽  
My-Than T. Dinh ◽  
Vanessa V. Brumer ◽  
Kevin Williams

Background Task analysis may be useful for assessing how anesthesiologists alter their behavior in response to different clinical situations. In this study, the authors examined the intraobserver and interobserver reliability of an established task analysis methodology. Methods During 20 routine anesthetic procedures, a trained observer sat in the operating room and categorized the anesthetist's activities in real time into 38 task categories. Two weeks later, the same observer performed task analysis from videotapes obtained intraoperatively. A different observer performed task analysis from the videotapes on two separate occasions. Data were analyzed for the percentage of time spent on each task category, average task duration, and number of task occurrences. Rater reliability and agreement were assessed using intraclass correlation coefficients. Results Intrarater reliability was generally good for categorization of percent time on task and task occurrence (mean intraclass correlation coefficients of 0.84-0.97). There was comparably high concordance between real-time and video analyses. Interrater reliability was generally good for the percent time and task occurrence measurements. However, the interrater reliability of the task duration metric was unsatisfactory, primarily because of the technique used to capture multitasking. Conclusions A task analysis technique used in anesthesia research for several decades showed good intrarater reliability. Off-line analysis of videotapes is a viable alternative to real-time data collection. Acceptable interrater reliability requires the use of strict task definitions, sophisticated software, and rigorous observer training. New techniques must be developed to capture multitasking more accurately. Substantial effort is required to conduct task analyses with sufficient reliability for research or clinical evaluation.
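The three metrics described above (percent time on task, average task duration, and number of occurrences) can be derived from a timed event log. A minimal sketch with a hypothetical observation session; the task names and timings are invented for illustration:

```python
from collections import defaultdict

# (task category, start_s, end_s) tuples from one observation session
events = [
    ("record keeping", 0, 60),
    ("monitor observation", 60, 90),
    ("drug administration", 90, 120),
    ("monitor observation", 120, 180),
]

totals = defaultdict(float)  # seconds spent per task category
counts = defaultdict(int)    # number of occurrences per task category
for task, start, end in events:
    totals[task] += end - start
    counts[task] += 1

session = sum(totals.values())
for task in totals:
    pct = 100 * totals[task] / session
    avg = totals[task] / counts[task]
    print(f"{task}: {pct:.0f}% of time, {counts[task]} occurrence(s), "
          f"avg {avg:.0f} s")
```

Note that this sketch assumes strictly sequential tasks; the multitasking problem the authors highlight arises precisely because concurrent tasks make the duration bookkeeping ambiguous.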


2015 ◽  
Vol 95 (10) ◽  
pp. 1397-1407 ◽  
Author(s):  
Andy C.M. Chan ◽  
Marco Y.C. Pang

Background The Balance Evaluation Systems Test (BESTest) is a relatively new balance assessment tool. Recently, the Mini-BESTest and the Brief-BESTest, which are shortened versions of the BESTest, were developed. Objective The purpose of this study was to estimate the interrater and intrarater-interoccasion reliability, internal consistency, concurrent and convergent validity, and floor and ceiling effects of the 3 BESTests and other related measures, namely, the Berg Balance Scale (BBS), Functional Gait Assessment (FGA), and Activities-specific Balance Confidence (ABC) Scale, among patients with total knee arthroplasty (TKA). Design This was an observational measurement study. Methods To establish interrater reliability, the 3 BESTests were administered by 3 independent raters to 25 participants with TKA. Intrarater-interoccasion reliability was evaluated in 46 participants with TKA (including the 25 individuals who participated in the interrater reliability experiments) by repeating the 3 BESTests, BBS, and FGA within 1 week by the same rater. Internal consistency of each test was also assessed with the Cronbach alpha. Validity was assessed in another 46 patients with TKA by correlating the 3 BESTests with the BBS, FGA, and ABC. Floor and ceiling effects were also examined. Results The 3 BESTests demonstrated excellent interrater reliability (intraclass correlation coefficient [ICC] [2,1]=.96–.99), intrarater-interoccasion reliability (ICC [2,1]=.92–.96), and internal consistency (Cronbach alpha=.96–.98). These values were comparable to those for the BBS and FGA. The 3 BESTests also showed moderate-to-strong correlations with the BBS, FGA, and ABC (r=.35–.81), demonstrating good concurrent and convergent validity.
No significant floor or ceiling effects were observed, except for the BBS. Limitations The results are generalizable only to patients with TKA due to end-stage knee osteoarthritis. Conclusions The 3 BESTests have good reliability and validity for evaluating balance in people with TKA. The Brief-BESTest is the least time-consuming and may be more useful clinically.


2012 ◽  
Vol 92 (6) ◽  
pp. 841-852 ◽  
Author(s):  
Alexandra De Kegel ◽  
Tina Baetens ◽  
Wim Peersman ◽  
Leen Maes ◽  
Ingeborg Dhooge ◽  
...  

Background Balance is a fundamental component of movement. Early identification of balance problems is important to plan early intervention. The Ghent Developmental Balance Test (GDBT) is a new assessment tool designed to monitor balance from the initiation of independent walking to 5 years of age. Objective The purpose of this study was to establish the psychometric characteristics of the GDBT. Methods To evaluate test-retest reliability, 144 children were tested twice on the GDBT by the same examiner, and to evaluate interrater reliability, videotaped GDBT sessions of 22 children were rated by 3 different raters. To evaluate the known-group validity of GDBT scores, z scores on the GDBT were compared between a clinical group (n=20) and a matched control group (n=20). Concurrent validity of GDBT scores with the subscale standardized scores of the Movement Assessment Battery for Children–Second Edition (M-ABC-2), the Peabody Developmental Motor Scales–Second Edition (PDMS-2), and the balance subscale of the Bruininks-Oseretsky Test–Second Edition (BOT-2) was evaluated in a combined group of the 20 children from the clinical group and 74 children who were developing typically. Results Test-retest and interrater reliability were excellent for the GDBT total scores, with intraclass correlation coefficients of .99 and .98, standard error of measurement values of 0.21 and 0.78, and small minimal detectable differences of 0.58 and 2.08, respectively. The GDBT was able to distinguish between the clinical group and the control group (t38=5.456, P<.001). Pearson correlations between the z scores on GDBT and the standardized scores of specific balance subscales of the M-ABC-2, PDMS-2, and BOT-2 were moderate to high, whereas correlations with subscales measuring constructs other than balance were low. Conclusions The GDBT is a reliable and valid clinical assessment tool for the evaluation of balance in toddlers and preschool-aged children.


2013 ◽  
Vol 25 (8) ◽  
pp. 1043-1049 ◽  
Author(s):  
Makoto Suzuki ◽  
Hiroyuki Fujisawa ◽  
Yooichiro Machida ◽  
Shin Minakata
