Reporting Standards for a Bland–Altman Agreement Analysis: A Review of Methodological Reviews

Diagnostics ◽  
2020 ◽  
Vol 10 (5) ◽  
pp. 334 ◽  
Author(s):  
Oke Gerke

The Bland–Altman Limits of Agreement is a popular and widespread means of analyzing the agreement of two methods, instruments, or raters in quantitative outcomes. An agreement analysis could be reported as a stand-alone research article but it is more often conducted as a minor quality assurance project in a subgroup of patients, as a part of a larger diagnostic accuracy study, clinical trial, or epidemiological survey. Consequently, such an analysis is often limited to brief descriptions in the main report. Therefore, in several medical fields, it has been recommended to report specific items related to the Bland–Altman analysis. The present study aimed to identify the most comprehensive and appropriate list of items for such an analysis. Seven proposals were identified from a MEDLINE/PubMed search, three of which were derived by reviewing anesthesia journals. Broad consensus was seen for the a priori establishment of acceptability benchmarks, estimation of repeatability of measurements, description of the data structure, visual assessment of the normality and homogeneity assumption, and plotting and numerically reporting both bias and the Bland–Altman Limits of Agreement, including respective 95% confidence intervals. Abu-Arafeh et al. provided the most comprehensive and prudent list, identifying 13 key items for reporting (Br. J. Anaesth. 2016, 117, 569–575). An exemplification with interrater data from a local study accentuated the straightforwardness of transparent reporting of the Bland–Altman analysis. The 13 key items should be applied by researchers, journal editors, and reviewers in the future, to increase the quality of reporting Bland–Altman agreement analyses.
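The quantities named above (bias, limits of agreement, and their 95% confidence intervals) can be sketched in a few lines of Python. This is a minimal illustration, not the procedure from any of the reviewed proposals. The function name `bland_altman` and the sample readings are hypothetical. The sketch assumes one pair of readings per subject and roughly normal differences, and its confidence intervals use a large-sample normal approximation (z = 1.96) rather than exact t-based methods.

```python
import math
import statistics


def bland_altman(a, b, z=1.96):
    """Bias, 95% limits of agreement, and approximate 95% CIs for paired readings.

    Hypothetical helper for illustration; assumes one pair per subject.
    """
    if len(a) != len(b) or len(a) < 3:
        raise ValueError("need at least three paired measurements")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    bias = statistics.mean(diffs)          # mean difference between methods
    sd = statistics.stdev(diffs)           # sample SD of the differences
    lower, upper = bias - z * sd, bias + z * sd
    se_bias = sd / math.sqrt(n)            # standard error of the bias
    # Large-sample approximation to the standard error of each limit
    se_limit = sd * math.sqrt(1 / n + z ** 2 / (2 * (n - 1)))
    return {
        "n": n,
        "bias": bias,
        "sd": sd,
        "bias_ci": (bias - z * se_bias, bias + z * se_bias),
        "lower": lower,
        "lower_ci": (lower - z * se_limit, lower + z * se_limit),
        "upper": upper,
        "upper_ci": (upper - z * se_limit, upper + z * se_limit),
    }


# Hypothetical paired readings from two raters (not data from the study)
rater_1 = [10.2, 11.5, 9.8, 12.1, 10.9, 11.3, 10.0, 12.4]
rater_2 = [10.0, 11.9, 9.5, 12.0, 11.4, 11.0, 10.3, 12.1]
result = bland_altman(rater_1, rater_2)
for key, value in result.items():
    print(key, value)
```

With small samples, the normal quantile understates the uncertainty, so a t-based or exact interval would be the more careful choice in a real report.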

Author(s):  
Oke Gerke

The Bland–Altman Limits of Agreement is a popular and widespread means of analyzing the agreement of two methods, instruments, or raters in quantitative outcomes. An agreement analysis could be reported as a stand-alone research article but it is more often conducted as a minor quality assurance project in a subgroup of patients, as a part of a larger diagnostic accuracy study, clinical trial, or epidemiological survey. Consequently, such an analysis is often limited to brief descriptions in the main report. Therefore, in several medical fields, it has been recommended to report specific items related to the Bland–Altman analysis. Seven proposals were identified from a MEDLINE/PubMed search on March 03, 2020, three of which were derived by reviewing anesthesia journals. Broad consensus was seen for the a priori establishment of acceptability benchmarks, estimation of repeatability of measurements, description of the data structure, visual assessment of the normality and homogeneity assumption, and plotting and numerically reporting both bias and the Bland–Altman Limits of Agreement, including respective 95% confidence intervals. Abu-Arafeh et al. provided the most comprehensive and prudent list, identifying 13 key items for reporting (Br. J. Anaesth. 2016, 117, 569–575). The 13 key items should be applied by researchers, journal editors, and reviewers in the future, to increase the quality of reporting Bland–Altman agreement analyses.


2017 ◽  
Vol 30 (2) ◽  
pp. 233-237 ◽  
Author(s):  
Heidi E. Banse ◽  
Nichol Schultz ◽  
Molly McCue ◽  
Ray Geor ◽  
Dianne McFarlane

Accurate measurement of equine adrenocorticotropin (ACTH) is important for the diagnosis of equine pituitary pars intermedia dysfunction (PPID). Several radioimmunoassays (RIAs) and chemiluminescent immunoassays (CIAs) are used for measurement of ACTH concentration in horses; whether these methods yield similar results across a range of concentrations has not been determined. We evaluated agreement between a commercial RIA and CIA. Archived plasma samples (n = 633) were measured with both assays. Correlation between the 2 methods was moderate (r = 0.49, p < 0.001). Bland–Altman analysis revealed poor agreement, with a proportional bias and widening limits of agreement with increasing values. Poor agreement between assays was also observed when evaluating plasma samples with concentrations at or below the recommended diagnostic cutoff value for PPID testing. The lack of agreement suggests that measurements obtained should not be considered interchangeable between methods.
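A proportional bias of the kind described above is commonly checked by regressing the between-method differences on the pairwise means; a slope away from zero suggests that the disagreement changes with concentration. The sketch below uses ordinary least squares on made-up values. The function name `proportional_bias` and the data are hypothetical and are not taken from the study, which does not state how its proportional bias was assessed.

```python
import statistics


def proportional_bias(method_a, method_b):
    """Least-squares slope and intercept of (a - b) regressed on (a + b) / 2."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    means = [(a + b) / 2 for a, b in zip(method_a, method_b)]
    diffs = [a - b for a, b in zip(method_a, method_b)]
    mean_x = statistics.mean(means)
    mean_y = statistics.mean(diffs)
    sxx = sum((x - mean_x) ** 2 for x in means)
    if sxx == 0:
        raise ValueError("pairwise means are all identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(means, diffs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical assays: assay_b always reads 20% higher than assay_a
assay_a = [10.0, 20.0, 40.0, 80.0, 160.0]
assay_b = [12.0, 24.0, 48.0, 96.0, 192.0]
slope, intercept = proportional_bias(assay_a, assay_b)
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
```

In this constructed case the differences scale exactly with the means, so the slope is clearly nonzero and the intercept is essentially zero. Real data would also need a test or confidence interval for the slope.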


Author(s):  
Mera Usman Muhammed ◽  
Mayaki Abubakar Musa ◽  
Gambo Abdulrahman Abdullahi

This study compared digital rectal (DR) thermometer measurements with non-contact infrared thermometer (IRT) measurements at two locations on the face in some large animal species. Two hundred and forty (240) animals, comprising equal numbers of three species (cattle, camels, and horses) of varying age and either sex, were used. The IR temperature was taken from two sites on the animal's face: the frontal (FIRT) and temporal (TIRT) regions. The mean IR temperatures (FIRT and TIRT) were higher than the rectal temperature (RT) in all the animal species. The two thermometers correlated poorly in all the animal species. Bland-Altman analysis showed large biases and limits of agreement that are not acceptable for clinical purposes. In conclusion, IRT seems to offer a quick and easy way to determine an animal's temperature, but at present it cannot be used clinically as a substitute for the DR thermometer for body temperature measurement in these species.


Blood ◽  
2014 ◽  
Vol 124 (21) ◽  
pp. 1605-1605
Author(s):  
Fernanda Gutierrez-Rodrigues ◽  
Bárbara A Santana-Lemos ◽  
Priscila Santos Scheucher ◽  
Raquel M Alves-Paiva ◽  
Rodrigo T. Calado

Abstract Excessive telomere erosion is the molecular etiology of a group of disorders (dyskeratosis congenita, aplastic anemia, idiopathic pulmonary fibrosis) collectively called telomeropathies. Telomere length measurement is an essential diagnostic test for these diseases. The most commonly used methods are terminal restriction fragment (TRF) analysis by Southern blotting (the gold-standard method), flow cytometry combined with fluorescence in situ hybridization (flow-FISH), and quantitative PCR (qPCR). Although the clinical use of these methods has been reported, their utility and characteristics have not been widely compared. Measurement techniques and coefficients of variation often differ among diagnostic services. Here, we directly compared the accuracy, reproducibility, sensitivity, and specificity of flow-FISH and qPCR against TRF for measuring peripheral blood leukocyte telomere length in healthy individuals and patients with telomeropathies. TRF analyses and flow-FISH showed good correlation in the analysis of samples from healthy subjects (R²=0.60; p<0.0001) and patients (R²=0.51; p<0.0001). Bland-Altman analyses also showed very good agreement between these methods for both healthy individuals (bias±SD = 0.17±1.03; limits of agreement ranging from 2.24 to -1.88) and patients (bias±SD = 0.0±1.21; limits of agreement ranging from 2.41 to -2.41). In contrast, the comparison between TRF and qPCR yielded modest correlation for the analysis of samples from healthy individuals (R²=0.35; p<0.0001) and low correlation for patients (R²=0.20; p=0.001). Bland-Altman analysis indicated poor agreement between the two methods for both patients and controls. The mean differences were far from zero and the standard deviations were large. For patients, the bias±SD was 0.78±1.34 with limits of agreement ranging from 3.47 to -1.90, and for controls, the bias±SD was 1.15±1.49 with limits of agreement ranging from 4.14 to -1.84.
Finally, qPCR and flow-FISH also correlated modestly in the analysis of samples from healthy individuals (R²=0.33; p<0.0001) and did not correlate in the comparison of patients' samples (R²=0.1; p=0.08). Bland-Altman analysis corroborated this finding. For controls, the bias±SD was very similar to that found in the comparison between qPCR and TRF analysis (-0.6±1.27; limits of agreement ranging from 1.94 to -3.16). For patients, the bias±SD was -1.15±1.65 with limits of agreement ranging from 2.15 to -4.45, indicating poor agreement between flow-FISH and qPCR in these samples. The intra-assay coefficient of variation (CV) was 10.8±7.1% for flow-FISH and 9.5±7.4% for qPCR (p=0.35). The inter-assay CV was lower for flow-FISH (9.6±7.6%) than for qPCR (16±19.5%; p=0.02). Flow-FISH and qPCR were sensitive (both 100%) and specific (93% and 89%, respectively) in distinguishing very short telomeres. However, the sensitivity (40%) and specificity (63%) of qPCR for detecting telomere length below the tenth percentile were lower than those of flow-FISH (80% sensitivity and 85% specificity). Taken together, these findings indicate that, in the clinical setting, flow-FISH is more accurate and reproducible than qPCR in the measurement of human leukocyte telomere length. Quantitative PCR exhibited low accuracy in the analysis of samples from patients with short telomeres. In conclusion, flow-FISH appears to be a more appropriate method for diagnostic purposes. Studies that compare methodologies are helpful in selecting standard methods and in narrowing the differences among laboratories. Disclosures: No relevant conflicts of interest to declare.


2021 ◽  
Author(s):  
Yushui Han ◽  
Ahmed Ibrahim Ahmed ◽  
Chris Schwemmer ◽  
Myra Cocker ◽  
Talal S Alnabelsi ◽  
...  

Abstract Background: Advances in computed tomography (CT) and machine learning have enabled on-site non-invasive assessment of fractional flow reserve (FFRCT). Purpose: To assess the inter-operator variability of Coronary CT Angiography–derived FFRCT using a machine learning based post-processing prototype. Materials and Methods: We included 60 symptomatic patients who underwent coronary CT angiography. FFRCT was calculated by 2 independent operators after training using a machine learning based on-site prototype. FFRCT was measured 1 cm distal to the coronary plaque or in the middle of the segments if no coronary lesions were present. Intraclass correlation coefficient (ICC) and Bland-Altman analysis were used to evaluate inter-operator variability effect in FFRCT estimates. Sensitivity analysis was done by cardiac risk factors, degree of stenosis and image quality. Results: A total of 535 coronary segments in 60 patients were assessed. The overall ICC was 0.986 per patient (95% CI: 0.977 - 0.992) and 0.972 per segment (95% CI: 0.967 - 0.977). The absolute mean difference in FFRCT estimates was 0.012 per patient (95% CI for limits of agreement: -0.035 - 0.039) and 0.02 per segment (95% CI for limits of agreement: -0.077 - 0.080). Tight limits of agreement were seen on Bland-Altman analysis. Distal segments had greater variability compared to proximal/mid segments (absolute mean difference 0.011 vs 0.025, p<0.001). Results were similar on sensitivity analysis. Conclusion: A high degree of inter-operator reproducibility can be achieved by onsite machine learning based FFRCT assessment. Future research is required to evaluate the physiological relevance and prognostic value of FFRCT.


2012 ◽  
Vol 109 (3) ◽  
pp. 539-546 ◽  
Author(s):  
Michelle C. Carter ◽  
V. J. Burley ◽  
C. Nykjaer ◽  
J. E. Cade

Accurate dietary assessment is an essential foundation of research in nutritional epidemiology. Due to the weaknesses in current methodology, attention is turning to strategies that automate the dietary assessment process to improve accuracy and reduce the costs and burden to participants and researchers. ‘My Meal Mate’ (MMM) is a smartphone application designed to support weight loss. The present study aimed to validate the diet measures recorded on MMM against a reference measure of 24 h dietary recalls. A sample of fifty volunteers recorded their food and drink intake on MMM for 7 d. During this period, they were contacted twice at random to conduct 24 h telephone recalls. Daily totals for energy (kJ) and macronutrients recorded on MMM were compared against the corresponding day of recall using t tests for group means and Pearson's correlations. Bland–Altman analysis was used to assess the agreement between the methods. Energy (kJ) recorded on MMM correlated well with the recalls (day 1: r 0·77 (95 % CI 0·62, 0·86), day 2: r 0·85 (95 % CI 0·74, 0·91)) and had a small mean difference (day 1 (MMM −  recall): − 68 kJ/d (95 % CI − 553, 418 kJ) ( − 16 kcal/d, 95 % CI − 127, 100 kcal); day 2 (MMM −  recall): − 441 kJ/d (95 % CI − 854, − 29 kJ) ( − 105 kcal/d, 95 % CI − 204, − 7 kcal)). Bland–Altman analysis showed wide limits of agreement between the methods: − 3378 to 3243 kJ/d ( − 807 to 775 kcal/d) on day 1. At the individual level, the limits of agreement between MMM and the 24 h recall were wide; however, at the group level, MMM appears to have potential as a dietary assessment tool.


2015 ◽  
Vol 18 (01) ◽  
pp. 1550003
Author(s):  
Travis M. Falconer ◽  
Julie Headford ◽  
Stephen Edmondston ◽  
Piers J. Yates

The Oxford Hip Score (OHS) and Oxford Knee Score (OKS) are validated, reliable, and reproducible outcome measures; however, their retrospective use has not been examined. The aim of this prospective cohort study was to examine the accuracy and reliability of patients' retrospective recall of their OHS and OKS. A total of 137 patients undergoing primary hip (40) or primary knee (97) arthroplasty, with a mean age of 70.8 years (range, 47–88) and a mean time to follow-up of 27.2 months (range, 6–46), were included in the study. The mean retrospective OHS and OKS decreased compared to the pre-operative score (OHS = 1.6 ± SD, p = 0.36, OKS = 4.7 ± SD, p < 0.001). There was only a weak positive relationship between the actual pre-operative scores and the retrospective scores (OHS: r² = 0.30, OKS: r² = 0.19). Bland–Altman analysis demonstrated 95% limits of agreement between scores of -19.9 to 23.1 for the OHS and -15.3 to 24.8 for the OKS. This study shows that patients are poor at retrospectively recalling their pre-operative OHS and OKS, and therefore these scores should not be collected retrospectively.
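Because 95% limits of agreement are conventionally computed as the mean difference plus or minus 1.96 standard deviations of the differences, the mean difference can be read off as the midpoint of the reported limits. The short check below assumes that convention was used here, which the abstract does not state. The helper name `implied_bias_and_sd` is hypothetical, and the implied standard deviation is a back-calculation for illustration only, not a value reported by the authors.

```python
def implied_bias_and_sd(lower, upper, z=1.96):
    """Back out the mean difference and SD implied by limits of agreement.

    Assumes the limits were computed as bias +/- z * SD (an assumption).
    """
    bias = (lower + upper) / 2
    sd = (upper - lower) / (2 * z)
    return bias, sd


# Limits of agreement quoted in the abstract
ohs_bias, ohs_sd = implied_bias_and_sd(-19.9, 23.1)
oks_bias, oks_sd = implied_bias_and_sd(-15.3, 24.8)
print(f"OHS: midpoint {ohs_bias:.2f}, implied SD {ohs_sd:.2f}")
print(f"OKS: midpoint {oks_bias:.2f}, implied SD {oks_sd:.2f}")
```

The midpoints come out at about 1.6 for the OHS and 4.75 for the OKS, consistent in magnitude with the mean changes of 1.6 and 4.7 quoted in the abstract.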


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Valentin Picone ◽  
Nikolaos Makris ◽  
Fanny Boutevin ◽  
Sarah Roy ◽  
Margot Playe ◽  
...  

Abstract Background The SwiftScan solution (General Electric Healthcare) combines a new low-energy high-resolution sensitivity collimator and a tomographic step-and-shoot continuous (SSC) mode acquisition. The purpose of this study is to determine whether SSC mode can be used in clinical practice with shorter examination times, while preserving image quality and ensuring accurate semi-quantification. Twenty bone scan and 10 lung scan studies were randomly selected over a period of 2 months. Three sets of image datasets were produced: step-and-shoot (SS) acquisition, simulated 25% count reduction using the Poisson resampling method (SimSS), and SimSS continuous acquisition (SimSSC), where SimSS was summed with counts acquired during detector head rotation. Visual assessment (5-point Likert scale, 2 readers) and semi-quantitative evaluation (50 focal uptake from 10 bone studies), assessed by SUVmean, coefficient of variation (COV), and contrast-to-noise ratio (CNR), were performed using t test and Bland-Altman analysis. Results Intra-reader agreement was substantial for reader 1 (k = 0.71) and for reader 2 (k = 0.61). Inter-reader agreement was substantial for SS set (k = 0.93) and moderate for SimSSC (k = 0.52). Bland-Altman analysis showed a good interchangeability of SS and SimSSC SUV values. The mean CNR between SS and SimSSC was not significantly different: 42.9 ± 43.7 [23.7–62.1] vs. 43.1 ± 46 [22.9–63.3] (p = 0.46), respectively. COV values, assessing noise level, did not deviate significantly between SS and SimSSC: 0.20 ± 0.08 [0.18–0.23] vs. 0.21 ± 0.08, [0.18–0.23] (p = 0.15), respectively, whereas a significant difference was demonstrated between SS and SimSS: 0.20 ± 0.08 [0.18–0.23] vs. 0.23 ± 0.09 [0.20–0.25] (p < 0.0001), respectively. 
Conclusions SSC mode acquisition decreases examination time by approximately 25% in bone and lung SPECT/CT studies compared to SS mode (~ 2 min per single-bed SPECT), without compromising image quality and signal quantification. This SPECT sensitivity improvement also offers the prospect of more comfortable exams, with fewer motion artifacts, especially in painful or dyspneic patients.


PeerJ ◽  
2017 ◽  
Vol 5 ◽  
pp. e4132 ◽  
Author(s):  
Rashad Zayat ◽  
Andreas Goetzenich ◽  
Ju-Yeon Lee ◽  
HeeJung Kang ◽  
So-Hyun Jansen-Park ◽  
...  

Background: Bedside non-invasive techniques for estimating hemodynamic parameters, such as radial artery tonometry, have gained relevance as an attractive and efficient alternative for measuring hemodynamics in outpatient departments. In this pilot study, we compared cardiac output (CO) and stroke volume (SV) estimated from a radial artery tonometry blood pressure pulse analyzer (BPPA) (DMP-Life, DAEYOMEDI Co., Gyeonggi-do, South Korea) with parameters derived from pulsed-wave Doppler (PWD) echocardiography. Methods: From January 2015 to December 2016, all patients scheduled for coronary artery bypass (CABG) surgery at our department were screened. Exclusion criteria were, inter alia, moderate to severe aortic or mitral valve disease and peripheral arterial disease (PAD) > stage II. One hundred and seven patients were included (mean age 66.1 ± 9.9, 15 females, mean BMI 27.2 ± 4.1 kg/m2). All patients had pre-operative transthoracic echocardiography (TTE). We measured the hemodynamic parameters with the BPPA from the radial artery, randomly before or after TTE. For the comparison between the measurement methods we used Bland-Altman analysis and Pearson correlation. Results: Mean TTE-CO was 5.1 ± 0.96 L/min, and the mean BPPA-CO was 5.2 ± 0.85 L/min. The Bland-Altman analysis for CO revealed a bias of −0.13 L/min and SD of 0.90 L/min with lower and upper limits of agreement of −1.91 and +1.64 L/min. The correlation of CO measurements between DMP-Life and TTE was poor (r = 0.501, p < 0.0001). The mean TTE-SV was 71.3 ± 16.2 mL and the mean BPPA-SV was 73.8 ± 19.2 mL. SV measurements correlated very well between the two methods (r = 0.900, p < 0.0001). The Bland-Altman analysis for SV revealed a bias of −2.54 mL and SD of 8.42 mL, with lower and upper limits of agreement of −19.05 and +13.96 mL, respectively. Conclusion: Our study shows for the first time that the DMP-Life tonometry device measures SV and CO with reasonable accuracy and precision of agreement compared with TTE in preoperative cardiothoracic surgery patients. Tonometry BPPAs are relatively quick and simple measuring devices, which facilitate the collection of cardiac and hemodynamic information. Further studies with a larger number of patients and with repeated measurements are in progress to test the reliability and repeatability of the DMP-Life system.
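The reported limits can be checked against the usual formula, bias ± 1.96 × SD. The snippet below is a sketch under that assumption, since the abstract does not state the multiplier. The function name `limits_of_agreement` is hypothetical, and the inputs are the bias and SD values quoted in the abstract.

```python
def limits_of_agreement(bias, sd, z=1.96):
    """Lower and upper limits of agreement, assuming bias +/- z * SD."""
    return bias - z * sd, bias + z * sd


# Bias and SD as quoted in the abstract
co_lower, co_upper = limits_of_agreement(-0.13, 0.90)   # cardiac output, L/min
sv_lower, sv_upper = limits_of_agreement(-2.54, 8.42)   # stroke volume, mL
print(f"CO limits: {co_lower:.2f} to {co_upper:.2f} L/min")
print(f"SV limits: {sv_lower:.2f} to {sv_upper:.2f} mL")
```

Both pairs land within about 0.02 of the published limits (−1.91 and +1.64 L/min; −19.05 and +13.96 mL). Gaps of that size are what one would expect from rounding of the published bias and SD.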


2019 ◽  
Author(s):  
Abdourahamane Yacouba ◽  
Malika Congo ◽  
Gérard Komonsira Dioma ◽  
Hermann Somlaré ◽  
David Coulidiaty ◽  
...  

Abstract Background: Several studies have evaluated the use of dried blood spots (DBS) as an alternative to plasma specimens, but mainly using Whatman 903® cards as filter paper. The aim of this study was to evaluate Whatman FTA® card (FTA card) specimens for HIV-1 viral load testing by comparing them with plasma specimens, using 2 real-time PCR assays. Methodology: A cross-sectional study was conducted between April 2017 and September 2017 in HIV-1 patients admitted to Yalgado Ouédraogo teaching hospital. Paired FTA card and plasma specimens were collected and analyzed using the Abbott RealTime HIV-1 assay (Abbott) and COBAS® AmpliPrep/COBAS® TaqMan v2.0 (Roche), following the manufacturers' protocols. Results: A total of 107 patients were included. No statistically significant differences (p-value > 0.05) were observed between the mean viral loads obtained from FTA cards and plasma specimens with the Roche and Abbott assays. Twenty-nine samples with the Roche assay and 15 samples with the Abbott assay showed discrepant results. At viral loads of ≤1000 copies/mL, the sensitivity and specificity of FTA cards were 78.6% and 100% with Roche, and 92.3% and 95.9% with Abbott. Strong correlation was found between FTA cards and plasma specimens with both assays. With Roche, Bland-Altman analysis showed a bias of −0.3 and 95% limits of agreement of −2.6 to 1.8 log10, with 97/99 cases (97.9%) within the agreement limits. With Abbott, Bland-Altman analysis showed a bias of −0.1 and 95% limits of agreement of −2.3 to 2.1 log10, with 96/99 cases (96.9%) within the agreement limits. Conclusion: Our study demonstrated the feasibility of using FTA card filter paper for HIV-1 viral load testing. However, further studies are required to validate FTA card filter paper for HIV-1 treatment monitoring.

