An Argument for Self-Report as a Reference Standard in Audiology

2018 ◽  
Vol 29 (03) ◽  
pp. 206-222 ◽  
Author(s):  
Andrew J. Vermiglio ◽  
Sigfrid D. Soli ◽  
Xiangming Fang

Abstract The primary components of a diagnostic accuracy study are an index test, the target condition (or disorder), and a reference standard. According to the Standards for Reporting Diagnostic Accuracy statement, the reference standard should be the best method available to independently determine if the results of an index test are correct. Pure-tone thresholds have been used as the “gold standard” for the validation of some tests used in audiology. Many studies, however, have shown a lack of agreement between the audiogram and the patient’s perception of hearing ability. For example, patients with normal audiograms may report difficulty understanding speech in the presence of background noise. The primary purpose of this article is to present an argument for the use of self-report as a reference standard for diagnostic studies in the field of audiology. This will be in the form of a literature review on pure-tone threshold measures and self-report as reference standards. The secondary purpose is to determine the diagnostic accuracy of pure-tone threshold and Hearing-in-Noise Test (HINT) measures for the detection of a speech-recognition-in-noise disorder. Two groups of participants with normal pure-tone thresholds were evaluated. The King–Kopetzky syndrome (KKS) group was made up of participants with the self-report of speech-recognition-in-noise difficulties. The control group was made up of participants with no reports of speech-recognition-in-noise problems. The reference standard was self-report. Diagnostic accuracy of HINT and pure-tone threshold measures was determined by measuring group differences, sensitivity and specificity, and the area under the curve (AUC) for receiver-operating characteristic (ROC) curves. Forty-seven participants were tested. All participants were native speakers of American English. Twenty-two participants were in the control group and 25 in the KKS group.
The groups were matched for age. Pure-tone threshold data were collected using the Hughson–Westlake procedure. Speech-recognition-in-noise data were collected using a software system and the standard HINT protocol. Statistical analyses were conducted using descriptive and correlational statistics, two-sample t tests, and logistic regression. The literature review revealed that self-report has been used as a reference standard in investigations of patients with normal audiograms and the perception of difficulty understanding speech in the presence of background noise. Self-report may be a better indicator of hearing ability than pure-tone thresholds in some situations. The diagnostic accuracy investigation revealed statistically significant differences between control and KKS groups for HINT performance (p < 0.01), but not for pure-tone threshold measures. Better sensitivity was found for the HINT Composite score (88%) than for the pure-tone average (PTA; 28%). The specificities for the HINT Composite score and PTA were 77% and 95%, respectively. ROC curves revealed a greater AUC for the HINT Composite score (AUC = 0.87) than for PTA (AUC = 0.51). Self-report is a reasonable reference standard for studies on the diagnostic accuracy of speech-recognition-in-noise tests. For individuals with normal pure-tone thresholds, the HINT demonstrated a higher degree of diagnostic accuracy than pure-tone thresholds for the detection of a speech-recognition-in-noise disorder.
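
The diagnostic accuracy measures reported above can be sketched in a few lines of code. The thresholds, cutoff, and group sizes below are hypothetical, chosen only to illustrate how sensitivity, specificity, and the ROC AUC are computed from two groups of scores; they are not the study's data.

```python
# Illustrative sketch (hypothetical data): the metrics reported above --
# sensitivity, specificity, and the ROC area under the curve (AUC) --
# computed from two groups of scores. Higher HINT thresholds (dB SNR)
# indicate poorer speech-recognition-in-noise performance.

def sensitivity_specificity(disorder, control, cutoff):
    """A test is 'positive' when a score is poorer (higher) than the cutoff."""
    tp = sum(1 for s in disorder if s > cutoff)   # disorder correctly flagged
    tn = sum(1 for s in control if s <= cutoff)   # controls correctly passed
    return tp / len(disorder), tn / len(control)

def auc(disorder, control):
    """ROC AUC via the rank-sum identity: the probability that a randomly
    chosen disorder-group score is poorer than a control score (ties = 1/2)."""
    wins = sum((d > c) + 0.5 * (d == c) for d in disorder for c in control)
    return wins / (len(disorder) * len(control))

# Hypothetical HINT thresholds in dB SNR (not the study's data)
kks = [-1.0, 0.5, 1.2, 2.0, 0.8]          # self-reported SRN difficulty
ctl = [-3.5, -2.8, -2.0, -3.1, -0.5]      # no reported difficulty

se, sp = sensitivity_specificity(kks, ctl, cutoff=-1.8)
print(f"sensitivity={se:.2f} specificity={sp:.2f} AUC={auc(kks, ctl):.2f}")
```

An AUC of 0.5 (as found for PTA here) means the measure separates the groups no better than chance; values near 1.0 indicate strong separation.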

Author(s):  
Andrew J. Vermiglio ◽  
Lauren Leclerc ◽  
Meagan Thornton ◽  
Hannah Osborne ◽  
Elizabeth Bonilla ◽  
...  

Purpose The goal of this study was to determine the ability of the AzBio speech recognition in noise (SRN) test to distinguish between groups of participants with and without a self-reported SRN disorder and a self-reported signal-to-noise ratio (SNR) loss. Method Fifty-four native English-speaking young adults with normal pure-tone thresholds (≤ 25 dB HL, 0.25–6.0 kHz) participated. Individuals who reported hearing difficulty in a noisy restaurant (Reference Standard 1) were placed in the SRN disorder group. SNR loss groups were created based on the self-report of the ability to hear Hearing in Noise Test (HINT) sentences in steady-state speech-shaped noise, four-talker babble, and 20-talker babble in a controlled listening environment (Reference Standard 2). Participants with HINT thresholds poorer than or equal to the median were assigned to the SNR loss group. Results The area under the curve from the receiver operating characteristics curves revealed that the AzBio test was not a significant predictor of an SRN disorder or an SNR loss using the steady-state noise Reference Standard 2 condition. However, the AzBio was a significant predictor of an SNR loss using the four-talker babble and 20-talker babble Reference Standard 2 conditions (p < .05). The AzBio was a significant predictor of an SNR loss when using the average HINT thresholds across the three Reference Standard 2 masker conditions (area under the curve = .79, p = .001). Conclusions The AzBio test was not a significant predictor of a self-reported SRN disorder or a self-reported SNR loss in steady-state noise. However, it was a significant predictor of a self-reported SNR loss in babble noise and the average across all noise conditions. A battery of reference standard tests with a range of maskers in a controlled listening environment is recommended for diagnostic accuracy evaluations of SRN tests.
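
The median-split rule used for Reference Standard 2 can be sketched as follows; the participant IDs and HINT thresholds are hypothetical.

```python
# Minimal sketch of the median-split rule described above: participants whose
# HINT thresholds are poorer than (higher, in dB SNR) or equal to the median
# form the SNR loss group. IDs and thresholds are hypothetical.
from statistics import median

def split_by_median(thresholds):
    m = median(thresholds.values())
    snr_loss = {pid for pid, t in thresholds.items() if t >= m}
    no_loss = set(thresholds) - snr_loss
    return m, snr_loss, no_loss

hint = {"P01": -2.9, "P02": -1.4, "P03": -3.6, "P04": -0.8, "P05": -2.1}
m, loss, no_loss = split_by_median(hint)
print(m, sorted(loss), sorted(no_loss))
```

Note that a median split always assigns roughly half the sample to each group, so the resulting "SNR loss" label is relative to the sample, not an absolute clinical criterion.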


2012 ◽  
Vol 23 (10) ◽  
pp. 779-788 ◽  
Author(s):  
Andrew J. Vermiglio ◽  
Sigfrid D. Soli ◽  
Daniel J. Freed ◽  
Laurel M. Fisher

Background: Speech recognition in noise testing has been conducted at least since the 1940s (Dickson et al, 1946). The ability to recognize speech in noise is a distinct function of the auditory system (Plomp, 1978). According to Kochkin (2002), difficulty recognizing speech in noise is the primary complaint of hearing aid users. However, speech recognition in noise testing has not found widespread use in the field of audiology (Mueller, 2003; Strom, 2003; Tannenbaum and Rosenfeld, 1996). The audiogram has been used as the “gold standard” for hearing ability. Yet the audiogram is a poor indicator of speech recognition in noise ability. Purpose: This study investigates the relationship between pure-tone thresholds, the articulation index, and the ability to recognize speech in quiet and in noise. Research Design: Pure-tone thresholds were measured for audiometric frequencies 250–6000 Hz. Pure-tone threshold groups were created. These included a normal threshold group and slight, mild, moderate, and severe high-frequency pure-tone threshold groups. Speech recognition thresholds in quiet and in noise were obtained using the Hearing in Noise Test (HINT) (Nilsson et al, 1994; Vermiglio, 2008). The articulation index was determined using Pavlovic's method with pure-tone thresholds (Pavlovic, 1989, 1991). Study Sample: Two hundred seventy-eight participants were tested. All participants were native speakers of American English. Sixty-three of the original participants were removed in order to create groups of participants with normal low-frequency pure-tone thresholds and relatively symmetrical high-frequency pure-tone threshold groups. The final set of 215 participants had a mean age of 33 yr with a range of 17–59 yr. Data Collection and Analysis: Pure-tone threshold data were collected using the Hughson-Westlake procedure. Speech recognition data were collected using a Windows-based HINT software system.
Statistical analyses were conducted using descriptive, correlational, and multivariate analysis of covariance (MANCOVA) statistics. Results: The MANCOVA analysis (where the effect of age was statistically removed) indicated that there were no significant differences in HINT performances between groups of participants with normal audiograms and those groups with slight, mild, moderate, or severe high-frequency hearing losses. With all of the data combined across groups, correlational analyses revealed significant correlations between pure-tone averages and speech recognition in quiet performance. Nonsignificant or significant but weak correlations were found between pure-tone averages and HINT thresholds. Conclusions: The ability to recognize speech in steady-state noise cannot be predicted from the audiogram. A new classification scheme of hearing impairment based on the audiogram and the speech reception in noise thresholds, as measured with the HINT, may be useful for the characterization of the hearing ability in the global sense. This classification scheme is consistent with Plomp's two aspects of hearing ability (Plomp, 1978).
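
The correlational step described above, computing a pure-tone average per listener and then a Pearson r against a speech threshold, can be sketched as follows. The frequencies, audiograms, and speech thresholds are illustrative assumptions, not the study's data.

```python
# Hedged sketch of the correlational analysis: a pure-tone average (PTA) per
# listener, then a Pearson r between PTA and a speech threshold. Frequencies,
# audiograms, and speech thresholds are illustrative, not the study's data.

def pta(audiogram, freqs=(500, 1000, 2000)):
    """Mean threshold in dB HL over the given frequencies (Hz)."""
    return sum(audiogram[f] for f in freqs) / len(freqs)

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

audiograms = [                       # hypothetical listeners
    {500: 5, 1000: 10, 2000: 15},
    {500: 10, 1000: 15, 2000: 20},
    {500: 20, 1000: 25, 2000: 40},
]
ptas = [pta(a) for a in audiograms]
quiet_srts = [20.0, 25.0, 38.3]      # hypothetical speech thresholds in quiet
print(round(pearson_r(ptas, quiet_srts), 3))
```

The study's pattern, strong PTA correlations for speech in quiet but weak or absent correlations for HINT thresholds in noise, is exactly the kind of contrast this statistic quantifies.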


2020 ◽  
Vol 31 (03) ◽  
pp. 224-232
Author(s):  
Andrew J. Vermiglio ◽  
Sigfrid D. Soli ◽  
Daniel J. Freed ◽  
Xiangming Fang

Abstract The literature presents conflicting reports on the relationship between pure-tone threshold average and speech recognition in noise ability. The purpose of this retrospective study and meta-analysis was to determine the effect of stimulus audibility on the relationship between speech recognition in noise ability and bilateral pure-tone average (BPTA). Pure-tone threshold and Hearing in Noise Test (HINT) data from two data sets were evaluated. The HINT data from both data sets were divided into groups with complete and partial audibility of the HINT stimuli delivered at 65 dBA. Normal and hearing-impaired participants were included in this retrospective study. For data set 1 (n = 215), a relatively weak relationship had been found between HINT thresholds and BPTA. For data set 2 (n = 55), a relatively strong relationship had been found between HINT thresholds and BPTA. For data set 1, only 10% of the participants had partial audibility of the HINT stimuli. For data set 2, 16% of the participants had partial audibility of the HINT stimuli. Pure-tone thresholds and HINT data were obtained from published and unpublished studies. HINT data were collected in a simulated soundfield environment under headphones using the standard HINT protocol. Statistical analyses included descriptive statistics, correlations, a two-way analysis of variance (ANOVA), and multiple regression. A two-way ANOVA followed by post hoc analyses revealed a greater difference between the data sets for the Noise Front thresholds obtained with partial rather than complete audibility of the stimuli.
A weak and nonsignificant relationship was found between BPTA (0.5, 1.0, 2.0, 3.0, 6.0 kHz) and HINT Noise Front thresholds for the complete audibility data (r = 0.060, p = 0.356), and a strong relationship was found for the partial audibility data (r = 0.863, p < 0.001). The proportion of partial audibility data in a given data set may influence the relative strength of the relationship between BPTA and HINT Noise Front thresholds. This brings into question the convention of using pure-tone average as a predictor of speech recognition in noise ability.


Blood ◽  
2015 ◽  
Vol 126 (23) ◽  
pp. 4473-4473
Author(s):  
Floor CJI Moenen ◽  
Yvonne MC Henskens ◽  
Saskia AM Schols ◽  
Patty J Nelemans ◽  
Harry C. Schouten ◽  
...  

Abstract Introduction A study of diagnostic test accuracy compares a single index test to a gold standard to determine disease status. The observed accuracy of a test varies among patient subgroups and is sensitive to bias. To achieve reliable estimates of diagnostic accuracy, an appropriate study design in a clinically relevant population is warranted. Recently, a review was published on the evolution of the bleeding assessment tool (BAT) in diagnosing patients with mild bleeding disorders (MBD) (Rydz et al. J Thromb Haemost 2012). Many validation studies have been done. However, a critical appraisal addressing the quality of these validation studies is lacking. Objective We performed a systematic review to determine the quality and applicability of studies assessing the diagnostic utility of the BAT for MBD among clinic-based cohorts. Methods The literature search was conducted using the electronic database PubMed. The final search date was March 2, 2015. The search terms 'bleeding disorder OR bleeding tendency' AND 'bleeding questionnaire' were used. All studies assessing the diagnostic accuracy of bleeding questionnaires in identifying adults (age > 18 years) with MBD were considered eligible, irrespective of study design or the reference standard used. The methodological quality and applicability of each included study were assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool. This tool consists of four domains specific for patient selection, index test, reference standard, and participant flow. For each domain, bias was assessed using signaling questions; for the first three domains, applicability was also assessed. Results The search yielded 530 citations, from which 35 possibly relevant full-text studies were identified. Twenty-two studies were excluded; reasons for exclusion were a letter-to-the-editor format, validation of a questionnaire combined with laboratory results, and a primary care population.
Table 1 shows the 13 included studies, the assessed BAT, and the targeted bleeding condition. Risk of bias and applicability concerns are summarized in Figure 1. In 77% of the studies there was a high risk of bias for patient selection and applicability concerns. Many studies used a case-control design, comparing patients with a known bleeding disorder with healthy controls. This leads to spectrum bias and might generate higher estimates of sensitivity and specificity (Rutjes et al. Clin Chem 2005). In 46% there was a high risk of bias for the index test due to the use of a self-administered questionnaire or because the person conducting the questionnaire was aware of the diagnosis. This leads to observer bias caused by better awareness and over-reporting of bleeding symptoms. Finally, there was a high risk of bias in study flow in 38% of the studies. These studies included symptoms recorded after diagnosis of the bleeding disorder. Since bleeding disorders are managed by interventions to prevent bleeding, underestimation of the bleeding symptoms may occur. Conclusion This review highlights the difficulties and advantages of the BAT validation studies. It enables medical practitioners to apply the BAT with full awareness of its limitations and benefits. By evaluating the risk of bias in the included studies, we highlighted limitations, especially in the method of patient selection and use of the index test, that future studies should try to avoid. Disclosures No relevant conflicts of interest to declare.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Chinyereugo M. Umemneku Chikere ◽  
Kevin J. Wilson ◽  
A. Joy Allen ◽  
Luke Vale

Abstract Background Staquet et al. and Brenner both developed correction methods to estimate the sensitivity and specificity of a binary-response index test when the reference standard is imperfect and its sensitivity and specificity are known. However, to our knowledge, no study has compared the statistical properties of these methods, despite their long application in diagnostic accuracy studies. Aim To compare the correction methods developed by Staquet et al. and Brenner. Methods Simulation techniques were employed to compare the methods under assumptions that the new test and the reference standard are conditionally independent or dependent given the true disease status of an individual. Three clinical datasets were analysed to understand the impact of using each method to inform clinical decision-making. Results Under the assumption of conditional independence, the Staquet et al. correction method outperforms the Brenner correction method irrespective of the prevalence of disease and whether the performance of the reference standard is better or worse than the index test. However, when the prevalence of the disease is high (> 0.9) or low (< 0.1), the Staquet et al. correction method can produce illogical results (i.e. results outside [0,1]). Under the assumption of conditional dependence, both methods failed to estimate the sensitivity and specificity of the index test, especially when the covariance terms between the index test and the reference standard are not close to zero. Conclusion When the new test and the imperfect reference standard are conditionally independent, and the sensitivity and specificity of the imperfect reference standard are known, the Staquet et al. correction method outperforms the Brenner method. However, where the prevalence of the target condition is very high or low, or the two tests are conditionally dependent, other statistical methods such as latent class approaches should be considered.
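
A minimal sketch of the Staquet et al. correction under the conditional-independence assumption follows. The formula is the standard algebraic inversion of the observed 2x2 cell probabilities given the reference standard's known sensitivity and specificity; the cell counts below are synthetic, constructed so the known answer can be recovered.

```python
# A sketch of the Staquet et al. correction, assuming the index test and the
# imperfect reference standard are conditionally independent given true
# disease status, and that the reference standard's sensitivity (se_r) and
# specificity (sp_r) are known.

def staquet_correction(a, b, c, d, se_r, sp_r):
    """a, b, c, d: observed counts for (T+,R+), (T+,R-), (T-,R+), (T-,R-).
    Returns the corrected (sensitivity, specificity) of the index test T."""
    n = a + b + c + d
    j = se_r + sp_r - 1.0              # Youden index of the reference
    q = (a + c) / n                    # observed reference-positive rate
    prev = (q + sp_r - 1.0) / j        # corrected prevalence estimate
    se = (sp_r * a - (1.0 - sp_r) * b) / (n * prev * j)
    sp = (se_r * d - (1.0 - se_r) * c) / (n * (1.0 - prev) * j)
    return se, sp

# Synthetic counts (N = 10000) generated exactly under conditional
# independence from prevalence 0.30, index Se = 0.90 / Sp = 0.80,
# reference Se = 0.85 / Sp = 0.95.
se, sp = staquet_correction(2365, 1735, 535, 5365, se_r=0.85, sp_r=0.95)
print(round(se, 3), round(sp, 3))
```

With these synthetic counts the function recovers the index test's true values, 0.90 and 0.80. Note how the prevalence term in the denominator explains the abstract's caveat: when prevalence is estimated near 0 or 1, the division can push results outside [0, 1].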


2008 ◽  
Vol 19 (07) ◽  
pp. 548-556 ◽  
Author(s):  
Richard H. Wilson ◽  
Wendy B. Cates

Background: The Speech Recognition in Noise Test (SPRINT) is a word-recognition instrument that presents the 200 Northwestern University Auditory Test No. 6 (NU-6) words binaurally at 50 dB HL in a multitalker babble at a 9 dB signal-to-noise ratio (S/N) (Cord et al, 1992). The SPRINT was developed and used by the Army as a more valid predictor of communication abilities (than pure-tone thresholds or word recognition in quiet) for issues involving fitness for duty from a hearing perspective of Army personnel. The Words-in-Noise Test (WIN) is a slightly different word-recognition task in a fixed-level multitalker babble, with 10 NU-6 words presented at each of 7 S/Ns from 24 to 0 dB S/N in 4 dB decrements (Wilson, 2003; Wilson and McArdle, 2007). For the two instruments, both the babble and the speakers of the words are different. The SPRINT uses all 200 NU-6 words, whereas the WIN uses a maximum of 70 words. Purpose: The purpose was to compare recognition performances by 24 young listeners with normal hearing and 48 older listeners with sensorineural hearing loss on the SPRINT and WIN protocols. Research Design: A quasi-experimental, mixed model design was used. Study Sample: The 24 young listeners with normal hearing (19 to 29 years, mean = 23.3 years) were from the local university and had normal hearing (≤20 dB HL; American National Standards Institute, 2004) at the 250–8000 Hz octave intervals. The 48 older listeners with sensorineural hearing loss (60 to 82 years, mean = 69.9 years) had the following inclusion criteria: (1) a threshold at 500 Hz between 15 and 30 dB HL, (2) a threshold at 1000 Hz between 20 and 40 dB HL, (3) a three-frequency pure-tone average (500, 1000, and 2000 Hz) of ≤40 dB HL, (4) word-recognition scores in quiet ≥40%, and (5) no history of middle ear or retrocochlear pathology as determined by an audiologic evaluation.
Data Collection and Analysis: The speech materials were presented bilaterally in the following order: (1) the SPRINT at 50 dB HL, (2) two half lists of NU-6 words in quiet at 60 dB HL and 80 dB HL, and (3) the two 35-word lists of the WIN materials with the multitalker babble fixed at 60 dB HL. Data collection occurred during a 40–60 minute session. Recognition performances on each stimulus word were analyzed. Results: The listeners with normal hearing obtained 92.5% correct on the SPRINT with a 50% point on the WIN of 2.7 dB S/N. The listeners with hearing loss obtained 65.3% correct on the SPRINT and a WIN 50% point at 12.0 dB S/N. The SPRINT and WIN were significantly correlated (r = −0.81, p < .01), indicating that the SPRINT had good concurrent validity. The high-frequency pure-tone average (1000, 2000, 4000 Hz) had higher correlations with the SPRINT, WIN, and NU-6 in quiet than did the traditional three-frequency pure-tone average (500, 1000, 2000 Hz). Conclusions: Graphically and numerically, the SPRINT and WIN were highly related, which is indicative of good concurrent validity of the SPRINT.
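
The WIN 50% point described above is conventionally estimated with the Spearman-Kärber equation from the number of words correct at each S/N; the exact form used here (starting level 24 dB, 4 dB steps, 10 words per step) should be treated as an illustrative assumption, and the listener's scores are hypothetical.

```python
# A sketch of a Spearman-Karber estimate of the WIN 50% point, using the
# parameters given above (10 words at each of 7 signal-to-noise ratios from
# 24 down to 0 dB S/N in 4 dB decrements). The exact form of the equation is
# an assumption for illustration; the listener's scores are hypothetical.

def win_50_percent_point(correct_per_snr, start_snr=24.0, step=4.0, words=10):
    """correct_per_snr: words correct at each S/N, highest S/N first."""
    total_correct = sum(correct_per_snr)
    return start_snr + step / 2.0 - step * total_correct / words

# Hypothetical listener: near-perfect at favorable S/Ns, chance at 0 dB.
correct = [10, 10, 9, 8, 6, 3, 1]      # at 24, 20, 16, 12, 8, 4, 0 dB S/N
print(win_50_percent_point(correct))
```

Under this assumed formula, a listener with 35 of 70 words correct would score 26 − 14 = 12.0 dB S/N, on the order of the hearing-loss group's mean 50% point reported above.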


Author(s):  
Jenni-Mari Potgieter ◽  
De Wet Swanepoel ◽  
Cas Smits

Background: Speech-in-noise tests have become a valuable part of the audiometric test battery, providing an indication of a listener’s ability to function in background noise. A simple digits-in-noise (DIN) test could be valuable to support diagnostic hearing assessments, hearing aid fittings, and counselling for both paediatric and adult populations. Objective: The objective of this study was to evaluate the South African English smartphone DIN test’s performance as part of the audiometric test battery. Design: This descriptive study evaluated 109 adult subjects (43 male and 66 female subjects) with and without sensorineural hearing loss by comparing pure-tone air conduction thresholds, monaural speech recognition performance scores (SRS dB), and the DIN speech reception threshold (SRT). An additional nine adult hearing aid users (four male and five female subjects) were included in a subset to determine aided and unaided DIN SRTs. Results: The DIN SRT is strongly associated with the best-ear four-frequency pure-tone average (4FPTA) (rs = 0.81) and maximum SRS dB (r = 0.72). The DIN test had high sensitivity and specificity to identify abnormal pure-tone (0.88 and 0.88, respectively) and SRS dB (0.76 and 0.88, respectively) results. There was a mean signal-to-noise ratio (SNR) improvement in the aided condition that demonstrated an overall benefit of 0.84 dB SNR. Conclusion: The DIN SRT was significantly correlated with the best-ear 4FPTA and maximum SRS dB. The DIN SRT provides a useful measure of speech recognition in noise that can support the evaluation of hearing aid fittings, counselling, and the management of hearing expectations.
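
A best-ear 4FPTA of the kind used above can be sketched as follows; the choice of 500, 1000, 2000, and 4000 Hz and the thresholds themselves are assumptions for illustration.

```python
# Minimal sketch of a best-ear four-frequency pure-tone average (4FPTA).
# The conventional 500, 1000, 2000, and 4000 Hz frequencies are assumed
# here, and the thresholds are hypothetical.

FREQS = (500, 1000, 2000, 4000)       # Hz

def four_fpta(audiogram):
    """Mean threshold in dB HL across the four frequencies for one ear."""
    return sum(audiogram[f] for f in FREQS) / len(FREQS)

def best_ear_4fpta(left, right):
    return min(four_fpta(left), four_fpta(right))

left = {500: 15, 1000: 20, 2000: 30, 4000: 45}    # 4FPTA = 27.5 dB HL
right = {500: 10, 1000: 15, 2000: 25, 4000: 40}   # 4FPTA = 22.5 dB HL
print(best_ear_4fpta(left, right))                # best ear: 22.5 dB HL
```

Taking the better (lower) ear is the natural comparator for a binaural task like the DIN, since the listener can rely on whichever ear hears the digits best.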


2020 ◽  
Vol 27 (8) ◽  
Author(s):  
Nishant Aggarwal ◽  
Mohil Garg ◽  
Vignesh Dwarakanathan ◽  
Nitesh Gautam ◽  
Swasthi S Kumar ◽  
...  

Abstract Infrared thermal screening, via the use of handheld non-contact infrared thermometers (NCITs) and thermal scanners, has been widely implemented all over the world. We performed a systematic review and meta-analysis to investigate its diagnostic accuracy for the detection of fever. We searched PubMed, Embase, the Cochrane Library, medRxiv, bioRxiv, ClinicalTrials.gov, COVID-19 Open Research Dataset, COVID-19 research database, Epistemonikos, EPPI-Centre, World Health Organization International Clinical Trials Registry Platform, Scopus and Web of Science databases for studies where a non-contact infrared device was used to detect fever against a reference standard of conventional thermometers. Forest plots and hierarchical summary receiver operating characteristic (HSROC) curves were used to describe the pooled summary estimates of sensitivity, specificity and diagnostic odds ratio. From a total of 1063 results, 30 studies were included in the qualitative synthesis, of which 19 were included in the meta-analysis. The pooled sensitivity and specificity were 0.808 (95% CI 0.656–0.903) and 0.920 (95% CI 0.769–0.975), respectively, for the NCITs (using the forehead as the site of measurement), and 0.818 (95% CI 0.758–0.866) and 0.923 (95% CI 0.823–0.969), respectively, for thermal scanners. The sensitivity of NCITs increased on use of rectal temperature as the reference. The sensitivity of thermal scanners decreased in a disease outbreak/pandemic setting. Changes approaching statistical significance were also observed on the exclusion of neonates from the analysis. Thermal screening had a low positive predictive value, especially at the initial stage of an outbreak, whereas the negative predictive value (NPV) continued to be high even at later stages. Thermal screening has reasonable diagnostic accuracy in the detection of fever, although it may vary with changes in subject characteristics, setting, index test and the reference standard used.
Thermal screening has a good NPV even during a pandemic. Policymakers must take the factors surrounding the screening strategy into consideration when forming ad hoc guidelines.
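
The per-study numbers that feed such a meta-analysis can be sketched from a single 2x2 table; the counts below are hypothetical, and real pooling uses hierarchical (HSROC or bivariate) models rather than simple per-table averages.

```python
# Illustrative sketch of the per-study accuracy numbers that feed such a
# meta-analysis: sensitivity, specificity, and the diagnostic odds ratio
# (DOR) from one 2x2 table. Counts are hypothetical; actual pooling uses
# hierarchical (HSROC / bivariate) models, not simple per-table averages.

def accuracy_from_2x2(tp, fp, fn, tn):
    se = tp / (tp + fn)               # sensitivity
    sp = tn / (tn + fp)               # specificity
    dor = (tp * tn) / (fp * fn)       # diagnostic odds ratio
    return se, sp, dor

# Hypothetical study: fever by reference thermometer vs. an NCIT reading.
se, sp, dor = accuracy_from_2x2(tp=40, fp=16, fn=10, tn=184)
print(f"Se={se:.2f} Sp={sp:.2f} DOR={dor:.1f}")
```

The predictive values discussed above, unlike sensitivity and specificity, also depend on prevalence, which is why the positive predictive value collapses early in an outbreak when true fevers are rare.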


2019 ◽  
Vol 30 (01) ◽  
pp. 054-065
Author(s):  
Andrew J. Vermiglio ◽  
Caroline C. Herring ◽  
Paige Heeke ◽  
Courtney E. Post ◽  
Xiangming Fang

Abstract Speech recognition in noise (SRN) evaluations reveal information about listening ability that is unavailable from pure-tone thresholds. Unfortunately, SRN evaluations are not commonly used in the clinic. A lack of standardization may be an explanation for the lack of widespread acceptance of SRN testing. Arguments have been made for the utilization of steady-state speech-shaped noise vs. multi-talker babble. Previous investigations into the effect of masker type have used a monaural presentation of the stimuli. However, results of monaural SRN tests cannot be generalized to binaural listening conditions. The purpose of this study was to investigate the effect of masker type on SRN thresholds under binaural listening conditions. The Hearing in Noise Test (HINT) protocol was selected in order to measure SRN thresholds in steady-state speech-shaped noise (HINT noise) and four-talker babble with and without the spatial separation of the target speech and masker stimuli. Fifty native speakers of English with normal pure-tone thresholds (≤ 25 dB HL, 250–4000 Hz) participated in the study. The mean age was 20.5 years (SD 1.01). All participants were tested using the standard protocol for the HINT in a simulated soundfield environment under TDH-50P headphones. Thresholds were measured for the Noise Front, Noise Left, and Noise Right listening conditions with HINT noise and four-talker babble. The HINT Composite score was determined for each noise condition. The spatial advantage was calculated from the HINT thresholds. Pure-tone threshold data were collected using the modified Hughson-Westlake procedure. Statistical analyses included descriptive statistics, effect size, correlations, and repeated measures ANOVA followed by matched-pairs t-tests. Repeated measures ANOVA was conducted to investigate the effects of masker type and noise location on HINT thresholds. Both main effects and their interaction were statistically significant (p < 0.01).
No significant differences were found between masker conditions for the Noise Front thresholds. However, for the Noise Side conditions, the four-talker babble thresholds were significantly better than the HINT noise thresholds. Overall, a greater spatial advantage was found for the four-talker babble than for the HINT noise conditions (p < 0.01). Pearson correlation analysis revealed no significant relationships between four-talker babble and HINT noise speech recognition performances for the Noise Front and Noise Right conditions or for the spatial advantage measures. Significant relationships (p < 0.05) were found between masking noise performances for the Noise Left condition and the Noise Composite scores. One cannot assume that a patient who performs within normal limits on a speech recognition test in four-talker babble will also perform within normal limits on a test in steady-state speech-shaped noise, and vice versa. Additionally, performances for the Noise Front condition cannot be used to predict performances for the Noise Side conditions. The utilization of both HINT noise and four-talker babble maskers, with and without the spatial separation of the stimuli, may be useful when determining the range of speech recognition in noise abilities found in everyday listening conditions.
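
The derived HINT measures mentioned above can be sketched as follows. The composite weighting (Noise Front counted twice) and the definition of spatial advantage (Noise Front threshold minus the mean of the two Noise Side thresholds) are common conventions assumed here for illustration; the thresholds are hypothetical.

```python
# A sketch of two derived HINT measures mentioned above. The composite
# weighting (Noise Front counted twice) and the definition of spatial
# advantage (Noise Front minus the mean of the two Noise Side thresholds)
# are common conventions assumed here; the thresholds are hypothetical.

def hint_composite(nf, nl, nr):
    """Weighted mean of the three noise-condition thresholds, dB SNR."""
    return (2.0 * nf + nl + nr) / 4.0

def spatial_advantage(nf, nl, nr):
    """Positive values mean spatially separated maskers helped the listener."""
    return nf - (nl + nr) / 2.0

nf, nl, nr = -2.6, -9.4, -8.8        # hypothetical thresholds, dB SNR
print(hint_composite(nf, nl, nr), spatial_advantage(nf, nl, nr))
```

Because side thresholds are lower (better) when the masker is spatially separated from the target, a larger spatial advantage reflects greater benefit from spatial release from masking.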

