A Comparison of Recognition Performances in Speech-Spectrum Noise by Listeners with Normal Hearing on PB-50, CID W-22, NU–6, W-1 Spondaic Words, and Monosyllabic Digits Spoken by the Same Speaker

2008 ◽  
Vol 19 (06) ◽  
pp. 496-506 ◽  
Author(s):  
Richard H. Wilson ◽  
Rachel McArdle ◽  
Heidi Roberts

Background: So that portions of the classic Miller, Heise, and Lichten (1951) study could be replicated, new recorded versions of the words and digits were made because none of the three common monosyllabic word lists (PAL PB-50, CID W-22, and NU–6) contained the 9 monosyllabic digits (1–10, excluding 7) that were used by Miller et al. It is well established that different psychometric characteristics have been observed for different lists and even for the same materials spoken by different speakers. The decision was made to record four lists of each of the three monosyllabic word sets, the monosyllabic digits not included in the three sets of word lists, and the CID W-1 spondaic words. A professional female speaker with a General American dialect recorded the materials during four recording sessions within a 2-week interval. The recording order of the 582 words was random. Purpose: To determine—on listeners with normal hearing—the psychometric properties of the five speech materials presented in speech-spectrum noise. Research Design: A quasi-experimental, repeated-measures design was used. Study Sample: Twenty-four young adult listeners (M = 23 years) with normal pure-tone thresholds (≤20-dB HL at 250 to 8000 Hz) participated. The participants were university students who were unfamiliar with the test materials. Data Collection and Analysis: The 582 words were presented at four signal-to-noise ratios (SNRs; −7-, −2-, 3-, and 8-dB) in speech-spectrum noise fixed at 72-dB SPL. Although the main metric of interest was the 50% point on the function for each word established with the Spearman-Kärber equation (Finney, 1952), the percentage correct on each word at each SNR was evaluated. The psychometric characteristics of the PB-50, CID W-22, and NU–6 monosyllabic word lists were compared with one another, with the CID W-1 spondaic words, and with the 9 monosyllabic digits. 
Results: Recognition performance on the four lists within each of the three monosyllabic word materials was equivalent, ±0.4 dB. Likewise, word-recognition performance on the PB-50, W-22, and NU–6 word lists was equivalent, ±0.2 dB. The mean recognition performance at the 50% point with the 36 W-1 spondaic words was ~6.2 dB lower than the 50% point with the monosyllabic words. Recognition performance on the monosyllabic digits was 1–2 dB better than mean performance on the monosyllabic words. Conclusions: Word-recognition performances on the three sets of materials (PB-50, CID W-22, and NU–6) were equivalent, as were the performances on the four lists that make up each of the three materials. Phonetic/phonemic balance does not appear to be an important consideration in the compilation of word-recognition lists used to evaluate the ability of listeners to understand speech. A companion paper examines the acoustic, phonetic/phonological, and lexical variables that may predict the relative ease or difficulty with which these monosyllabic words were recognized in noise (McArdle and Wilson, this issue).
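The abstract above names the Spearman-Kärber equation as the way the 50% point was obtained but does not spell it out. The following is a minimal sketch of the commonly cited form of that estimate, assuming equally spaced presentation levels and assuming performance is 0% below the lowest level and 100% above the highest. The function name and the example proportions are hypothetical and are not data from the study.

```python
def spearman_karber_50(levels_db, proportions_correct):
    """Estimate the 50% point (dB) from proportion correct at equally spaced levels.

    Commonly cited form: T50 = highest + step/2 - step * sum(proportions).
    Assumes 0% correct below the lowest level and 100% above the highest.
    """
    if len(levels_db) != len(proportions_correct) or len(levels_db) < 2:
        raise ValueError("need matching lists with at least two levels")
    # Sort by level so the caller may pass levels in any order.
    pairs = sorted(zip(levels_db, proportions_correct))
    levels = [level for level, _ in pairs]
    props = [p for _, p in pairs]
    step = levels[1] - levels[0]
    for low, high in zip(levels, levels[1:]):
        if abs((high - low) - step) > 1e-9:
            raise ValueError("levels must be equally spaced")
    return levels[-1] + step / 2 - step * sum(props)


# The four SNRs come from the abstract; the proportions are made up.
snrs_db = [-7, -2, 3, 8]
hypothetical_word = [0.0, 0.5, 1.0, 1.0]
print(spearman_karber_50(snrs_db, hypothetical_word))  # -2.0
```

In this illustration the word is recognized half the time at −2 dB SNR, and the estimate returns −2 dB as the 50% point, which is the behavior one would expect from the method.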

2017 ◽  
Vol 28 (01) ◽  
pp. 068-079
Author(s):  
Richard H. Wilson ◽  
Kadie C. Sharrett

Two previous experiments from our laboratory with 70 interrupted monosyllabic words demonstrated that recognition performance was influenced by the temporal location of the interruption pattern. The interruption pattern (10 interruptions/sec, 50% duty cycle) was always the same and referenced word onset; the only difference between the patterns was the temporal location of the on- and off-segments of the interruption cycle. In the first study, both young and older listeners obtained better recognition performances when the initial on-segment coincided with word onset than when the initial on-segment was delayed by 50 msec. The second experiment with 24 young listeners detailed recognition performance as the interruption pattern was incremented in 10-msec steps through the 0- to 90-msec onset range. Across the onset conditions, 95% of the functions were either flat or U-shaped. To define the effects that interruption pattern locations had on word recognition by older listeners with sensorineural hearing loss as the interruption pattern incremented, re: word onset, from 0 to 90 msec in 10-msec steps. A repeated-measures design with ten interruption patterns (onset conditions) and one uninterrupted condition. Twenty-four older males (mean = 69.6 yr) with sensorineural hearing loss participated in two 1-hour sessions. The three-frequency pure-tone average was 24.0 dB HL and word recognition was ≥80% correct. Seventy consonant-vowel nucleus-consonant words formed the corpus of materials with 25 additional words used for practice. For each participant, the 700 interrupted stimuli (70 words by 10 onset conditions), the 70 words uninterrupted, and two practice lists each were randomized and recorded on compact disc in 33 tracks of 25 words each. The data were analyzed at the participant and word levels and compared to the results obtained earlier on 24 young listeners with normal hearing. 
The mean recognition performance on the 70 words uninterrupted was 91.0% with an overall mean performance on the ten interruption conditions of 63.2% (range: 57.9–69.3%), compared to 80.4% (range: 73.0–87.7%) obtained earlier on the young adults. The best performances were at the extremes of the onset conditions. Standard deviations ranged from 22.1% to 28.1% (24 participants) and from 9.2% to 12.8% (70 words). An arithmetic algorithm categorized the shapes of the psychometric functions across the ten onset conditions. With the older participants in the current study, 40% of the functions were flat, 41.4% were U-shaped, and 18.6% were inverted U-shaped, which compared favorably to the function shapes by the young listeners in the earlier study of 50.0%, 41.4%, and 8.6%, respectively. There were two words on which the older listeners had 40% better performances. Collectively, the data are orderly, but at the individual word or participant level, the data are somewhat volatile, which may reflect auditory processing differences between the participant groups. The diversity of recognition performances by the older listeners on the ten interruption conditions with each of the 70 words supports the notion that the term hearing loss is inclusive of processes well beyond the filtering produced by end-organ sensitivity deficits.


2008 ◽  
Vol 19 (06) ◽  
pp. 507-518 ◽  
Author(s):  
Rachel McArdle ◽  
Richard H. Wilson

Purpose: To analyze the 50% correct recognition data that were from the Wilson et al (this issue) study and that were obtained from 24 listeners with normal hearing; also to examine whether acoustic, phonetic, or lexical variables can predict recognition performance for monosyllabic words presented in speech-spectrum noise. Research Design: The specific variables are as follows: (a) acoustic variables (i.e., effective root-mean-square sound pressure level, duration), (b) phonetic variables (i.e., consonant features such as manner, place, and voicing for initial and final phonemes; vowel phonemes), and (c) lexical variables (i.e., word frequency, word familiarity, neighborhood density, neighborhood frequency). Data Collection and Analysis: The descriptive, correlational study will examine the influence of acoustic, phonetic, and lexical variables on speech recognition in noise performance. Results: Regression analysis demonstrated that 45% of the variance in the 50% point was accounted for by acoustic and phonetic variables whereas only 3% of the variance was accounted for by lexical variables. These findings suggest that monosyllabic word-recognition-in-noise is more dependent on bottom-up processing than on top-down processing. Conclusions: The results suggest that when speech-in-noise testing is used in a pre- and post-hearing-aid-fitting format, the use of monosyllabic words may be sensitive to changes in audibility resulting from amplification.


2020 ◽  
Vol 31 (06) ◽  
pp. 412-441 ◽  
Author(s):  
Richard H. Wilson ◽  
Victoria A. Sanchez

Abstract Background In the 1950s, with monitored live voice testing, the vu meter time constant and the short durations and amplitude modulation characteristics of monosyllabic words necessitated the use of the carrier phrase amplitude to monitor (indirectly) the presentation level of the words. This practice continues with recorded materials. To relieve the carrier phrase of this function, first the influence that the carrier phrase has on word recognition performance needs clarification, which is the topic of this study. Purpose Recordings of Northwestern University Auditory Test No. 6 by two female speakers were used to compare word recognition performances with and without the carrier phrases when the carrier phrase and test word were (1) in the same utterance stream with the words excised digitally from the carrier (VA-1 speaker) and (2) independent of one another (VA-2 speaker). The 50-msec segment of the vowel in the target word with the largest root mean square amplitude was used to equate the target word amplitudes. Research Design A quasi-experimental, repeated measures design was used. Study Sample Twenty-four young normal-hearing adults (YNH; M = 23.5 years; pure-tone average [PTA] = 1.3-dB HL) and 48 older hearing loss listeners (OHL; M = 71.4 years; PTA = 21.8-dB HL) participated in two, one-hour sessions. Data Collection and Analyses Each listener had 16 listening conditions (2 speakers × 2 carrier phrase conditions × 4 presentation levels) with 100 randomized words, 50 different words by each speaker. Each word was presented 8 times (2 carrier phrase conditions × 4 presentation levels [YNH, 0- to 24-dB SL; OHL, 6- to 30-dB SL]). The 200 recorded words for each condition were randomized as 8, 25-word tracks. In both test sessions, one practice track was followed by 16 tracks alternated between speakers and randomized by blocks of the four conditions. Central tendency and repeated measures analyses of variance statistics were used. 
Results With the VA-1 speaker, the overall mean recognition performances were 6.0% (YNH) and 8.3% (OHL) significantly better with the carrier phrase than without the carrier phrase. These differences were in part attributed to the distortion of some words caused by the excision of the words from the carrier phrases. With the VA-2 speaker, recognition performances on the with and without carrier phrase conditions by both listener groups were not significantly different, except for one condition (YNH listeners at 8-dB SL). The slopes of the mean functions were steeper for the YNH listeners (3.9%/dB to 4.8%/dB) than for the OHL listeners (2.4%/dB to 3.4%/dB) and were <1%/dB steeper for the VA-1 speaker than for the VA-2 speaker. Although the mean results were clear, the variability in performance differences between the two carrier phrase conditions for the individual participants and for the individual words was striking and was considered in detail. Conclusion The current data indicate that word recognition performances with and without the carrier phrase (1) were different when the carrier phrase and target word were produced in the same utterance with poorer performances when the target words were excised from their respective carrier phrases (VA-1 speaker), and (2) were the same when the carrier phrase and target word were produced as independent utterances (VA-2 speaker).
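The abstract above states that the 50-msec vowel segment with the largest root mean square amplitude was used to equate target word amplitudes, without describing the procedure. The sketch below shows one way such a segment could be located and how a level adjustment could be derived from it. The function names, the synthetic samples, and the reference RMS are all hypothetical, and the caller is assumed to pass only the vowel portion of the word.

```python
import math


def max_rms_window(samples, sample_rate_hz, window_ms=50.0):
    """Return (start_index, rms) of the window with the largest RMS amplitude."""
    n = int(round(sample_rate_hz * window_ms / 1000.0))
    if n <= 0 or n > len(samples):
        raise ValueError("window must fit inside the sample list")
    # Sliding sum of squares: update by one sample in, one sample out.
    energy = sum(s * s for s in samples[:n])
    best_energy, best_start = energy, 0
    for start in range(1, len(samples) - n + 1):
        energy += samples[start + n - 1] ** 2 - samples[start - 1] ** 2
        if energy > best_energy:
            best_energy, best_start = energy, start
    return best_start, math.sqrt(max(best_energy, 0.0) / n)


def gain_db_to_match(rms, target_rms):
    """Gain in dB that would bring a segment with `rms` to `target_rms`."""
    return 20.0 * math.log10(target_rms / rms)


# Synthetic "vowel": silence, a loud stretch, then a quieter tail (made-up data).
vowel = [0.0] * 100 + [0.5] * 100 + [0.1] * 100
start, rms = max_rms_window(vowel, sample_rate_hz=1000)
print(start, rms, gain_db_to_match(rms, target_rms=1.0))
```

Applying the returned gain to the whole word would equate words on their peak vowel segment rather than on overall level, which is the distinction the abstract draws.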


2021 ◽  
Vol 32 (08) ◽  
pp. 547-554
Author(s):  
Soha N. Garadat ◽  
Ana'am Alkharabsheh ◽  
Nihad A. Almasri ◽  
Abdulrahman Hagr

Abstract Background Speech audiometry materials are widely available in many different languages. However, there are no known standardized materials for the assessment of speech recognition in Arabic-speaking children. Purpose The aim of the study was to develop and validate phonetically balanced and psychometrically equivalent monosyllabic word recognition lists for children through a picture identification task. Research Design A prospective repeated-measures design was used. Monosyllabic words were chosen from children's storybooks and were evaluated for familiarity. The selected words were then divided into four phonetically balanced word lists. The final lists were evaluated for homogeneity and equivalency. Study Sample Ten adults and 32 children with normal hearing sensitivity were recruited. Data Collection and Analyses Lists were presented to adult subjects in 5-dB increments from 0 to 60 dB hearing level. Individual data were then fitted using a sigmoid function from which the 50% threshold, slopes at the 50% points, and slopes at the 20 to 80% points were derived to determine list psychometric properties. Lists were next presented to children in two separate sessions to assess their equivalency, validity, and reliability. Data were subjected to a mixed design analysis of variance. Results No statistically significant difference was found among the word lists. Conclusion This study provided evidence that the monosyllabic word lists had comparable psychometric characteristics and reliability. This supports the conclusion that the constructed speech corpus is a valid tool that can be used in assessing speech recognition in Arabic-speaking children.
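The abstract above derives a 50% threshold, a slope at the 50% point, and a slope over the 20 to 80% range from a fitted sigmoid, without naming the sigmoid. As an illustration only, the sketch below assumes a two-parameter logistic function and shows how those three quantities follow from its parameters. The fitting step itself is omitted, and the midpoint and steepness values are hypothetical.

```python
import math


def logistic(level_db, midpoint_db, k):
    """Proportion correct predicted by a logistic function of level."""
    return 1.0 / (1.0 + math.exp(-k * (level_db - midpoint_db)))


def level_at(proportion, midpoint_db, k):
    """Level (dB) at which the logistic function reaches `proportion`."""
    return midpoint_db + math.log(proportion / (1.0 - proportion)) / k


def slope_at_50(k):
    """Slope in %/dB at the 50% point; the logistic derivative there is k/4."""
    return 100.0 * k / 4.0


def slope_20_to_80(midpoint_db, k):
    """Average slope in %/dB between the 20% and 80% points."""
    span_db = level_at(0.8, midpoint_db, k) - level_at(0.2, midpoint_db, k)
    return 60.0 / span_db


# Hypothetical fitted parameters: 50% threshold at 20 dB HL, steepness 0.2 per dB.
midpoint, steepness = 20.0, 0.2
print(level_at(0.5, midpoint, steepness))
print(slope_at_50(steepness))
print(slope_20_to_80(midpoint, steepness))
```

For a logistic function the 20 to 80% slope is always somewhat shallower than the slope at the 50% point, which is one reason a study might report both.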


2020 ◽  
Vol 31 (07) ◽  
pp. 531-546
Author(s):  
Mitzarie A. Carlo ◽  
Richard H. Wilson ◽  
Albert Villanueva-Reyes

Abstract Background English materials for speech audiometry are well established. In Spanish, speech-recognition materials are not standardized, with monosyllables, bisyllables, and trisyllables all used in word-recognition protocols. Purpose This study aimed to establish the psychometric characteristics of common Spanish monosyllabic, bisyllabic, and trisyllabic words for potential use in word-recognition procedures. Research Design Prospective descriptive study. Study Sample Eighteen adult Puerto Ricans (M = 25.6 years) with normal hearing [M = 7.8-dB hearing level (HL) pure-tone average] were recruited for two experiments. Data Collection and Analyses A digital recording of 575 Spanish words was created (139 monosyllables, 359 bisyllables, and 77 trisyllables), incorporating materials from a variety of Spanish word-recognition lists. Experiment 1 (n = 6) used 25 randomly selected words from each of the three syllabic categories to estimate the presentation level ranges needed to obtain recognition performances over the 10 to 90% range. In Experiment 2 (n = 12) the 575 words were presented over five 1-hour sessions using presentation levels from 0- to 30-dB HL in 5-dB steps (monosyllables), 0- to 25-dB HL in 5-dB steps (bisyllables), and −3- to 17-dB HL in 4-dB steps (trisyllables). The presentation order of both the words and the presentation levels was randomized for each listener. The functions for each listener and each word were fit with polynomial equations from which the 50% points and slopes at the 50% point were calculated. Results The mean 50% points and slopes at 50% were 8.9-dB HL, 4.0%/dB (monosyllables), 6.9-dB HL, 5.1%/dB (bisyllables), and 1.4-dB HL, 6.3%/dB (trisyllables). The Kruskal–Wallis test with Mann–Whitney U post-hoc analysis indicated that the mean 50% points and slopes at the 50% points of the individual word functions were significantly different among the syllabic categories. 
Although significant differences were observed among the syllabic categories, substantial overlap was noted in the individual word functions, indicating that the psychometric characteristics of the words were not dictated exclusively by the syllabic number. Influences associated with word difficulty, word familiarity, singular and plural form words, phonetic stress patterns, and gender word patterns also were evaluated. Conclusion The main finding was the direct relation between the number of syllables in a word and word-recognition performance. In general, words with more syllables were more easily recognized; there were, however, exceptions. The current data from young adults with normal hearing established the psychometric characteristics of the 575 Spanish words on which the formulation of word lists for both threshold and suprathreshold measures of word-recognition abilities in quiet and in noise and other word-recognition protocols can be based.


1984 ◽  
Vol 27 (3) ◽  
pp. 378-386 ◽  
Author(s):  
Richard H. Wilson ◽  
John T. Arcos ◽  
Howard C. Jones

Consonant-vowel-consonant (CVC) monosyllabic words were segmented at the approximate phoneme boundaries and were presented to subjects with normal hearing in the following sequence: (a) the carrier phrase to both ears, (b) the initial consonant segment to one ear, (c) the vowel segment to the other ear, and (d) the final consonant segment to the ear that received the initial consonant. A computer technique, which is described in detail, was used to develop the test materials. The digital editing did not alter appreciably the spectral or temporal characteristics of the words. A series of four experiments produced a list of 50 words on which 10% correct word recognition was achieved by listeners with normal hearing when the vowel segment or the consonant segments of the words were presented monaurally in isolation. When the speech materials were presented binaurally—that is, the vowel segment in one ear and consonant segments in the other ear—word-recognition performance improved to 90% correct.


2014 ◽  
Vol 57 (1) ◽  
pp. 327-337 ◽  
Author(s):  
Mallory Baker ◽  
Emily Buss ◽  
Adam Jacks ◽  
Crystal Taylor ◽  
Lori J. Leibold

Purpose This study evaluated the degree to which children benefit from the acoustic modifications made by talkers when they produce speech in noise. Method A repeated measures design compared the speech perception performance of children (5–11 years) and adults in a 2-talker masker. Target speech was produced in a 2-talker background or in quiet. In Experiment 1, recognition with the 2 target sets was assessed using an adaptive spondee identification procedure. In Experiment 2, the benefit of speech produced in a 2-talker background was assessed using an open-set, monosyllabic word recognition task at a fixed signal-to-noise ratio (SNR). Results Children performed more poorly than adults, regardless of whether the target speech was produced in quiet or in a 2-talker background. A small improvement in the SNR required to identify spondees was observed for both children and adults using speech produced in a 2-talker background (Experiment 1). Similarly, average open-set word recognition scores were 11 percentage points higher for both age groups using speech produced in a 2-talker background compared with quiet (Experiment 2). Conclusion The results indicate that children can use the acoustic modifications of speech produced in a 2-talker background to improve masked speech perception, as previously demonstrated for adults.


2019 ◽  
Vol 30 (05) ◽  
pp. 370-395 ◽  
Author(s):  
Richard H. Wilson

The Auditec of St. Louis and the Department of Veterans Affairs (VA) recorded versions of the Northwestern University Auditory Test No. 6 (NU-6) are in common usage. Data on young adults with normal hearing for pure tones (YNH) demonstrate equal recognition performances on the two versions when the VA version is presented 5 dB higher but similar data on older listeners with sensorineural hearing loss (OHL) are lacking. To compare word-recognition performances on the Auditec and VA versions of NU-6 presented at six presentation levels with YNH and OHL listeners. A quasi-experimental, repeated-measures design was used. Twelve YNH (M = 24.0 years; PTA = 9.9-dB HL) and 36 OHL listeners (M = 71.6 years; PTA = 26.7-dB HL) participated in three, one-hour sessions. Each listener received 100 stimulus words that were randomized by 6 presentation levels for each of two speakers (YNH, −2 to 28-dB SL; OHL, −2 to 38-dB SL). The sessions were limited to 25 practice and 400 experimental words. Digital versions of the 16, 25-word tracks for each session were alternated between speakers. Each of the 48 listeners had higher recognition performances on the Auditec version of NU-6 than on the VA version. The respective overall recognition performances on the Auditec and VA versions were 71.4% and 64.1% (YNH) and 68.7% and 58.2% (OHL). At the highest presentation levels, recognition performances on the two versions differed by only 0.5% (YNH) and 3.3% (OHL). At the 50% correct point, performances on the Auditec version were 3.2 dB (YNH) and 6.1 dB (OHL) better than those on the VA version. The slopes at the 50% points on the mean functions for both speakers were about 4.9%/dB (YNH) and 3.0%/dB (OHL); however, the slopes evaluated from the individual listener data were steeper, 5.2 to 5.3%/dB (YNH) and 3.3 to 3.5%/dB (OHL). When the individual data were transformed from dB SL to dB HL, the differences between the two listener groups were emphasized. 
The four functions (2 speakers by 2 listener groups) were plotted for each of the 48 participants and each of the 200 words, which revealed the gamut of relations among the datasets. Examination of the data for each speaker across test sessions, in the traditional 50-word lists, and in the typically used 25-word lists of Randomization A revealed no differences of clinical concern. Finally, introspective reports from the listeners revealed that 91.7% and 83.3% of the YNH and OHL listeners, respectively, thought the Auditec speaker was easier to understand than the VA speaker. Recognition performances on each participant and on each word are presented.


2015 ◽  
Vol 26 (07) ◽  
pp. 670-677 ◽  
Author(s):  
Richard H. Wilson ◽  
Heather M. Hamm

Background: A previous experiment with 70 interrupted monosyllabic words demonstrated that recognition performance was influenced by the location of an interruption pattern (Wilson, 2014). The interruption paradigm (10 interruptions/sec, 50% duty cycle periodic interruption) was referenced to word onset. The words were interrupted such that alternate 50-msec segments were parsed to separate files. In the 0-msec condition the first on-segment coincided with the word onset, whereas in the 50-msec condition the first on-segment occurred 50 msec after word onset. The 0- and 50-msec conditions were complementary halves. Recognition performance by young listeners was 19% better on the 0-msec condition (86%) than on the 50-msec condition (68%); there were a minority number of words on which the results were just the opposite. A second study using the same interruption paradigm but 300 different words reported similar relations, with 63% correct recognition on the 0-msec condition and 48% on the 50-msec condition (Wilson and Irish, 2015). Both studies suggest the importance that the first 50 msec of the target word has on intelligibility. Purpose: To define in detail the effects that interruption patterns have on word recognition as the interruption pattern was incremented with reference to word onset from 0 to 90 msec in 10-msec steps. Research Design: A repeated-measures design with ten interruption patterns (onset conditions). Study Sample: Twenty-four young listeners (19–29 yr) with normal hearing for pure tones participated in this study. Data Collection and Analyses: Seventy consonant-nucleus-consonant words formed the corpus of materials with 25 additional words used for practice. For each participant, the 700 stimuli (70 words by ten onset conditions) were interrupted (10 interruptions/sec; 50% duty cycle), randomized, and recorded on compact disc in 28, 25-word tracks. 
Results: The overall mean recognition performance was 80.4% with mean performances for the ten conditions ranging from 73.0% (50-msec condition) to 87.7% (90-msec condition). The mean recognition performances changed systematically, decreasing from the 0-msec condition to the 50-msec condition and then increasing to the 90-msec condition, which formed a U-shaped function of the means. Of the 45 mean paired comparisons (post hoc t-tests with Bonferroni corrections), there were 17 significant differences at the p ≤ 0.001 level, increasing to 31 significant differences when the significance level was increased to the p ≤ 0.01 level. Visual inspection of the 70-word performance functions revealed that 32 words had flat functions, 34 words had U-shaped functions, two functions were rising, one was an inverted V-shape, and one was irregular. Conclusions: First, some words (utterances of those words) were immune to any differential effects of the ten interruption patterns. These words with flat performance functions constituted 46% of the word corpus. Second, 49% of the words exhibited U-shaped performance functions that were always systematic, going from maximum to minimum and back to maximum. These words were thought to be more dependent on the initial consonant to attain maximum performance. The conclusion is that some words are not affected by the location of the interruption pattern (those with flat functions) whereas other words are substantially affected (those with U-shaped functions).
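The interruption paradigm described above (10 interruptions/sec, 50% duty cycle, referenced to word onset) can be pictured as a periodic on/off gate. The sketch below builds such a gate at 1-msec resolution. It assumes the pattern is strictly periodic, so that for onset values other than 0 msec the time before the stated on-segment follows the same cycle; the abstract does not state this, and the function name and the 300-msec word duration are hypothetical.

```python
def interruption_mask(duration_ms, onset_ms, rate_hz=10.0, duty_cycle=0.5):
    """Return one boolean per millisecond: True where the word is left on.

    An on-segment starts `onset_ms` after word onset and repeats every
    1000 / rate_hz msec. Assumes the pattern is periodic over the whole word.
    """
    period_ms = 1000.0 / rate_hz       # 100 msec at 10 interruptions/sec
    on_ms = period_ms * duty_cycle     # 50 msec on, 50 msec off
    return [((t - onset_ms) % period_ms) < on_ms for t in range(duration_ms)]


# The 0- and 50-msec conditions should be complementary halves of the word.
mask_0 = interruption_mask(300, onset_ms=0)
mask_50 = interruption_mask(300, onset_ms=50)
complementary = all(a != b for a, b in zip(mask_0, mask_50))
print(complementary, sum(mask_0), sum(mask_50))

# Gating hypothetical samples (one per msec) with a mask:
samples = [1.0] * 300
gated = [s if keep else 0.0 for s, keep in zip(samples, mask_0)]
```

Under this periodic assumption, stepping the onset from 0 to 90 msec in 10-msec steps slides the same gate across the word, so every condition passes half of the signal and only the placement of the retained segments changes.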

