Moebius Syndrome: Measures of Observer Intelligibility with versus without Visual Cues in Bilateral Facial Paralysis

2007 · Vol 44 (5) · pp. 518-522
Author(s): Shelley Von Berg, Douglas McColl, Tami Brancamp

Objective: This study investigated observers’ intelligibility for the spoken output of an individual with Moebius syndrome (MoS) with and without visual cues. Design: An audiovisual recording of the speaker's output was obtained for 50 Speech Intelligibility in Noise sentences, consisting of 25 high-predictability and 25 low-predictability sentences. Stimuli were presented to observers under two conditions: audiovisual and audio only. Data were analyzed using a multivariate repeated measures model. Observers: Twenty students and faculty affiliated with the Department of Speech Pathology and Audiology at the University of Nevada, Reno. Results: A mixed-design ANOVA revealed that intelligibility in the audio-only condition was significantly greater than intelligibility in the audiovisual condition, and accuracy for high-predictability sentences was significantly greater than accuracy for low-predictability sentences. Conclusions: The compensatory substitutional placements for phonemes produced by MoS speakers may detract from the intelligibility of speech. This is similar to the McGurk-MacDonald effect, whereby an illusory auditory percept arises when visual information from lip movements does not match the auditory information from speech. It also suggests that observers use contextual cues, more than the acoustic signal alone, to arrive at accurate recognition of the message of speakers with MoS. Therefore, speakers with MoS should be counseled in the top-down approach of auditory closure: when the speech signal is degraded, predictable messages are more easily understood than unpredictable ones. It is also important to confirm the speaking partner's understanding of the topic before proceeding.

2020 · Vol 31 (01) · pp. 030-039
Author(s): Aaron C. Moberly, Kara J. Vasil, Christin Ray

Abstract: Adults with cochlear implants (CIs) are believed to rely more heavily on visual cues during speech recognition tasks than their normal-hearing peers. However, the relationship between auditory and visual reliance during audiovisual (AV) speech recognition is unclear and may depend on an individual’s auditory proficiency, duration of hearing loss (HL), age, and other factors. The primary purpose of this study was to examine whether visual reliance during AV speech recognition depends on auditory function for adult CI candidates (CICs) and adult experienced CI users (ECIs). Participants included 44 ECIs and 23 CICs. All participants were postlingually deafened and had met clinical candidacy requirements for cochlear implantation. Participants completed City University of New York sentence recognition testing. Three separate lists of twelve sentences each were presented: the first in the auditory-only (A-only) condition, the second in the visual-only (V-only) condition, and the third in combined AV fashion. Each participant’s amount of “visual enhancement” (VE) and “auditory enhancement” (AE) was computed (i.e., the benefit to AV speech recognition of adding visual or auditory information, respectively, relative to what could potentially be gained). The relative reliance on VE versus AE was also computed as a VE/AE ratio. The VE/AE ratio was predicted inversely by A-only performance. Visual reliance was not significantly different between ECIs and CICs. Duration of HL and age did not account for additional variance in the VE/AE ratio. A shift toward visual reliance may be driven by poor auditory performance in ECIs and CICs. The restoration of auditory input through a CI does not necessarily facilitate a shift back toward auditory reliance. Findings suggest that individual listeners with HL may rely on both auditory and visual information during AV speech recognition, to varying degrees based on their own performance and experience, to optimize communication performance in real-world listening situations.
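
A minimal sketch of how such enhancement scores are often computed, assuming the common normalization in which benefit is scaled by the remaining headroom for improvement (the exact formula used in the study is not stated in the abstract, and the participant scores below are hypothetical):

    # Hedged illustration: VE/AE computed as benefit relative to available headroom.
    # The exact normalization used by the authors is an assumption here.

    def visual_enhancement(av_pct: float, a_pct: float) -> float:
        """Gain from adding vision, relative to what could still be gained over A-only."""
        return (av_pct - a_pct) / (100.0 - a_pct)

    def auditory_enhancement(av_pct: float, v_pct: float) -> float:
        """Gain from adding audition, relative to what could still be gained over V-only."""
        return (av_pct - v_pct) / (100.0 - v_pct)

    # Hypothetical listener: 40% correct A-only, 20% V-only, 70% AV
    ve = visual_enhancement(70, 40)    # 0.50
    ae = auditory_enhancement(70, 20)  # 0.625
    print(ve, ae, ve / ae)             # VE/AE ratio ~0.8 indexes relative reliance on vision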


2017 · Vol 30 (7-8) · pp. 653-679
Author(s): Nida Latif, Agnès Alsius, K. G. Munhall

During conversations, we engage in turn-taking behaviour that proceeds back and forth effortlessly as we communicate. In any given day, we participate in numerous face-to-face interactions that contain social cues from our partner, and we interpret these cues to rapidly identify whether it is appropriate to speak. Although the benefit provided by visual cues has been well established in several areas of communication, the use of visual information to make turn-taking decisions during conversation is unclear. Here we conducted two experiments to investigate the role of visual information in identifying conversational turn exchanges. We presented clips containing single utterances spoken by single individuals engaged in a natural conversation with another person. These utterances came either from right before a turn exchange (i.e., when the current talker would finish and the other would begin) or from points where the same talker would continue speaking. In Experiment 1, participants were presented with audiovisual, auditory-only, and visual-only versions of our stimuli and identified whether a turn exchange would occur or not. We demonstrated that although participants could identify turn exchanges with unimodal information alone, they performed best in the audiovisual modality. In Experiment 2, we presented participants with audiovisual turn exchanges where the talker, the listener, or both were visible. We showed that participants suffered a cost at identifying turn exchanges when visual cues from the listener were not available. Overall, we demonstrate that although auditory information is sufficient for successful conversation, visual information plays an important role in the overall efficiency of communication.


2021
Author(s): Hoyoung Yi, Ashly Pingsterhaus, Woonyoung Song

The coronavirus pandemic has resulted in the recommended/required use of face masks in public. The use of a face mask compromises communication, especially in the presence of competing noise. It is crucial to measure the potential adverse effects of wearing face masks on speech intelligibility in communication contexts with excessive background noise, in order to identify solutions to this communication challenge. Accordingly, the effects of wearing transparent face masks and using clear speech to support better verbal communication were evaluated here. We evaluated listener word identification scores across the following four factors: (1) type of masking (i.e., no mask, transparent mask, and disposable paper mask), (2) presentation mode (i.e., auditory only and audiovisual), (3) speaking style (i.e., conversational speech and clear speech), and (4) type of background noise (i.e., speech-shaped noise and four-talker babble at a −5 dB signal-to-noise ratio). Results showed that in the presence of noise, listeners performed less well when the speaker wore a disposable paper mask or a transparent mask than when the speaker wore no mask. Listeners correctly identified more words in the audiovisual condition when listening to clear speech. Results indicate that the combination of face masks and the presence of background noise negatively impacts speech intelligibility for listeners. Transparent masks facilitate the ability to understand target sentences by providing visual information. Use of clear speech was shown to alleviate challenging communication situations, including a lack of visual cues and a reduced acoustic signal.
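
As a minimal sketch of the kind of stimulus preparation this implies (not the authors' actual pipeline), speech can be mixed with noise at a target signal-to-noise ratio by scaling the noise power relative to the speech power:

    import numpy as np

    def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
        """Scale `noise` so the speech-to-noise power ratio equals `snr_db`,
        then add it to the speech. Illustrative only."""
        noise = noise[: len(speech)]                        # trim noise to speech length
        p_speech = np.mean(speech ** 2)                     # average speech power
        p_noise = np.mean(noise ** 2)                       # average noise power
        target_noise_power = p_speech / (10 ** (snr_db / 10.0))
        return speech + noise * np.sqrt(target_noise_power / p_noise)

    # e.g., a -5 dB SNR mix comparable to the study's noise conditions
    # mixed = mix_at_snr(speech_samples, babble_samples, snr_db=-5.0)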


2017 · Vol 114 (21) · pp. E4134-E4141
Author(s): Andrew Chang, Steven R. Livingstone, Dan J. Bosnyak, Laurel J. Trainor

The cultural and technological achievements of the human species depend on complex social interactions. Nonverbal interpersonal coordination, or joint action, is a crucial element of social interaction, but the dynamics of nonverbal information flow among people are not well understood. We used joint music making in string quartets, a complex, naturalistic nonverbal behavior, as a model system. Using motion capture, we recorded body sway simultaneously in four musicians, which reflected real-time interpersonal information sharing. We used Granger causality to analyze predictive relationships among the motion time series of the players to determine the magnitude and direction of information flow among the players. We experimentally manipulated which musician was the leader (followers were not informed who was leading) and whether they could see each other, to investigate how these variables affect information flow. We found that assigned leaders exerted significantly greater influence on others and were less influenced by others compared with followers. This effect was present whether or not they could see each other, but was enhanced with visual information, indicating that visual as well as auditory information is used in musical coordination. Importantly, performers’ ratings of the “goodness” of their performances were positively correlated with the overall degree of body sway coupling, indicating that communication through body sway reflects perceived performance success. These results confirm that information sharing in a nonverbal joint action task occurs through both auditory and visual cues and that the dynamics of information flow are affected by changing group relationships.
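
For illustration, a pairwise Granger-causality test between two sway time series can be run with statsmodels; the data below are synthetic stand-ins, and the actual study used the musicians' motion-capture recordings and a more elaborate analysis:

    import numpy as np
    from statsmodels.tsa.stattools import grangercausalitytests

    # Synthetic stand-ins for two body-sway time series (e.g., leader and follower).
    rng = np.random.default_rng(0)
    leader = rng.standard_normal(1000)
    follower = 0.8 * np.roll(leader, 3) + 0.5 * rng.standard_normal(1000)  # lagged, noisy copy

    # Column order matters: the test asks whether the SECOND column helps
    # predict the FIRST beyond the first column's own past.
    data = np.column_stack([follower, leader])   # does leader predict follower?
    results = grangercausalitytests(data, maxlag=5)
    # results[k][0]["ssr_ftest"] holds (F, p, df_denom, df_num) at lag k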


2012 · Vol 25 (0) · pp. 112
Author(s): Lukasz Piwek, Karin Petrini, Frank E. Pollick

Multimodal perception of emotions has typically been examined using displays of a solitary character (e.g., the face-voice and/or body-sound of one actor). We extend this investigation to more complex, dyadic point-light displays combined with speech. A motion and voice capture system was used to record twenty actors interacting in couples with happy, angry and neutral emotional expressions. The obtained stimuli were validated in a pilot study and used in the present study to investigate multimodal perception of emotional social interactions. Participants were required to categorize happy and angry expressions displayed visually, auditorily, or using emotionally congruent and incongruent bimodal displays. In a series of cross-validation experiments we found that sound dominated the visual signal in the perception of emotional social interaction. Although participants’ judgments were faster in the bimodal condition, the accuracy of judgments was similar for the bimodal and auditory-only conditions. When participants watched emotionally mismatched bimodal displays, they predominantly oriented their judgments towards the auditory rather than the visual signal. This auditory dominance persisted even when the reliability of the auditory signal was decreased with noise, although visual information had some effect on judgments of emotions when it was combined with a noisy auditory signal. Our results suggest that when judging emotions from observed social interactions, we rely primarily on vocal cues from the conversation rather than on visual cues from the actors' body movements.


Author(s): Michael F. Dorman, Sarah Cook Natale, Smita Agrawal

Abstract Background Both the Roger remote microphone and on-ear, adaptive beamforming technologies (e.g., Phonak UltraZoom) have been shown to improve speech understanding in noise for cochlear implant (CI) listeners when tested in audio-only (A-only) test environments. Purpose Our aim was to determine whether adult and pediatric CI recipients benefited from these technologies in a more common environment: one in which both audio and visual cues were available and overall performance was high. Study Sample Ten adult CI listeners (Experiment 1) and seven pediatric CI listeners (Experiment 2) were tested. Design Adults were tested in quiet and in two levels of noise (level 1 and level 2) in A-only and audiovisual (AV) environments. There were four device conditions: (1) an ear canal-level, omnidirectional microphone (T-mic) in quiet, (2) the T-mic in noise, (3) an adaptive directional mic (UltraZoom) in noise, and (4) a wireless remote mic (Roger Pen) in noise. Pediatric listeners were tested in quiet and in level 1 noise in A-only and AV environments. The test conditions were: (1) a behind-the-ear omnidirectional mic (processor mic) in quiet, (2) the processor mic in noise, (3) the T-mic in noise, and (4) the Roger Pen in noise. Data Collection and Analyses In each test condition, sentence understanding was assessed (percent correct) and ease-of-listening ratings were obtained. The sentence understanding data were entered into repeated-measures analyses of variance. Results For both adult and pediatric listeners in the AV test conditions in level 1 noise, performance with the Roger Pen was significantly higher than with the T-mic. For both populations, performance in level 1 noise with the Roger Pen approached the level of baseline performance in quiet. Ease of listening in noise was rated higher in the Roger Pen conditions than in the T-mic or processor mic conditions in both A-only and AV test conditions. Conclusion The Roger remote mic and on-ear directional mic technologies benefit both speech understanding and ease of listening in a realistic laboratory test environment and are likely to do the same in real-world listening environments.


2021 · Vol 15
Author(s): Thorben Hülsdünker, David Riedel, Hannes Käsbauer, Diemo Ruhnow, Andreas Mierau

Although vision is the dominant sensory system in sports, many situations require multisensory integration. Faster processing of auditory information in the brain may facilitate time-critical abilities such as reaction speed; however, previous research was limited by generic auditory and visual stimuli that did not reflect the audio-visual characteristics of ecologically valid environments. This study investigated reaction speed in response to sport-specific monosensory (visual and auditory) and multisensory (audio-visual) stimulation. Neurophysiological analyses identified the neural processes contributing to differences in reaction speed. Nineteen elite badminton players participated in this study. In a first recording phase, the sound profile and shuttle speed of smash and drop strokes were identified on a badminton court using high-speed video cameras and binaural recordings. The speed and sound characteristics were transferred into auditory and visual stimuli and presented in a lab-based experiment, where participants reacted to sport-specific monosensory or multisensory stimulation. Auditory signal presentation was delayed by 26 ms to account for realistic audio-visual signal interaction on the court. N1 and N2 event-related potentials, as indicators of auditory and visual information perception/processing, respectively, were identified using 64-channel EEG. Despite the 26 ms delay, auditory reactions were significantly faster than visual reactions (236.6 ms vs. 287.7 ms, p < 0.001) but still slower than reactions to multisensory stimulation (224.4 ms, p = 0.002). Across conditions, response times to smashes were faster than to drops (233.2 ms vs. 265.9 ms, p < 0.001). Faster reactions were paralleled by a lower latency and higher amplitude of the auditory N1 and visual N2 potentials. The results emphasize the potential of auditory information to accelerate reaction time in sport-specific multisensory situations. This highlights auditory processes as a promising target for training interventions in racquet sports.
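
The 26 ms offset is consistent with acoustic travel time over roughly a court-length distance between players; a back-of-envelope check (the specific distance below is an assumption, not a figure reported in the study):

    # Sound trails the visual event by distance / speed_of_sound; light arrives
    # effectively instantaneously at court distances. The 8.9 m figure is an
    # assumed player-to-player distance, not a value from the study.
    SPEED_OF_SOUND_M_S = 343.0  # at roughly 20 degrees Celsius

    def audio_lag_ms(distance_m: float) -> float:
        """Milliseconds by which the sound of a stroke trails its visual onset."""
        return distance_m / SPEED_OF_SOUND_M_S * 1000.0

    print(round(audio_lag_ms(8.9)))  # ~26 ms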


2021 · Vol 12
Author(s): Hoyoung Yi, Ashly Pingsterhaus, Woonyoung Song

The coronavirus pandemic has resulted in the recommended/required use of face masks in public. The use of a face mask compromises communication, especially in the presence of competing noise. It is crucial to measure the potential effects of wearing face masks on speech intelligibility in noisy environments, where excessive background noise can create communication challenges. The effects of wearing transparent face masks and using clear speech to facilitate better verbal communication were evaluated in this study. We evaluated listener word identification scores across the following four factors: (1) mask type (i.e., no mask, transparent mask, and disposable face mask), (2) presentation mode (i.e., auditory only and audiovisual), (3) speaking style (i.e., conversational speech and clear speech), and (4) type of background noise (i.e., speech-shaped noise and four-talker babble at a −5 dB signal-to-noise ratio). Results indicate that in the presence of noise, listeners performed less well when the speaker wore a disposable face mask or a transparent mask than when the speaker wore no mask. Listeners correctly identified more words in the audiovisual presentation when listening to clear speech. Results indicate that the combination of face masks and the presence of background noise negatively impacts speech intelligibility for listeners. Transparent masks facilitate the ability to understand target sentences by providing visual information. Use of clear speech was shown to alleviate challenging communication situations, including compensating for a lack of visual cues and reduced acoustic signals.


2018 · Vol 72 (5) · pp. 1141-1154
Author(s): Daniele Nardi, Brian J Anzures, Josie M Clark, Brittany V Griffith

Among the environmental stimuli that can guide navigation in space, most attention has been dedicated to visual information. The process of determining where you are and which direction you are facing (called reorientation) has been extensively examined by providing the navigator with two sources of information, typically the shape of the environment and its features, with an interest in the extent to which each is used. Comparable investigations with non-visual cues are lacking. Here, blindfolded sighted participants had to learn the location of a target in a real-world, circular search space. In Experiment 1, two ecologically relevant non-visual cues were provided: the slope of the floor and an array of two identical auditory landmarks. Slope successfully guided behaviour, suggesting that proprioceptive/kinesthetic access is sufficient to navigate in a slanted environment. However, despite the fact that participants could localise the auditory sources, this information was not encoded. In Experiment 2, the auditory cue was made more useful for the task because it had greater predictive value and there were no competing spatial cues. Nonetheless, again, the auditory landmark was not encoded. Finally, in Experiment 3, after being prompted, participants were able to reorient by using the auditory landmark. Overall, participants failed to spontaneously rely on the auditory cue, regardless of how informative it was.


2021 · Vol 15
Author(s): Yi Yuan, Yasneli Lleo, Rebecca Daniel, Alexandra White, Yonghee Oh

Speech perception often takes place in noisy environments, where multiple auditory signals compete with one another. Adding visual cues such as a talker's face or lip movements to an auditory signal can help improve the intelligibility of speech in these suboptimal listening environments; this is referred to as the audiovisual benefit. The current study aimed to delineate the signal-to-noise ratio (SNR) conditions under which visual presentations of the acoustic amplitude envelope have their most significant impact on speech perception. Seventeen adults with normal hearing were recruited. Participants were presented with spoken sentences in babble noise either in auditory-only or auditory-visual conditions at SNRs of −7, −5, −3, −1, and 1 dB. The visual stimulus was a sphere whose size varied in sync with the amplitude envelope of the target speech signal. Participants were asked to transcribe the sentences they heard. Results showed a significant improvement in accuracy in the auditory-visual condition over the auditory-only condition at SNRs of −3 and −1 dB, but no improvement at the other SNRs. These results show that dynamic temporal visual information can benefit speech perception in noise, and that the optimal facilitative effect of the visual amplitude envelope is observed within an intermediate SNR range.
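
A minimal sketch of how such a stimulus could be generated, assuming a Hilbert-envelope extraction followed by low-pass smoothing and a linear mapping to sphere radius (the paper's exact signal processing is not specified in the abstract):

    import numpy as np
    from scipy.signal import hilbert, butter, filtfilt

    def amplitude_envelope(signal: np.ndarray, fs: int, cutoff_hz: float = 10.0) -> np.ndarray:
        """Hilbert-transform magnitude, low-pass filtered to keep slow envelope modulations."""
        env = np.abs(hilbert(signal))
        b, a = butter(4, cutoff_hz / (fs / 2), btype="low")
        return filtfilt(b, a, env)

    def envelope_to_radius(env: np.ndarray, r_min: float = 0.5, r_max: float = 2.0) -> np.ndarray:
        """Linearly map the normalized envelope onto an arbitrary sphere-radius range."""
        norm = (env - env.min()) / (env.max() - env.min() + 1e-12)
        return r_min + norm * (r_max - r_min)

    # fs = 16000; speech = ...  # samples of a target sentence
    # radius_trace = envelope_to_radius(amplitude_envelope(speech, fs))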

