Spatial reorientation with non-visual cues: Failure to spontaneously use auditory information

2018 ◽  
Vol 72 (5) ◽  
pp. 1141-1154 ◽  
Author(s):  
Daniele Nardi ◽  
Brian J Anzures ◽  
Josie M Clark ◽  
Brittany V Griffith

Among the environmental stimuli that can guide navigation in space, most attention has been dedicated to visual information. The process of determining where you are and which direction you are facing (called reorientation) has been extensively examined by providing the navigator with two sources of information—typically the shape of the environment and its features—with an interest in the extent to which they are used. Similar investigations with non-visual cues are lacking. Here, blindfolded sighted participants had to learn the location of a target in a real-world, circular search space. In Experiment 1, two ecologically relevant non-visual cues were provided: the slope of the floor and an array of two identical auditory landmarks. Slope successfully guided behaviour, suggesting that proprioceptive/kinesthetic access is sufficient to navigate in a slanted environment. However, despite the fact that participants could localise the auditory sources, this information was not encoded. In Experiment 2, the auditory cue was made more useful for the task because it had greater predictive value and there were no competing spatial cues. Nonetheless, again, the auditory landmark was not encoded. Finally, in Experiment 3, after being prompted, participants were able to reorient by using the auditory landmark. Overall, participants failed to spontaneously rely on the auditory cue, regardless of how informative it was.

2020 ◽  
Vol 31 (01) ◽  
pp. 030-039 ◽  
Author(s):  
Aaron C. Moberly ◽  
Kara J. Vasil ◽  
Christin Ray

Adults with cochlear implants (CIs) are believed to rely more heavily on visual cues during speech recognition tasks than their normal-hearing peers. However, the relationship between auditory and visual reliance during audiovisual (AV) speech recognition is unclear and may depend on an individual's auditory proficiency, duration of hearing loss (HL), age, and other factors. The primary purpose of this study was to examine whether visual reliance during AV speech recognition depends on auditory function for adult CI candidates (CICs) and adult experienced CI users (ECIs). Participants included 44 ECIs and 23 CICs. All participants were postlingually deafened and had met clinical candidacy requirements for cochlear implantation. Participants completed City University of New York sentence recognition testing. Three separate lists of twelve sentences each were presented: the first in the auditory-only (A-only) condition, the second in the visual-only (V-only) condition, and the third in combined AV fashion. Each participant's amounts of "visual enhancement" (VE) and "auditory enhancement" (AE) were computed (i.e., the benefit to AV speech recognition of adding visual or auditory information, respectively, relative to what could potentially be gained). The relative reliance of VE versus AE was also computed as a VE/AE ratio. The VE/AE ratio was predicted inversely by A-only performance. Visual reliance was not significantly different between ECIs and CICs. Duration of HL and age did not account for additional variance in the VE/AE ratio. A shift toward visual reliance may be driven by poor auditory performance in ECIs and CICs. The restoration of auditory input through a CI does not necessarily facilitate a shift back toward auditory reliance. Findings suggest that individual listeners with HL may rely on both auditory and visual information during AV speech recognition, to varying degrees based on their own performance and experience, to optimize communication performance in real-world listening situations.
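The abstract describes VE and AE only verbally. As a hedged illustration, the snippet below computes them under the common "proportion of possible gain" normalization (e.g., VE = (AV - A) / (1 - A)); the study's exact formulas may differ, and the scores used are hypothetical proportions correct.

```python
# Minimal sketch of visual enhancement (VE), auditory enhancement (AE),
# and the VE/AE ratio. Assumes the standard "proportion of possible gain"
# normalization; scores are proportions correct (0-1) and are made up.

def visual_enhancement(a_only: float, av: float) -> float:
    """Benefit of adding visual cues, relative to the maximum possible gain."""
    return (av - a_only) / (1.0 - a_only)

def auditory_enhancement(v_only: float, av: float) -> float:
    """Benefit of adding auditory cues, relative to the maximum possible gain."""
    return (av - v_only) / (1.0 - v_only)

# Hypothetical scores for one listener:
a_only, v_only, av = 0.40, 0.25, 0.70

ve = visual_enhancement(a_only, av)    # (0.70 - 0.40) / 0.60 = 0.50
ae = auditory_enhancement(v_only, av)  # (0.70 - 0.25) / 0.75 = 0.60
print(ve, ae, ve / ae)  # larger VE/AE ratios indicate relatively greater visual reliance
```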


2017 ◽  
Vol 30 (7-8) ◽  
pp. 653-679 ◽  
Author(s):  
Nida Latif ◽  
Agnès Alsius ◽  
K. G. Munhall

During conversations, we engage in turn-taking behaviour that proceeds back and forth effortlessly as we communicate. In any given day, we participate in numerous face-to-face interactions that contain social cues from our partner, and we interpret these cues to rapidly identify whether it is appropriate to speak. Although the benefit provided by visual cues has been well established in several areas of communication, the use of visual information to make turn-taking decisions during conversation is unclear. Here we conducted two experiments to investigate the role of visual information in identifying conversational turn exchanges. We presented clips containing single utterances spoken by single individuals engaged in a natural conversation with another. These utterances occurred either right before a turn exchange (i.e., when the current talker would finish and the other would begin) or at points where the same talker would continue speaking. In Experiment 1, participants were presented with audiovisual, auditory-only, and visual-only versions of our stimuli and identified whether a turn exchange would occur or not. We demonstrated that although participants could identify turn exchanges with unimodal information alone, they performed best in the audiovisual modality. In Experiment 2, we presented participants with audiovisual turn exchanges where the talker, the listener, or both were visible. We showed that participants suffered a cost at identifying turn exchanges when visual cues from the listener were not available. Overall, we demonstrate that although auditory information is sufficient for successful conversation, visual information plays an important role in the overall efficiency of communication.


2003 ◽  
Vol 46 (6) ◽  
pp. 1367-1377 ◽  
Author(s):  
Allard Jongman ◽  
Yue Wang ◽  
Brian H. Kim

Most studies have been unable to identify reliable acoustic cues for the recognition of the English nonsibilant fricatives /f, v, θ, ð/. The present study was designed to test the extent to which the perception of these fricatives by normal-hearing adults is based on other sources of information, namely, linguistic context and visual information. In Experiment 1, target words beginning with /f/, /θ/, /s/, or /ʃ/ were preceded by either a semantically congruous or incongruous precursor sentence. Results showed an effect of linguistic context on the perception of the distinction between /f/ and /θ/ and on the acoustically more robust distinction between /s/ and /ʃ/. In Experiment 2, participants identified syllables consisting of the fricatives /f, v, θ, ð/ paired with the vowels /i, a, u/. Three conditions were contrasted: stimuli were presented with (a) both auditory and visual information, (b) auditory information alone, or (c) visual information alone. When errors in terms of voicing were ignored in all three conditions, results indicated that perception of these fricatives is as good with visual information alone as with both auditory and visual information combined, and better than with auditory information alone. These findings suggest that accurate perception of nonsibilant fricatives derives from a combination of acoustic, linguistic, and visual information.


2007 ◽  
Vol 44 (5) ◽  
pp. 518-522 ◽  
Author(s):  
Shelley Von Berg ◽  
Douglas McColl ◽  
Tami Brancamp

Objective: This study investigated observers' intelligibility for the spoken output of an individual with Moebius syndrome (MoS) with and without visual cues. Design: An audiovisual recording of the speaker's output was obtained for 50 Speech Intelligibility in Noise sentences consisting of 25 high-predictability and 25 low-predictability sentences. Stimuli were presented to observers under two conditions: audiovisual and audio only. Data were analyzed using a multivariate repeated measures model. Observers: Twenty students and faculty affiliated with the Department of Speech Pathology and Audiology at the University of Nevada, Reno. Results: A mixed-design ANOVA revealed that intelligibility for the audio-only condition was significantly greater than intelligibility for the audiovisual condition, and accuracy for high-predictability sentences was significantly greater than accuracy for low-predictability sentences. Conclusions: The compensatory substitutional placements for phonemes produced by MoS speakers may detract from the intelligibility of speech. This is similar to the McGurk-MacDonald effect, whereby an illusory auditory signal is perceived when visual information from lip movements does not match the auditory information from speech. It also suggests that observers use contextual clues, more than the acoustic signal alone, to arrive at accurate recognition of the message of speakers with MoS. Therefore, speakers with MoS should be counseled in the top-down approach of auditory closure. When the speech signal is degraded, predictable messages are more easily understood than unpredictable ones. It is also important to confirm the speaking partner's understanding of the topic before proceeding.


Perception ◽  
1998 ◽  
Vol 27 (6) ◽  
pp. 737-754 ◽  
Author(s):  
Stephen Lakatos ◽  
Lawrence E Marks

To what extent can individuals accurately estimate the angle between two surfaces through touch alone, and how does tactile judgment compare to visual judgment? Subjects' ability to estimate angle size for a variety of haptic and visual stimuli was examined in a series of nine experiments. Triangular wooden blocks and raised contour outlines comprising different angles and radii of curvature at the apex were used in experiments 1–4, and it was found that subjects consistently underestimated angular extent relative to visual baselines and that the degree of underestimation was inversely related to the actual size of the angle. Angle estimates also increased with increasing radius of curvature when actual angle size was held constant. In contrast, experiments 5–8 showed that subjects did not underestimate angular extent when asked to perform a haptic–visual match to a computerized visual image; this outcome suggests that visual input may ‘recalibrate’ the haptic system's internal metric for estimating angle. The basis of this crossmodal interaction was investigated in experiment 9 by varying the nature and extent of visual cues available in haptic estimation tasks. The addition of visual-spatial cues did not significantly reduce the magnitude of haptic underestimation. The experiments as a whole indicate that haptic underestimations of angle occur in a number of different stimulus contexts, but leave open the question of exactly what type of visual information may serve to recalibrate touch in this regard.


2017 ◽  
Vol 114 (21) ◽  
pp. E4134-E4141 ◽  
Author(s):  
Andrew Chang ◽  
Steven R. Livingstone ◽  
Dan J. Bosnyak ◽  
Laurel J. Trainor

The cultural and technological achievements of the human species depend on complex social interactions. Nonverbal interpersonal coordination, or joint action, is a crucial element of social interaction, but the dynamics of nonverbal information flow among people are not well understood. We used joint music making in string quartets, a complex, naturalistic nonverbal behavior, as a model system. Using motion capture, we recorded body sway simultaneously in four musicians, which reflected real-time interpersonal information sharing. We used Granger causality to analyze predictive relationships among the motion time series of the players to determine the magnitude and direction of information flow among the players. We experimentally manipulated which musician was the leader (followers were not informed who was leading) and whether they could see each other, to investigate how these variables affect information flow. We found that assigned leaders exerted significantly greater influence on others and were less influenced by others compared with followers. This effect was present whether or not they could see each other, but was enhanced with visual information, indicating that visual as well as auditory information is used in musical coordination. Importantly, performers’ ratings of the “goodness” of their performances were positively correlated with the overall degree of body sway coupling, indicating that communication through body sway reflects perceived performance success. These results confirm that information sharing in a nonverbal joint action task occurs through both auditory and visual cues and that the dynamics of information flow are affected by changing group relationships.
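As a hedged sketch of the kind of analysis described above (not the authors' actual pipeline), the snippet below runs a pairwise Granger-causality test on two simulated body-sway series using statsmodels; the data, lag choice, and variable names are illustrative assumptions.

```python
# Minimal sketch: does the "leader" sway help predict the "follower" sway?
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)

# Simulated sway: the follower partly tracks the leader with a 2-sample lag.
leader = rng.standard_normal(500)
follower = 0.6 * np.roll(leader, 2) + 0.4 * rng.standard_normal(500)

# grangercausalitytests checks whether the second column (leader) Granger-causes
# the first column (follower); results are reported for lags 1 through maxlag.
data = np.column_stack([follower, leader])
results = grangercausalitytests(data, maxlag=4)
```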


1975 ◽  
Vol 27 (2) ◽  
pp. 161-164 ◽  
Author(s):  
Graham Hitch ◽  
John Morton

The superiority of auditory over visual presentation in short-term serial recall may be due to the fact that typically only temporal cues to order have been provided in the two modalities. Auditory information is usually ordered along a temporal continuum, whereas visual information is ordered spatially as well. It is therefore possible that recall following visual presentation may benefit from spatial cues to order. Subjects were tested for serial recall of letter sequences presented visually either with or without explicit spatial cues to order. No effect of any kind was found, a result which suggests (a) that spatial information is not utilized when it is redundant with temporal information and (b) that the auditory-visual difference would not be modified by the presence of explicit spatial cues to order.


2021 ◽  
Author(s):  
Hye Yoon Seol ◽  
Soojin Kang ◽  
Ji Hyun Lim ◽  
Sung Hwa Hong ◽  
Il Joon Moon

It has been noted in the literature that there is a gap between clinical assessment and real-world performance. Real-world conversations entail visual and auditory information, yet no audiological assessment tools include visual information. Virtual reality (VR) technology has been applied to various areas, including audiology. However, the use of VR in speech-in-noise perception has not yet been investigated. The purpose of this study is to investigate the impact of virtual space (VS) on speech performance and its feasibility as a speech test instrument. Thirty individuals with normal hearing and twenty-five individuals with hearing loss completed pure-tone audiometry and the Korean version of the Hearing in Noise Test (K-HINT) in the conventional K-HINT, VS on PC, and VS on head-mounted display conditions at -10, -5, 0, and +5 dB signal-to-noise ratios. Participants listened to target speech and repeated it back to the tester in all conditions. Hearing aid users in the hearing loss group completed testing in unaided and aided conditions. A questionnaire was administered after testing. Provision of visual information had a significant impact on speech performance for the normal hearing and hearing impairment groups. Hearing aid use led to better integration of audio and visual cues. Statistical significance was observed for some conditions in each group and between hearing aid and non-hearing aid users. Participants reported positive responses across almost all items on the questionnaire except for the weight of the headset: they preferred a test method with visual imagery, but the headset was heavy. Findings are in line with previous literature showing that visual cues are beneficial for communication. This is the first study to include hearing aid users with a more naturalistic stimulus and a relatively “simple” test environment, suggesting the feasibility of virtual reality audiological testing in clinical practice.


2021 ◽  
Author(s):  
Tobias Gerstenberg ◽  
Max H Siegel ◽  
Joshua Tenenbaum

We introduce a novel experimental paradigm for studying multi-modal integration in causal inference. Our experiments feature a physically realistic Plinko machine in which a ball is dropped through one of three holes and comes to rest at the bottom after colliding with a number of obstacles. We develop a hypothetical simulation model which postulates that people figure out what happened by integrating visual and auditory evidence through mental simulation. We test the model in a series of three experiments. In Experiment 1, participants receive only visual information and either predict where the ball will land, or infer in what hole it was dropped based on where it landed. In Experiment 2, participants receive both visual and auditory information: they hear what sounds the dropped ball makes. We find that participants are capable of integrating both sources of information, and that the sounds help them figure out what happened. In Experiment 3, we show strong cue integration: even when vision and sound are individually completely non-diagnostic, participants succeed by combining both sources of evidence.
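As a generic, hedged illustration of how combining two cues can disambiguate what a single cue cannot (the paper's model integrates cues through physical simulation, not the hand-set likelihoods used here), the sketch below applies simple Bayesian cue combination over the three holes; all numbers are made up.

```python
# Minimal sketch of Bayesian cue combination over the three drop holes.
import numpy as np

holes = ["left", "middle", "right"]
prior = np.array([1/3, 1/3, 1/3])

# Hypothetical likelihoods: vision alone cannot separate left from middle,
# and sound alone cannot separate middle from right.
p_vision = np.array([0.45, 0.45, 0.10])
p_sound = np.array([0.10, 0.45, 0.45])

# Combine the cues and normalize to obtain a posterior over holes.
posterior = prior * p_vision * p_sound
posterior /= posterior.sum()

for hole, p in zip(holes, posterior):
    print(f"{hole}: {p:.2f}")  # the middle hole dominates once both cues are combined
```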


2019 ◽  
Vol 63 (2) ◽  
pp. 264-291 ◽  
Author(s):  
Irene de la Cruz-Pavía ◽  
Janet F. Werker ◽  
Eric Vatikiotis-Bateson ◽  
Judit Gervain

The audiovisual speech signal contains multimodal cues to phrase boundaries. In three artificial language learning studies with 12 groups of adult participants, we investigated whether English monolinguals and bilingual speakers of English and a language with the opposite basic word order (i.e., in which objects precede verbs) can use word frequency, phrasal prosody, and co-speech (facial) visual information, namely head nods, to parse unknown languages into phrase-like units. We showed that monolinguals and bilinguals used the auditory and visual sources of information to chunk “phrases” from the input. These results suggest that speech segmentation is a bimodal process, though the influence of co-speech facial gestures is rather limited and linked to the presence of auditory prosody. Importantly, a pragmatic factor, namely the language of the context, seems to determine the bilinguals’ segmentation, overriding the auditory and visual cues and revealing a factor that begs further exploration.

