Auditory predictions and prediction errors in response to self-initiated vowels

2019 ◽  
Author(s):  
Franziska Knolle ◽  
Michael Schwartze ◽  
Erich Schröger ◽  
Sonja A. Kotz

Abstract
It has been suggested that speech production is accomplished by an internal forward model, which reduces processing activity directed to self-produced speech in the auditory cortex. The current study uses an established N1-suppression paradigm comparing self- and externally-initiated natural speech sounds to answer two questions: (1) Are forward predictions generated to process complex speech sounds, such as vowels, initiated via a button press? (2) Are prediction errors regarding self-initiated deviant vowels reflected in the corresponding ERP components? Results confirm an N1-suppression in response to self-initiated speech sounds. Furthermore, our results suggest that the predictions leading to the N1-suppression effect are specific, as self-initiated deviant vowels do not elicit an N1-suppression effect. Rather, self-initiated deviant vowels elicit an enhanced N2b and P3a compared to externally-generated deviants, externally-generated standards, or self-initiated standards, again confirming prediction specificity. These results show that prediction errors are salient in self-initiated auditory speech sounds, which may lead to more efficient error correction in speech production.
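The N1-suppression measure at the heart of this paradigm can be sketched numerically. The window bounds, synthetic waveforms, and function name below are illustrative assumptions, not the authors' analysis pipeline:

```python
import numpy as np

def n1_suppression(erp_external, erp_self, times, window=(0.08, 0.12)):
    """Quantify N1 suppression as the amplitude difference between
    externally- and self-initiated ERPs in an assumed N1 time window.

    erp_external, erp_self : 1-D arrays of mean ERP amplitude (uV) per sample.
    times : 1-D array of sample times in seconds, aligned to sound onset.
    window : (start, end) of the assumed N1 window in seconds.

    The N1 is a negative deflection, so a positive return value means the
    self-initiated response was less negative, i.e. suppressed.
    """
    mask = (times >= window[0]) & (times <= window[1])
    return erp_self[mask].mean() - erp_external[mask].mean()
```

Applied to grand-average waveforms per condition, a positive value for standard vowels but not for deviant vowels would mirror the specificity result reported above.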

1977 ◽  
Vol 45 (1) ◽  
pp. 123-129 ◽  
Author(s):  
Donald Fucci ◽  
Michael A. Crary ◽  
Joseph A. Warren ◽  
Z. S. Bond

To investigate the interaction between the auditory and oral sensory feedback modalities during speech production, lingual vibrotactile thresholds were obtained from subjects in the following conditions: (1) before and after speech production with normal auditory feedback, (2) before and after speech production under exposure to auditory masking, and (3) before and after exposure to auditory masking without performing speech tasks. In addition, duration measurements were obtained for selected speech sounds to investigate temporal changes in the subjects' articulatory patterns across conditions. Lingual sensory decreases and temporal reorganization were observed only in subjects speaking under auditory masking. These data suggest a balanced interaction between the auditory and oral sensory feedback modalities which, when disturbed, results in non-phonemic changes in speech production.


2020 ◽  
Author(s):  
Johannes Rennig ◽  
Michael S Beauchamp

Abstract
Regions of the human posterior superior temporal gyrus and sulcus (pSTG/S) respond to the visual mouth movements that constitute visual speech and the auditory vocalizations that constitute auditory speech. We hypothesized that these multisensory responses in pSTG/S underlie the observation that comprehension of noisy auditory speech is improved when it is accompanied by visual speech. To test this idea, we presented audiovisual sentences that contained either a clear auditory component or a noisy auditory component while measuring brain activity using BOLD fMRI. Participants reported the intelligibility of the speech on each trial with a button press. Perceptually, adding visual speech to noisy auditory sentences rendered them much more intelligible. Post-hoc trial sorting was used to examine brain activations during noisy sentences that were more or less intelligible, focusing on multisensory speech regions in the pSTG/S identified with an independent visual speech localizer. Univariate analysis showed that less intelligible noisy audiovisual sentences evoked a weaker BOLD response, while more intelligible sentences evoked a stronger BOLD response that was indistinguishable from that to clear sentences. To better understand these differences, we conducted a multivariate representational similarity analysis. The pattern of response for intelligible noisy audiovisual sentences was more similar to the pattern for clear sentences, while the response pattern for unintelligible noisy sentences was less similar. These results show that for both univariate and multivariate analyses, successful integration of visual and noisy auditory speech normalizes responses in pSTG/S, providing evidence that multisensory subregions of pSTG/S are responsible for the perceptual benefit of visual speech.

Significance Statement
Enabling social interactions, including the production and perception of speech, is a key function of the human brain. Speech perception is a complex computational problem that the brain solves using both visual information from the talker's facial movements and auditory information from the talker's voice. Visual speech information is particularly important under noisy listening conditions, when auditory speech is difficult or impossible to understand alone. Regions of the human cortex in the posterior superior temporal lobe respond to the visual mouth movements that constitute visual speech and the auditory vocalizations that constitute auditory speech. We show that the pattern of activity in this cortex reflects the successful multisensory integration of auditory and visual speech information in the service of perception.
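The representational-similarity logic described above can be illustrated with a toy sketch. The voxel count, noise levels, and pattern construction below are assumptions for illustration only, not the study's data or pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical multivoxel response patterns (100 voxels) in pSTG/S.
clear = rng.normal(size=100)
# An intelligible noisy sentence: pattern close to the clear-speech pattern.
intelligible_noisy = clear + rng.normal(scale=0.5, size=100)
# An unintelligible noisy sentence: pattern unrelated to the clear pattern.
unintelligible_noisy = rng.normal(size=100)

def pattern_similarity(a, b):
    """Pearson correlation between two multivoxel response patterns."""
    return float(np.corrcoef(a, b)[0, 1])

r_intell = pattern_similarity(clear, intelligible_noisy)
r_unintell = pattern_similarity(clear, unintelligible_noisy)
# In this toy construction r_intell exceeds r_unintell, mirroring the
# reported "normalization" of responses for successfully integrated speech.
```

The study's conclusion corresponds to the empirical version of this comparison: intelligible noisy trials pattern-correlate with clear trials, unintelligible ones do not.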


2020 ◽  
Author(s):  
Arjen Alink ◽  
Helen Blank

Abstract
The expectation-suppression effect – reduced stimulus-evoked responses to expected stimuli – is widely considered to be an empirical hallmark of reduced prediction errors in the framework of predictive coding. Here we challenge this notion by proposing that this phenomenon can also be explained by a reduced attention effect. Specifically, we argue that reduced responses to predictable stimuli can also be explained by a reduced saliency-driven allocation of attention. To resolve whether expectation suppression is best explained by attention or predictive coding, additional research is needed to determine whether attention effects precede the encoding of expectation violations (or vice versa) and to reveal how expectations change neural representations of stimulus features.


Phonology ◽  
1998 ◽  
Vol 15 (2) ◽  
pp. 143-188 ◽  
Author(s):  
Grzegorz Dogil ◽  
Jörg Mayer

The present study proposes a new interpretation of the underlying distortion in APRAXIA OF SPEECH. Apraxia of speech, in its pure form, is the only neurolinguistic syndrome for which it can be argued that phonological structure is selectively distorted.

Apraxia of speech is a nosological entity in its own right which co-occurs with aphasia only occasionally. This…conviction rests on detailed descriptions of patients who have a severe and lasting disorder of speech production in the absence of any significant impairment of speech comprehension, reading or writing, as well as of any significant paralysis or weakness of the speech musculature. (Lebrun 1990: 380)

Based on the experimental investigation of the poorly coarticulated speech of patients from two divergent languages (German and Xhosa), it is argued that apraxia of speech has to be seen as a defective implementation of phonological representations at the phonology–phonetics interface. We contend that phonological structure exhibits neither a homogeneously auditory pattern nor a motor pattern, but a complex encoding of sequences of speech sounds. Specifically, it is maintained that speech is encoded in the brain as a sequence of distinctive feature configurations. These configurations are specified with differing degrees of detail, depending on the role the speech segments they underlie play in the phonological structure of a language. The transfer between phonological and phonetic representation encodes speech sounds as a sequence of vocal tract configurations. Like the distinctive feature representation, these configurations may be more or less specified. We argue that the severe and lasting disorders in speech production observed in apraxia of speech are caused by a distortion of this transfer between phonological and phonetic representation. The characteristic production deficits of apraxic patients are explained in terms of overspecification of phonetic representations.


1994 ◽  
Vol 37 (1) ◽  
pp. 4-27 ◽  
Author(s):  
Vincent L. Gracco

The neuromotor organization for a class of speech sounds (bilabials) was examined to evaluate the control principles underlying speech as a sensorimotor process. Oral opening and closing actions for the consonants /p/, /b/, and /m/ (C1) in /s V1 C1 V2 C2/ context, where V1 was either /ae/ or /i/, V2 was /ae/, and C2 was /p/, were analyzed from 4 subjects. The timing of oral opening and closing action was found to be a significant variable differentiating bilabial consonants. Additionally, opening and closing actions were found to covary along a number of dimensions implicating the movement cycle as the minimal unit of speech motor programming. The sequential adjustments of the lips and jaw varied systematically with phonetic context reflecting the different functional roles of these articulators in the production of consonants and vowels. The implication of these findings for speech production is discussed.


Author(s):  
Linda Polka ◽  
Matthew Masapollo ◽  
Lucie Ménard

Purpose: Current models of speech development argue for an early link between speech production and perception in infants. Recent data show that young infants (at 4–6 months) preferentially attend to speech sounds (vowels) with infant vocal properties compared to those with adult vocal properties, suggesting the presence of special "memory banks" for one's own nascent speech-like productions. This study investigated whether the vocal resonances (formants) of the infant vocal tract are sufficient to elicit this preference and whether this perceptual bias changes with age and emerging vocal production skills. Method: We selectively manipulated the fundamental frequency (f0) of vowels synthesized with formants specifying either an infant or adult vocal tract, and then tested the effects of those manipulations on the listening preferences of infants who were slightly older than those previously tested (at 6–8 months). Results: Unlike findings with younger infants (at 4–6 months), slightly older infants in Experiment 1 displayed a robust preference for vowels with infant formants over adult formants when f0 was matched. The strength of this preference was also positively correlated with age among infants between 4 and 8 months. In Experiment 2, this preference favoring infant over adult formants was maintained when f0 values were modulated. Conclusions: Infants between 6 and 8 months of age displayed a robust and distinct preference for speech with resonances specifying a vocal tract that is similar in size and length to their own. This finding, together with data indicating that this preference is not present in younger infants and appears to increase with age, suggests that nascent knowledge of the motor schema of the vocal tract may play a role in shaping this perceptual bias, lending support to current models of speech development. Supplemental Material https://doi.org/10.23641/asha.17131805
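The independent manipulation of f0 and formants that this design relies on can be illustrated with a minimal source-filter synthesis sketch. The formant values, bandwidths, and function below are hypothetical stand-ins, not the study's stimuli or synthesis method:

```python
import numpy as np
from scipy.signal import lfilter

def synth_vowel(f0, formants, bandwidths, fs=16000, dur=0.5):
    """Source-filter sketch: an impulse train at f0 (the 'source') is passed
    through cascaded two-pole resonators at the formant frequencies (the
    'filter'), so f0 and formants can be manipulated independently."""
    n = int(fs * dur)
    source = np.zeros(n)
    source[::int(round(fs / f0))] = 1.0      # glottal pulses at the fundamental
    x = source
    for f, bw in zip(formants, bandwidths):
        r = np.exp(-np.pi * bw / fs)         # pole radius from formant bandwidth
        theta = 2.0 * np.pi * f / fs         # pole angle from formant frequency
        x = lfilter([1.0], [1.0, -2.0 * r * np.cos(theta), r * r], x)
    return x / np.max(np.abs(x))             # normalize peak amplitude

# Same f0, different vocal-tract sizes: only the resonances differ.
# (Formant/bandwidth values are illustrative, not the study's parameters.)
adult_a  = synth_vowel(220.0, [730.0, 1090.0, 2440.0], [80.0, 90.0, 120.0])
infant_a = synth_vowel(220.0, [1000.0, 1600.0, 3200.0], [100.0, 110.0, 150.0])
```

Holding f0 fixed while swapping formant sets mirrors the matched-f0 comparison of Experiment 1; shifting formants upward approximates the shorter infant vocal tract.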


2012 ◽  
Vol 24 (9) ◽  
pp. 1919-1931 ◽  
Author(s):  
János Horváth ◽  
Burkhard Maess ◽  
Pamela Baess ◽  
Annamária Tóth

The N1 auditory ERP and its magnetic counterpart (N1[m]) are suppressed when elicited by self-induced sounds. Because the N1(m) is a correlate of auditory event detection, this N1 suppression effect is generally interpreted as a reflection of the workings of an internal forward model: The forward model captures the contingency (causal relationship) between the action and the sound, and this is used to cancel the predictable sensory reafference when the action is initiated. In this study, we demonstrated in three experiments using a novel coincidence paradigm that actual contingency between actions and sounds is not a necessary condition for N1 suppression. Participants performed time interval production tasks: They pressed a key to set the boundaries of time intervals. Concurrently, but independently of keypresses, a sequence of pure tones with random onset-to-onset intervals was presented. Tones coinciding with keypresses elicited suppressed N1(m) and P2(m), suggesting that action–stimulus contiguity (temporal proximity) is sufficient to suppress sensory processing related to the detection of auditory events.
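The coincidence logic of this paradigm, tones tagged by temporal proximity to keypresses with no causal link between the two streams, can be sketched as follows. The 50 ms window and function name are illustrative assumptions, not the study's parameters:

```python
import numpy as np

def coincident_tones(tone_onsets, keypress_times, window=0.05):
    """Return a boolean mask marking tones whose onset falls within
    `window` seconds of any keypress: contiguity without contingency."""
    tones = np.asarray(tone_onsets, dtype=float)
    keys = np.asarray(keypress_times, dtype=float)
    gaps = np.abs(tones[:, None] - keys[None, :])  # all pairwise onset gaps
    return gaps.min(axis=1) <= window

# Tones at random onsets; keypresses from the independent interval task.
mask = coincident_tones([0.10, 1.00, 2.00], [0.12, 2.30])
# mask → [True, False, False]: only the first tone coincides with a keypress
```

ERPs averaged over the `mask == True` tones versus the rest would correspond to the coincident/non-coincident comparison that revealed the N1(m) and P2(m) suppression.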


2017 ◽  
Vol 29 (2) ◽  
pp. 298-309 ◽  
Author(s):  
Ima Trempler ◽  
Anne-Marike Schiffer ◽  
Nadiya El-Sourani ◽  
Christiane Ahlheim ◽  
Gereon R. Fink ◽  
...  

Surprising events may be relevant or irrelevant for behavior, requiring either flexible adjustment or stabilization of our model of the world and according response strategies. Cognitive flexibility and stability in response to environmental demands have been described as separable cognitive states, associated with activity of striatal and lateral prefrontal regions, respectively. It so far remains unclear, however, whether these two states act in an antagonistic fashion and which neural mechanisms mediate the selection of respective responses, on the one hand, and a transition between these states, on the other. In this study, we tested whether the functional dichotomy between striatal and prefrontal activity applies for the separate functions of updating (in response to changes in the environment, i.e., switches) and shielding (in response to chance occurrences of events violating expectations, i.e., drifts) of current predictions. We measured brain activity using fMRI while 20 healthy participants performed a task that required them to serially predict upcoming items. Switches between predictable sequences had to be indicated via button press, while sequence omissions (drifts) had to be ignored. We further varied the probability of switches and drifts to assess the neural network supporting the transition between flexible and stable cognitive states as a function of recent performance history in response to environmental demands. Flexible switching between models was associated with activation in medial pFC (BA 9 and BA 10), whereas stable maintenance of the internal model corresponded to activation in the lateral pFC (BA 6 and inferior frontal gyrus). Our findings extend previous studies on the interplay of flexibility and stability, suggesting that different prefrontal regions are activated by different types of prediction errors, dependent on their behavioral requirements.
Furthermore, we found that striatal activation in response to switches and drifts was modulated by participants' successful behavior toward these events, suggesting the striatum to be responsible for response selections following unpredicted stimuli. Finally, we observed that the dopaminergic midbrain modulates the transition between different cognitive states, thresholded by participants' individual performance history in response to temporal environmental demands.


2012 ◽  
Vol 107 (1) ◽  
pp. 442-447 ◽  
Author(s):  
Takayuki Ito ◽  
David J. Ostry

Interactions between auditory and somatosensory information are relevant to the neural processing of speech, since speech processing, and certainly speech production, involves both auditory information and inputs that arise from the muscles and tissues of the vocal tract. We previously demonstrated that somatosensory inputs associated with facial skin deformation alter the perceptual processing of speech sounds. We show here that the reverse is also true: speech sounds alter the perception of facial somatosensory inputs. As a somatosensory task, we used a robotic device to create patterns of facial skin deformation that would normally accompany speech production. We found that the perception of facial skin deformation was altered by speech sounds in a manner that reflects the way in which auditory and somatosensory effects are linked in speech production. The modulation of orofacial somatosensory processing by auditory inputs was specific to speech and likewise to facial skin deformation. Somatosensory judgments were not affected when the skin deformation was delivered to the forearm or palm, or when the facial skin deformation accompanied nonspeech sounds. The perceptual modulation that we observed in conjunction with speech sounds shows that speech sounds specifically affect neural processing in the facial somatosensory system and suggests the involvement of the somatosensory system in both the production and perceptual processing of speech.

