A speech envelope landmark for syllable encoding in human superior temporal gyrus

2019 ◽  
Vol 5 (11) ◽  
pp. eaay6279 ◽  
Author(s):  
Yulia Oganian ◽  
Edward F. Chang

The most salient acoustic features in speech are the modulations in its intensity, captured by the amplitude envelope. Perceptually, the envelope is necessary for speech comprehension. Yet, the neural computations that represent the envelope and their linguistic implications are heavily debated. We used high-density intracranial recordings, while participants listened to speech, to determine how the envelope is represented in human speech cortical areas on the superior temporal gyrus (STG). We found that a well-defined zone in middle STG detects acoustic onset edges (local maxima in the envelope rate of change). Acoustic analyses demonstrated that timing of acoustic onset edges cues syllabic nucleus onsets, while their slope cues syllabic stress. Synthesized amplitude-modulated tone stimuli showed that steeper slopes elicited greater responses, confirming cortical encoding of amplitude change, not absolute amplitude. Overall, STG encoding of the timing and magnitude of acoustic onset edges underlies the perception of speech temporal structure.
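The acoustic-onset-edge computation described in this abstract is straightforward to prototype. Below is a minimal sketch that extracts a smoothed amplitude envelope and marks local maxima in its rate of change; the Hilbert-transform envelope, the 10 Hz low-pass cutoff, and the peak threshold are illustrative assumptions, not the authors' exact pipeline.

```python
# Sketch: detect acoustic onset edges as local maxima in the rate of change
# of the speech amplitude envelope. Parameter choices (10 Hz low-pass,
# relative peak threshold) are assumptions for illustration.
import numpy as np
from scipy.signal import hilbert, butter, filtfilt, find_peaks

def acoustic_onset_edges(audio, fs, lp_cutoff=10.0):
    # Broadband amplitude envelope via the Hilbert transform
    envelope = np.abs(hilbert(audio))
    # Low-pass to keep only the slow amplitude modulations
    b, a = butter(4, lp_cutoff / (fs / 2), btype="low")
    envelope = filtfilt(b, a, envelope)
    # Rate of change of the envelope (first derivative, per second)
    rate = np.gradient(envelope) * fs
    # Onset edges = local maxima of the positive rate of change
    peaks, props = find_peaks(rate, height=0.1 * rate.max())
    return peaks / fs, props["peak_heights"]  # edge times (s) and slopes
```

The returned peak heights correspond to the envelope's slope at each edge, the quantity the abstract links to syllabic stress.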

2018 ◽  
Author(s):  
Yulia Oganian ◽  
Edward F. Chang

Abstract Listeners use the slow amplitude modulations of speech, known as the envelope, to segment continuous speech into syllables. However, the underlying neural computations are heavily debated. We used high-density intracranial cortical recordings while participants listened to natural and synthesized control speech stimuli to determine how the envelope is represented in the human superior temporal gyrus (STG), a critical auditory brain area for speech processing. We found that the STG does not encode the instantaneous, moment-by-moment amplitude envelope of speech. Rather, a zone of the middle STG detects discrete acoustic onset edges, defined by local maxima in the rate-of-change of the envelope. Acoustic analysis demonstrated that acoustic onset edges reliably cue the information-rich transition between the consonant-onset and vowel-nucleus of syllables. Furthermore, the steepness of the acoustic edge cued whether a syllable was stressed. Synthesized amplitude-modulated tone stimuli showed that steeper edges elicited monotonically greater cortical responses, confirming the encoding of relative but not absolute amplitude. Overall, encoding of the timing and magnitude of acoustic onset edges in STG underlies our perception of the syllabic rhythm of speech.
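The synthesized control stimuli described above can likewise be sketched: amplitude-modulated tones whose amplitude rises at different slopes but reaches the same peak. The carrier frequency, ramp durations, and plateau length below are illustrative assumptions, not the published stimulus parameters.

```python
# Sketch: tones with linear amplitude ramps of varying steepness.
# Shorter ramp_s -> steeper acoustic edge at identical peak amplitude.
import numpy as np

def ramped_tone(fs=16000, carrier_hz=1000.0, ramp_s=0.1, plateau_s=0.4):
    ramp = np.linspace(0.0, 1.0, int(fs * ramp_s), endpoint=False)
    plateau = np.ones(int(fs * plateau_s))
    amplitude = np.concatenate([ramp, plateau])
    t = np.arange(amplitude.size) / fs
    return amplitude * np.sin(2 * np.pi * carrier_hz * t)

# A family of stimuli differing only in edge slope (assumed ramp values)
stimuli = [ramped_tone(ramp_s=r) for r in (0.025, 0.05, 0.1, 0.2)]
```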


2019 ◽  
Author(s):  
Sam V Norman-Haignere ◽  
Jenelle Feather ◽  
Dana Boebinger ◽  
Peter Brunner ◽  
Anthony Ritaccio ◽  
...  

Abstract How are neural representations of music organized in the human brain? While neuroimaging has suggested some segregation between responses to music and other sounds, it remains unclear whether finer-grained organization exists within the domain of music. To address this question, we measured cortical responses to natural sounds using intracranial recordings from human patients and inferred canonical response components using a data-driven decomposition algorithm. The inferred components replicated many prior findings including distinct neural selectivity for speech and music. Our key novel finding is that one component responded nearly exclusively to music with singing. Song selectivity was not explainable by standard acoustic features and was co-located with speech- and music-selective responses in the middle and anterior superior temporal gyrus. These results suggest that neural representations of music are fractionated into subpopulations selective for different types of music, at least one of which is specialized for the analysis of song.
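The decomposition step can be illustrated with generic tools. The authors infer components with their own data-driven algorithm; the sketch below substitutes scikit-learn's non-negative matrix factorization purely to show the shape of the analysis, with toy data standing in for an electrodes-by-sounds response matrix.

```python
# Sketch: decompose an electrodes x sounds response matrix into a small set
# of canonical components. NMF is a generic stand-in, not the authors' method;
# matrix sizes and data are invented for illustration.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
responses = rng.random((200, 165))   # 200 electrodes x 165 natural sounds (toy)

model = NMF(n_components=10, init="nndsvda", max_iter=500, random_state=0)
electrode_weights = model.fit_transform(responses)  # per-electrode component loadings
component_profiles = model.components_              # each component's response across sounds

# Selectivity can then be read off a component's profile, e.g. by comparing
# its mean response to sound categories such as speech, music, or song.
```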


Author(s):  
Sam V Norman-Haignere ◽  
Laura K. Long ◽  
Orrin Devinsky ◽  
Werner Doyle ◽  
Ifeoma Irobunda ◽  
...  

Abstract To derive meaning from sound, the brain must integrate information across tens (e.g. phonemes) to hundreds (e.g. words) of milliseconds, but the neural computations that enable multiscale integration remain unclear. Prior evidence suggests that human auditory cortex analyzes sound using both generic acoustic features (e.g. spectrotemporal modulation) and category-specific computations, but how these putatively distinct computations integrate temporal information is unknown. To answer this question, we developed a novel method to estimate neural integration periods and applied the method to intracranial recordings from human epilepsy patients. We show that integration periods increase three-fold as one ascends the auditory cortical hierarchy. Moreover, we find that electrodes with short integration periods (~50-150 ms) respond selectively to spectrotemporal modulations, while electrodes with long integration periods (~200-300 ms) show prominent selectivity for sound categories such as speech and music. These findings reveal how multiscale temporal analysis organizes hierarchical computation in human auditory cortex.
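The paper's integration-period estimator is a dedicated method not reproduced here. As a loose illustration of the idea of assigning each electrode a timescale, the sketch below uses the autocorrelation decay of a response as a crude proxy; the 100 Hz response sampling rate and the 1/e criterion are assumptions for illustration only.

```python
# Sketch: a crude timescale proxy, NOT the authors' estimator. The lag at
# which an electrode response's autocorrelation falls below 1/e is taken as
# its characteristic integration period.
import numpy as np

def autocorr_timescale(response, fs):
    x = response - response.mean()
    acf = np.correlate(x, x, mode="full")[x.size - 1:]
    acf /= acf[0]                                 # normalize so acf[0] == 1
    below = np.flatnonzero(acf < 1.0 / np.e)      # first crossing of 1/e
    return below[0] / fs if below.size else np.inf

fs = 100.0  # assumed 100 Hz high-gamma envelope sampling rate
rng = np.random.default_rng(1)
# Toy response: white noise smoothed with a 200 ms boxcar
resp = np.convolve(rng.standard_normal(10_000), np.ones(20) / 20, mode="same")
print(f"timescale proxy ~ {autocorr_timescale(resp, fs) * 1000:.0f} ms")
```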


Brain ◽  
2009 ◽  
Vol 132 (12) ◽  
pp. 3401-3410 ◽  
Author(s):  
Alexander P. Leff ◽  
Thomas M. Schofield ◽  
Jennifer T. Crinion ◽  
Mohamed L. Seghier ◽  
Alice Grogan ◽  
...  

2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Maya Inbar ◽  
Eitan Grossman ◽  
Ayelet N. Landau

Abstract Studies of speech processing investigate the relationship between temporal structure in speech stimuli and neural activity. Despite clear evidence that the brain tracks speech at low frequencies (~1 Hz), it is not well understood what linguistic information gives rise to this rhythm. In this study, we harness linguistic theory to draw attention to Intonation Units (IUs), a fundamental prosodic unit of human language, and characterize their temporal structure as captured in the speech envelope, an acoustic representation relevant to the neural processing of speech. IUs are defined by a specific pattern of syllable delivery, together with resets in pitch and articulatory force. Linguistic studies of spontaneous speech indicate that this prosodic segmentation paces new information in language use across diverse languages. Therefore, IUs provide a universal structural cue for the cognitive dynamics of speech production and comprehension. We study the relation between IUs and periodicities in the speech envelope, applying methods from investigations of neural synchronization. Our sample includes recordings from everyday speech contexts of over 100 speakers in six languages. We find that sequences of IUs form a consistent low-frequency rhythm and constitute a significant periodic cue within the speech envelope. Our findings allow us to predict that IUs are utilized by the neural system when tracking speech. The methods we introduce here facilitate testing this prediction in the future (i.e., with physiological data).
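A first step of such an analysis, testing whether the speech envelope carries a periodic cue near 1 Hz, can be sketched as follows. The envelope extraction, the 100 Hz envelope rate, and the 30 s Welch windows are illustrative assumptions, not the authors' exact methods.

```python
# Sketch: power spectrum of the speech envelope in the low-frequency range.
# A peak near ~1 Hz would mark the putative IU rhythm. Parameters assumed.
import numpy as np
from scipy.signal import hilbert, resample_poly, welch

def envelope_spectrum(audio, fs, env_fs=100):
    # Broadband amplitude envelope, downsampled for low-frequency analysis
    envelope = np.abs(hilbert(audio))
    envelope = resample_poly(envelope, up=env_fs, down=int(fs))
    # 30 s Welch windows resolve rhythms around 1 Hz
    freqs, power = welch(envelope, fs=env_fs, nperseg=30 * env_fs)
    return freqs, power
```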


2012 ◽  
Vol 24 (2) ◽  
pp. 340-352 ◽  
Author(s):  
K. V. Nourski ◽  
M. Steinschneider ◽  
H. Oya ◽  
H. Kawasaki ◽  
R. D. Jones ◽  
...  

2019 ◽  
Author(s):  
Sankar Mukherjee ◽  
Alice Tomassini ◽  
Leonardo Badino ◽  
Aldo Pastore ◽  
Luciano Fadiga ◽  
...  

Abstract Cortical entrainment to the (quasi-)rhythmic components of speech seems to play an important role in speech comprehension. It has been suggested that neural entrainment may reflect top-down temporal predictions of sensory signals. Key properties of a predictive model are its anticipatory nature and its ability to reconstruct missing information. Here we put both of these properties to experimental test. We acoustically presented sentences and measured cortical entrainment to both the acoustic speech envelope and the lip kinematics of the speaker, which were not visible to the participants. We then analyzed speech-brain and lip-brain coherence at multiple negative and positive lags. Besides the well-known cortical entrainment to the acoustic speech envelope, we found significant entrainment in the delta range to the (latent) lip kinematics. Most interestingly, the two entrainment phenomena were temporally dissociated. Whereas entrainment to the acoustic speech envelope peaked around a +0.3 s lag (i.e., when the EEG followed the speech by 0.3 s), entrainment to the lips was significantly anticipated and peaked around a 0-0.1 s lag (i.e., when the EEG was virtually synchronous with the putative lip movement). Our results demonstrate that neural entrainment during speech listening involves the anticipatory reconstruction of missing information related to lip movement production, indicating its fundamentally predictive nature and thus supporting analysis-by-synthesis models.
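The lagged coherence analysis can be sketched generically. The code below shifts the EEG relative to a stimulus signal (envelope or reconstructed lip trajectory) and averages coherence in an assumed delta band of 0.5-3 Hz; the sampling rate, lag grid, and band edges are illustrative, not those of the study.

```python
# Sketch: stimulus-brain coherence at multiple signed lags. Positive lag
# means the EEG follows the stimulus. Band and segment length are assumed.
import numpy as np
from scipy.signal import coherence

def lagged_coherence(stimulus, eeg, fs, lags_s):
    values = []
    for lag in lags_s:
        shift = int(round(lag * fs))
        # Positive lag: EEG shifted to follow the stimulus; negative: EEG leads
        if shift >= 0:
            s, e = stimulus[: len(stimulus) - shift], eeg[shift:]
        else:
            s, e = stimulus[-shift:], eeg[: len(eeg) + shift]
        n = min(len(s), len(e))
        freqs, coh = coherence(s[:n], e[:n], fs=fs, nperseg=int(4 * fs))
        band = (freqs >= 0.5) & (freqs <= 3.0)  # assumed delta band
        values.append(coh[band].mean())
    return np.array(values)  # coherence profile across lags
```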


2017 ◽  
Author(s):  
Thomas Pfeffer ◽  
Arthur-Ervin Avramiea ◽  
Guido Nolte ◽  
Andreas K. Engel ◽  
Klaus Linkenkaer-Hansen ◽  
...  

Abstract The ascending modulatory systems of the brainstem are powerful regulators of global brain state. Disturbances of these systems are implicated in several major neuropsychiatric disorders. Yet, how these systems interact with specific neural computations in the cerebral cortex to shape perception, cognition, and behavior remains poorly understood. Here, we probed the effects of two such systems, the catecholaminergic (dopaminergic and noradrenergic) and cholinergic systems, on an important aspect of cortical computation: its intrinsic variability. To this end, we combined placebo-controlled pharmacological intervention in humans, magnetoencephalographic (MEG) recordings of cortical population activity, and psychophysical measurements of the perception of ambiguous visual input. A low-dose catecholaminergic, but not cholinergic, manipulation altered the rate of spontaneous perceptual fluctuations as well as the temporal structure of "scale-free" population activity across large swaths of visual and parietal cortex. Computational analyses indicate that both effects were consistent with an increase in excitatory relative to inhibitory activity in the cortical areas underlying visual perceptual inference. We propose that catecholamines regulate the variability of perception and cognition by dynamically changing the cortical excitation-inhibition ratio. The combined read-out of fluctuations in perception and cortical activity established here may prove useful as an efficient and easily accessible marker of altered cortical computation in neuropsychiatric disorders.
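The "scale-free" temporal structure referred to here is commonly quantified with detrended fluctuation analysis (DFA), which estimates long-range temporal correlations from how detrended fluctuations scale with window size. The sketch below is a textbook DFA, not the authors' exact pipeline; the window grid and toy input are assumptions.

```python
# Sketch: detrended fluctuation analysis. The exponent is ~0.5 for white
# noise and >0.5 when long-range temporal correlations are present.
import numpy as np

def dfa_exponent(signal, windows):
    profile = np.cumsum(signal - signal.mean())   # integrated signal
    fluctuations = []
    for w in windows:
        n_seg = profile.size // w
        segments = profile[: n_seg * w].reshape(n_seg, w)
        t = np.arange(w)
        # Linearly detrend each window and keep the residual RMS
        rms = [np.sqrt(np.mean((s - np.polyval(np.polyfit(t, s, 1), t)) ** 2))
               for s in segments]
        fluctuations.append(np.mean(rms))
    # DFA exponent = slope of log F(w) versus log w
    return np.polyfit(np.log(windows), np.log(fluctuations), 1)[0]

rng = np.random.default_rng(3)
windows = np.unique(np.logspace(1.5, 3.5, 10).astype(int))
print(dfa_exponent(rng.standard_normal(50_000), windows))  # ~0.5 for white noise
```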


2020 ◽  
Author(s):  
Yingcan Carol Wang ◽  
Ediz Sohoglu ◽  
Rebecca A. Gilbert ◽  
Richard N. Henson ◽  
Matthew H. Davis

Abstract Human listeners achieve quick and effortless speech comprehension through computations of conditional probability using Bayes' rule. However, the neural implementation of Bayesian perceptual inference remains unclear. Competitive-selection accounts (e.g. TRACE) propose that word recognition is achieved through direct inhibitory connections between units representing candidate words that share segments (e.g. hygiene and hijack share /haɪdʒ/). Manipulations that increase lexical uncertainty should increase neural responses associated with word recognition when words cannot be uniquely identified (during the first syllable). In contrast, predictive-selection accounts (e.g. Predictive Coding) propose that spoken word recognition involves comparing heard and predicted speech sounds and using prediction error to update lexical representations. Increased lexical uncertainty in words like hygiene and hijack will increase prediction error, and hence neural activity, only at later time points when different segments are predicted (during the second syllable). We collected MEG data to distinguish these two mechanisms and used a competitor-priming manipulation to change the prior probability of specific words. Lexical decision responses showed delayed recognition of target words (hygiene) following presentation of a neighbouring prime word (hijack) several minutes earlier. However, this effect was not observed with pseudoword primes (higent) or targets (hijure). Crucially, MEG responses in the STG showed greater neural responses for word-primed words after the point at which they were uniquely identified (after /haɪdʒ/ in hygiene) but not before, while similar changes were again absent for pseudowords. These findings are consistent with accounts of spoken word recognition in which neural computations of prediction error play a central role.

Significance Statement: Effective speech perception is critical to daily life and involves computations that combine speech signals with prior knowledge of spoken words; that is, Bayesian perceptual inference. This study specifies the neural mechanisms that support spoken word recognition by testing two distinct implementations of Bayesian perceptual inference. Most established theories propose direct competition between lexical units, such that inhibition of irrelevant candidates leads to selection of critical words. Our results instead support predictive-selection theories (e.g. Predictive Coding): by comparing heard and predicted speech sounds, neural computations of prediction error can help listeners continuously update lexical probabilities, allowing for more rapid word identification.
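The contrast between the two accounts can be made concrete with a toy simulation of the predictive-selection mechanism. In the sketch below, a listener holds prior probabilities over a two-word lexicon and computes prediction error as surprisal at each segment; the SAMPA-style segmentations, the two-word lexicon, and the prior values are invented for illustration.

```python
# Sketch: segment-by-segment Bayesian updating over a toy lexicon, with
# surprisal (-log predicted probability) standing in for prediction error.
import numpy as np

lexicon = {"hygiene": ["h", "aI", "dZ", "i:", "n"],   # SAMPA-style segments
           "hijack":  ["h", "aI", "dZ", "{",  "k"]}

def surprisal_profile(heard_word, priors):
    beliefs = dict(priors)
    errors = []
    for pos, segment in enumerate(lexicon[heard_word]):
        # Predicted probability of the upcoming segment under current beliefs
        p_segment = sum(p for w, p in beliefs.items()
                        if lexicon[w][pos] == segment)
        errors.append(-np.log(p_segment))             # prediction error
        # Bayesian update: renormalize over words consistent with the input
        beliefs = {w: p for w, p in beliefs.items()
                   if lexicon[w][pos] == segment}
        total = sum(beliefs.values())
        beliefs = {w: p / total for w, p in beliefs.items()}
    return np.round(errors, 3)

# Equal priors vs. competitor priming (prime boosted, target lowered)
print(surprisal_profile("hygiene", {"hygiene": 0.5, "hijack": 0.5}))
print(surprisal_profile("hygiene", {"hygiene": 0.3, "hijack": 0.7}))
```

Prediction error is zero over the shared segments and appears only at the divergence point, where it grows as the target's prior shrinks; this is the qualitative pattern the competitor-priming MEG result supports.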


2008 ◽  
Vol 99 (1) ◽  
pp. 87-95 ◽  
Author(s):  
Brian E. Russ ◽  
Ashlee L. Ackelson ◽  
Allison E. Baker ◽  
Yale E. Cohen

The neural computations that underlie the processing of auditory-stimulus identity are not well understood, especially how information is transformed across different cortical areas. Here, we compared the capacity of neurons in the superior temporal gyrus (STG) and the ventrolateral prefrontal cortex (vPFC) to code the identity of an auditory stimulus; these two areas are part of a ventral processing stream for auditory-stimulus identity. Whereas the responses of neurons in both areas are reliably modulated by different vocalizations, STG responses code significantly more vocalizations than those in the vPFC. Together, these data indicate that the STG and vPFC differentially code auditory identity, which suggests that substantial information processing takes place between these two areas. These findings are consistent with the hypothesis that the STG and the vPFC are part of a functional circuit for auditory-identity analysis.
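The comparison of coding capacity across areas can be illustrated with a cross-validated decoding analysis, used here only as a stand-in for the study's single-neuron response measures. The data shapes below are invented, so accuracy will sit near chance; with real spike counts, a higher STG score would mirror the reported result.

```python
# Sketch: decode vocalization identity from population responses in two areas
# and compare cross-validated accuracy. All data are toy placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_trials, n_neurons, n_vocalizations = 400, 50, 10
labels = rng.integers(0, n_vocalizations, n_trials)

for area in ("STG", "vPFC"):
    X = rng.standard_normal((n_trials, n_neurons))  # toy firing-rate matrix
    acc = cross_val_score(LogisticRegression(max_iter=1000), X, labels, cv=5)
    print(f"{area}: decoding accuracy {acc.mean():.2f}")
```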

