A speech envelope landmark for syllable encoding in human superior temporal gyrus

2019 ◽  
Vol 5 (11) ◽  
pp. eaay6279 ◽  
Author(s):  
Yulia Oganian ◽  
Edward F. Chang

The most salient acoustic features in speech are the modulations in its intensity, captured by the amplitude envelope. Perceptually, the envelope is necessary for speech comprehension. Yet, the neural computations that represent the envelope and their linguistic implications are heavily debated. We used high-density intracranial recordings, while participants listened to speech, to determine how the envelope is represented in human speech cortical areas on the superior temporal gyrus (STG). We found that a well-defined zone in middle STG detects acoustic onset edges (local maxima in the envelope rate of change). Acoustic analyses demonstrated that timing of acoustic onset edges cues syllabic nucleus onsets, while their slope cues syllabic stress. Synthesized amplitude-modulated tone stimuli showed that steeper slopes elicited greater responses, confirming cortical encoding of amplitude change, not absolute amplitude. Overall, STG encoding of the timing and magnitude of acoustic onset edges underlies the perception of speech temporal structure.
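The acoustic-onset-edge computation described in this abstract is straightforward to prototype. Below is a minimal sketch that extracts a smoothed amplitude envelope and marks local maxima in its rate of change; the Hilbert-transform envelope, the 10 Hz low-pass cutoff, and the peak threshold are illustrative assumptions, not the authors' exact pipeline.

```python
# Sketch: detect acoustic onset edges as local maxima in the rate of change
# of the speech amplitude envelope. Parameter choices (10 Hz low-pass,
# relative peak threshold) are assumptions for illustration.
import numpy as np
from scipy.signal import hilbert, butter, filtfilt, find_peaks

def acoustic_onset_edges(audio, fs, lp_cutoff=10.0):
    # Broadband amplitude envelope via the Hilbert transform
    envelope = np.abs(hilbert(audio))
    # Low-pass to keep only the slow amplitude modulations
    b, a = butter(4, lp_cutoff / (fs / 2), btype="low")
    envelope = filtfilt(b, a, envelope)
    # Rate of change of the envelope (first derivative, per second)
    rate = np.gradient(envelope) * fs
    # Onset edges = local maxima of the positive rate of change
    peaks, props = find_peaks(rate, height=0.1 * rate.max())
    return peaks / fs, props["peak_heights"]  # edge times (s) and slopes
```

The returned peak heights correspond to the envelope's slope at each edge, the quantity the abstract links to syllabic stress.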

2018 ◽  
Author(s):  
Yulia Oganian ◽  
Edward F. Chang

Abstract Listeners use the slow amplitude modulations of speech, known as the envelope, to segment continuous speech into syllables. However, the underlying neural computations are heavily debated. We used high-density intracranial cortical recordings while participants listened to natural and synthesized control speech stimuli to determine how the envelope is represented in the human superior temporal gyrus (STG), a critical auditory brain area for speech processing. We found that the STG does not encode the instantaneous, moment-by-moment amplitude envelope of speech. Rather, a zone of the middle STG detects discrete acoustic onset edges, defined by local maxima in the rate-of-change of the envelope. Acoustic analysis demonstrated that acoustic onset edges reliably cue the information-rich transition between the consonant-onset and vowel-nucleus of syllables. Furthermore, the steepness of the acoustic edge cued whether a syllable was stressed. Synthesized amplitude-modulated tone stimuli showed that steeper edges elicited monotonically greater cortical responses, confirming the encoding of relative but not absolute amplitude. Overall, encoding of the timing and magnitude of acoustic onset edges in STG underlies our perception of the syllabic rhythm of speech.
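The synthesized control stimuli described above can likewise be sketched: amplitude-modulated tones whose amplitude rises at different slopes but reaches the same peak. The carrier frequency, ramp durations, and plateau length below are illustrative assumptions, not the published stimulus parameters.

```python
# Sketch: tones with linear amplitude ramps of varying steepness.
# Shorter ramp_s -> steeper acoustic edge at identical peak amplitude.
import numpy as np

def ramped_tone(fs=16000, carrier_hz=1000.0, ramp_s=0.1, plateau_s=0.4):
    ramp = np.linspace(0.0, 1.0, int(fs * ramp_s), endpoint=False)
    plateau = np.ones(int(fs * plateau_s))
    amplitude = np.concatenate([ramp, plateau])
    t = np.arange(amplitude.size) / fs
    return amplitude * np.sin(2 * np.pi * carrier_hz * t)

# A family of stimuli differing only in edge slope (assumed ramp values)
stimuli = [ramped_tone(ramp_s=r) for r in (0.025, 0.05, 0.1, 0.2)]
```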


2019 ◽  
Author(s):  
Sam V Norman-Haignere ◽  
Jenelle Feather ◽  
Dana Boebinger ◽  
Peter Brunner ◽  
Anthony Ritaccio ◽  
...  

Abstract How are neural representations of music organized in the human brain? While neuroimaging has suggested some segregation between responses to music and other sounds, it remains unclear whether finer-grained organization exists within the domain of music. To address this question, we measured cortical responses to natural sounds using intracranial recordings from human patients and inferred canonical response components using a data-driven decomposition algorithm. The inferred components replicated many prior findings including distinct neural selectivity for speech and music. Our key novel finding is that one component responded nearly exclusively to music with singing. Song selectivity was not explainable by standard acoustic features and was co-located with speech- and music-selective responses in the middle and anterior superior temporal gyrus. These results suggest that neural representations of music are fractionated into subpopulations selective for different types of music, at least one of which is specialized for the analysis of song.
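The decomposition step can be illustrated with generic tools. The authors infer components with their own data-driven algorithm; the sketch below substitutes scikit-learn's non-negative matrix factorization purely to show the shape of the analysis, with toy data standing in for an electrodes-by-sounds response matrix.

```python
# Sketch: decompose an electrodes x sounds response matrix into a small set
# of canonical components. NMF is a generic stand-in, not the authors' method;
# matrix sizes and data are invented for illustration.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
responses = rng.random((200, 165))   # 200 electrodes x 165 natural sounds (toy)

model = NMF(n_components=10, init="nndsvda", max_iter=500, random_state=0)
electrode_weights = model.fit_transform(responses)  # per-electrode component loadings
component_profiles = model.components_              # each component's response across sounds

# Selectivity can then be read off a component's profile, e.g. by comparing
# its mean response to sound categories such as speech, music, or song.
```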


Author(s):  
Sam V Norman-Haignere ◽  
Laura K. Long ◽  
Orrin Devinsky ◽  
Werner Doyle ◽  
Ifeoma Irobunda ◽  
...  

Abstract To derive meaning from sound, the brain must integrate information across tens (e.g. phonemes) to hundreds (e.g. words) of milliseconds, but the neural computations that enable multiscale integration remain unclear. Prior evidence suggests that human auditory cortex analyzes sound using both generic acoustic features (e.g. spectrotemporal modulation) and category-specific computations, but how these putatively distinct computations integrate temporal information is unknown. To answer this question, we developed a novel method to estimate neural integration periods and applied the method to intracranial recordings from human epilepsy patients. We show that integration periods increase three-fold as one ascends the auditory cortical hierarchy. Moreover, we find that electrodes with short integration periods (~50-150 ms) respond selectively to spectrotemporal modulations, while electrodes with long integration periods (~200-300 ms) show prominent selectivity for sound categories such as speech and music. These findings reveal how multiscale temporal analysis organizes hierarchical computation in human auditory cortex.
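The paper's integration-period estimator is a dedicated method not reproduced here. As a loose illustration of the idea of assigning each electrode a timescale, the sketch below uses the autocorrelation decay of a response as a crude proxy; the 100 Hz response sampling rate and the 1/e criterion are assumptions for illustration only.

```python
# Sketch: a crude timescale proxy, NOT the authors' estimator. The lag at
# which an electrode response's autocorrelation falls below 1/e is taken as
# its characteristic integration period.
import numpy as np

def autocorr_timescale(response, fs):
    x = response - response.mean()
    acf = np.correlate(x, x, mode="full")[x.size - 1:]
    acf /= acf[0]                                 # normalize so acf[0] == 1
    below = np.flatnonzero(acf < 1.0 / np.e)      # first crossing of 1/e
    return below[0] / fs if below.size else np.inf

fs = 100.0  # assumed 100 Hz high-gamma envelope sampling rate
rng = np.random.default_rng(1)
# Toy response: white noise smoothed with a 200 ms boxcar
resp = np.convolve(rng.standard_normal(10_000), np.ones(20) / 20, mode="same")
print(f"timescale proxy ~ {autocorr_timescale(resp, fs) * 1000:.0f} ms")
```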


Brain ◽  
2009 ◽  
Vol 132 (12) ◽  
pp. 3401-3410 ◽  
Author(s):  
Alexander P. Leff ◽  
Thomas M. Schofield ◽  
Jennifer T. Crinion ◽  
Mohamed L. Seghier ◽  
Alice Grogan ◽  
...  

2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Maya Inbar ◽  
Eitan Grossman ◽  
Ayelet N. Landau

Abstract Studies of speech processing investigate the relationship between temporal structure in speech stimuli and neural activity. Despite clear evidence that the brain tracks speech at low frequencies (~1 Hz), it is not well understood what linguistic information gives rise to this rhythm. In this study, we harness linguistic theory to draw attention to Intonation Units (IUs), a fundamental prosodic unit of human language, and characterize their temporal structure as captured in the speech envelope, an acoustic representation relevant to the neural processing of speech. IUs are defined by a specific pattern of syllable delivery, together with resets in pitch and articulatory force. Linguistic studies of spontaneous speech indicate that this prosodic segmentation paces new information in language use across diverse languages. Therefore, IUs provide a universal structural cue for the cognitive dynamics of speech production and comprehension. We study the relation between IUs and periodicities in the speech envelope, applying methods from investigations of neural synchronization. Our sample includes recordings from everyday speech contexts of over 100 speakers in six languages. We find that sequences of IUs form a consistent low-frequency rhythm and constitute a significant periodic cue within the speech envelope. Our findings allow us to predict that IUs are utilized by the neural system when tracking speech. The methods we introduce here facilitate testing this prediction in the future (i.e., with physiological data).
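A first step of such an analysis, testing whether the speech envelope carries a periodic cue near 1 Hz, can be sketched as follows. The envelope extraction, the 100 Hz envelope rate, and the 30 s Welch windows are illustrative assumptions, not the authors' exact methods.

```python
# Sketch: power spectrum of the speech envelope in the low-frequency range.
# A peak near ~1 Hz would mark the putative IU rhythm. Parameters assumed.
import numpy as np
from scipy.signal import hilbert, resample_poly, welch

def envelope_spectrum(audio, fs, env_fs=100):
    # Broadband amplitude envelope, downsampled for low-frequency analysis
    envelope = np.abs(hilbert(audio))
    envelope = resample_poly(envelope, up=env_fs, down=int(fs))
    # 30 s Welch windows resolve rhythms around 1 Hz
    freqs, power = welch(envelope, fs=env_fs, nperseg=30 * env_fs)
    return freqs, power
```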


2012 ◽  
Vol 24 (2) ◽  
pp. 340-352 ◽  
Author(s):  
K. V. Nourski ◽  
M. Steinschneider ◽  
H. Oya ◽  
H. Kawasaki ◽  
R. D. Jones ◽  
...  

2019 ◽  
Author(s):  
Sankar Mukherjee ◽  
Alice Tomassini ◽  
Leonardo Badino ◽  
Aldo Pastore ◽  
Luciano Fadiga ◽  
...  

Abstract Cortical entrainment to the (quasi-)rhythmic components of speech seems to play an important role in speech comprehension. It has been suggested that neural entrainment may reflect top-down temporal predictions of sensory signals. Key properties of a predictive model are its anticipatory nature and its ability to reconstruct missing information. Here we put both of these properties to experimental test. We acoustically presented sentences and measured cortical entrainment to both the acoustic speech envelope and the lip kinematics of the speaker, which were not visible to the participants. We then analyzed speech-brain and lip-brain coherence at multiple negative and positive lags. Besides the well-known cortical entrainment to the acoustic speech envelope, we found significant entrainment in the delta range to the (latent) lip kinematics. Most interestingly, the two entrainment phenomena were temporally dissociated. Whereas entrainment to the acoustic speech envelope peaked around a +0.3 s lag (i.e., when the EEG followed the speech by 0.3 s), entrainment to the lips was significantly anticipated and peaked around a 0-0.1 s lag (i.e., when the EEG was virtually synchronous with the putative lip movement). Our results demonstrate that neural entrainment during speech listening involves the anticipatory reconstruction of missing information related to lip movement production, indicating its fundamentally predictive nature and thus supporting analysis-by-synthesis models.
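The lagged coherence analysis can be sketched generically. The code below shifts the EEG relative to a stimulus signal (envelope or reconstructed lip trajectory) and averages coherence in an assumed delta band of 0.5-3 Hz; the sampling rate, lag grid, and band edges are illustrative, not those of the study.

```python
# Sketch: stimulus-brain coherence at multiple signed lags. Positive lag
# means the EEG follows the stimulus. Band and segment length are assumed.
import numpy as np
from scipy.signal import coherence

def lagged_coherence(stimulus, eeg, fs, lags_s):
    values = []
    for lag in lags_s:
        shift = int(round(lag * fs))
        # Positive lag: EEG shifted to follow the stimulus; negative: EEG leads
        if shift >= 0:
            s, e = stimulus[: len(stimulus) - shift], eeg[shift:]
        else:
            s, e = stimulus[-shift:], eeg[: len(eeg) + shift]
        n = min(len(s), len(e))
        freqs, coh = coherence(s[:n], e[:n], fs=fs, nperseg=int(4 * fs))
        band = (freqs >= 0.5) & (freqs <= 3.0)  # assumed delta band
        values.append(coh[band].mean())
    return np.array(values)  # coherence profile across lags
```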


2017 ◽  
Author(s):  
Thomas Pfeffer ◽  
Arthur-Ervin Avramiea ◽  
Guido Nolte ◽  
Andreas K. Engel ◽  
Klaus Linkenkaer-Hansen ◽  
...  

Abstract The ascending modulatory systems of the brainstem are powerful regulators of global brain state. Disturbances of these systems are implicated in several major neuropsychiatric disorders. Yet, how these systems interact with specific neural computations in the cerebral cortex to shape perception, cognition, and behavior remains poorly understood. Here, we probed the effects of two such systems, the catecholaminergic (dopaminergic and noradrenergic) and cholinergic systems, on an important aspect of cortical computation: its intrinsic variability. To this end, we combined placebo-controlled pharmacological intervention in humans, magnetoencephalographic (MEG) recordings of cortical population activity, and psychophysical measurements of the perception of ambiguous visual input. A low-dose catecholaminergic, but not cholinergic, manipulation altered the rate of spontaneous perceptual fluctuations as well as the temporal structure of "scale-free" population activity across large swaths of visual and parietal cortex. Computational analyses indicate that both effects were consistent with an increase in excitatory relative to inhibitory activity in the cortical areas underlying visual perceptual inference. We propose that catecholamines regulate the variability of perception and cognition by dynamically changing the cortical excitation-inhibition ratio. The combined read-out of fluctuations in perception and cortical activity established here may prove useful as an efficient and easily accessible marker of altered cortical computation in neuropsychiatric disorders.
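The "scale-free" temporal structure referred to here is commonly quantified with detrended fluctuation analysis (DFA), which estimates long-range temporal correlations from how detrended fluctuations scale with window size. The sketch below is a textbook DFA, not the authors' exact pipeline; the window grid and toy input are assumptions.

```python
# Sketch: detrended fluctuation analysis. The exponent is ~0.5 for white
# noise and >0.5 when long-range temporal correlations are present.
import numpy as np

def dfa_exponent(signal, windows):
    profile = np.cumsum(signal - signal.mean())   # integrated signal
    fluctuations = []
    for w in windows:
        n_seg = profile.size // w
        segments = profile[: n_seg * w].reshape(n_seg, w)
        t = np.arange(w)
        # Linearly detrend each window and keep the residual RMS
        rms = [np.sqrt(np.mean((s - np.polyval(np.polyfit(t, s, 1), t)) ** 2))
               for s in segments]
        fluctuations.append(np.mean(rms))
    # DFA exponent = slope of log F(w) versus log w
    return np.polyfit(np.log(windows), np.log(fluctuations), 1)[0]

rng = np.random.default_rng(3)
windows = np.unique(np.logspace(1.5, 3.5, 10).astype(int))
print(dfa_exponent(rng.standard_normal(50_000), windows))  # ~0.5 for white noise
```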


2020 ◽  
Author(s):  
Yingcan Carol Wang ◽  
Ediz Sohoglu ◽  
Rebecca A. Gilbert ◽  
Richard N. Henson ◽  
Matthew H. Davis

Abstract Human listeners achieve quick and effortless speech comprehension through computations of conditional probability using Bayes' rule. However, the neural implementation of Bayesian perceptual inference remains unclear. Competitive-selection accounts (e.g. TRACE) propose that word recognition is achieved through direct inhibitory connections between units representing candidate words that share segments (e.g. hygiene and hijack share /haɪdʒ/). Manipulations that increase lexical uncertainty should increase neural responses associated with word recognition when words cannot be uniquely identified (during the first syllable). In contrast, predictive-selection accounts (e.g. Predictive Coding) propose that spoken word recognition involves comparing heard and predicted speech sounds and using prediction error to update lexical representations. Increased lexical uncertainty in words like hygiene and hijack will increase prediction error, and hence neural activity, only at later time points when different segments are predicted (during the second syllable). We collected MEG data to distinguish these two mechanisms and used a competitor-priming manipulation to change the prior probability of specific words. Lexical decision responses showed delayed recognition of target words (hygiene) following presentation of a neighbouring prime word (hijack) several minutes earlier. However, this effect was not observed with pseudoword primes (higent) or targets (hijure). Crucially, MEG responses in the STG showed greater neural responses for word-primed words after the point at which they were uniquely identified (after /haɪdʒ/ in hygiene) but not before, while similar changes were again absent for pseudowords. These findings are consistent with accounts of spoken word recognition in which neural computations of prediction error play a central role.

Significance Statement: Effective speech perception is critical to daily life and involves computations that combine speech signals with prior knowledge of spoken words; that is, Bayesian perceptual inference. This study specifies the neural mechanisms that support spoken word recognition by testing two distinct implementations of Bayesian perceptual inference. Most established theories propose direct competition between lexical units, such that inhibition of irrelevant candidates leads to selection of critical words. Our results instead support predictive-selection theories (e.g. Predictive Coding): by comparing heard and predicted speech sounds, neural computations of prediction error can help listeners continuously update lexical probabilities, allowing for more rapid word identification.
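The contrast between the two accounts can be made concrete with a toy simulation of the predictive-selection mechanism. In the sketch below, a listener holds prior probabilities over a two-word lexicon and computes prediction error as surprisal at each segment; the SAMPA-style segmentations, the two-word lexicon, and the prior values are invented for illustration.

```python
# Sketch: segment-by-segment Bayesian updating over a toy lexicon, with
# surprisal (-log predicted probability) standing in for prediction error.
import numpy as np

lexicon = {"hygiene": ["h", "aI", "dZ", "i:", "n"],   # SAMPA-style segments
           "hijack":  ["h", "aI", "dZ", "{",  "k"]}

def surprisal_profile(heard_word, priors):
    beliefs = dict(priors)
    errors = []
    for pos, segment in enumerate(lexicon[heard_word]):
        # Predicted probability of the upcoming segment under current beliefs
        p_segment = sum(p for w, p in beliefs.items()
                        if lexicon[w][pos] == segment)
        errors.append(-np.log(p_segment))             # prediction error
        # Bayesian update: renormalize over words consistent with the input
        beliefs = {w: p for w, p in beliefs.items()
                   if lexicon[w][pos] == segment}
        total = sum(beliefs.values())
        beliefs = {w: p / total for w, p in beliefs.items()}
    return np.round(errors, 3)

# Equal priors vs. competitor priming (prime boosted, target lowered)
print(surprisal_profile("hygiene", {"hygiene": 0.5, "hijack": 0.5}))
print(surprisal_profile("hygiene", {"hygiene": 0.3, "hijack": 0.7}))
```

Prediction error is zero over the shared segments and appears only at the divergence point, where it grows as the target's prior shrinks; this is the qualitative pattern the competitor-priming MEG result supports.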


2008 ◽  
Vol 99 (1) ◽  
pp. 87-95 ◽  
Author(s):  
Brian E. Russ ◽  
Ashlee L. Ackelson ◽  
Allison E. Baker ◽  
Yale E. Cohen

The neural computations that underlie the processing of auditory-stimulus identity are not well understood, especially how information is transformed across different cortical areas. Here, we compared the capacity of neurons in the superior temporal gyrus (STG) and the ventrolateral prefrontal cortex (vPFC) to code the identity of an auditory stimulus; these two areas are part of a ventral processing stream for auditory-stimulus identity. Whereas the responses of neurons in both areas are reliably modulated by different vocalizations, STG responses code significantly more vocalizations than those in the vPFC. Together, these data indicate that the STG and vPFC differentially code auditory identity, which suggests that substantial information processing takes place between these two areas. These findings are consistent with the hypothesis that the STG and the vPFC are part of a functional circuit for auditory-identity analysis.
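The comparison of coding capacity across areas can be illustrated with a cross-validated decoding analysis, used here only as a stand-in for the study's single-neuron response measures. The data shapes below are invented, so accuracy will sit near chance; with real spike counts, a higher STG score would mirror the reported result.

```python
# Sketch: decode vocalization identity from population responses in two areas
# and compare cross-validated accuracy. All data are toy placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_trials, n_neurons, n_vocalizations = 400, 50, 10
labels = rng.integers(0, n_vocalizations, n_trials)

for area in ("STG", "vPFC"):
    X = rng.standard_normal((n_trials, n_neurons))  # toy firing-rate matrix
    acc = cross_val_score(LogisticRegression(max_iter=1000), X, labels, cv=5)
    print(f"{area}: decoding accuracy {acc.mean():.2f}")
```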

