Cortical Representations of Speech in a Multi-talker Auditory Scene

2017 ◽  
Author(s):  
Krishna C. Puvvada ◽  
Jonathan Z. Simon

Abstract
The ability to parse a complex auditory scene into perceptual objects is facilitated by a hierarchical auditory system. Successive stages in the hierarchy transform an auditory scene of multiple overlapping sources from peripheral, tonotopically based representations in the auditory nerve into perceptually distinct, auditory-object-based representations in auditory cortex. Here, using magnetoencephalography (MEG) recordings from human subjects, both men and women, we investigate how a complex acoustic scene consisting of multiple speech sources is represented in distinct hierarchical stages of auditory cortex. Using systems-theoretic methods of stimulus reconstruction, we show that the primary-like areas in auditory cortex contain dominantly spectrotemporal-based representations of the entire auditory scene. Here, both attended and ignored speech streams are represented with almost equal fidelity, and a global representation of the full auditory scene with all its streams is a better candidate neural representation than one in which individual streams are represented separately. In contrast, we also show that higher-order auditory cortical areas represent the attended stream separately from, and with significantly higher fidelity than, unattended streams. Furthermore, the unattended background streams are more faithfully represented as a single unsegregated background object than as separated objects. Taken together, these findings demonstrate the progression of the representations and processing of a complex acoustic scene up through the hierarchy of human auditory cortex.

Significance Statement
Using magnetoencephalography (MEG) recordings from human listeners in a simulated cocktail party environment, we investigate how a complex acoustic scene consisting of multiple speech sources is represented in separate hierarchical stages of auditory cortex. We show that the primary-like areas in auditory cortex use a dominantly spectrotemporal-based representation of the entire auditory scene, with both attended and ignored speech streams represented with almost equal fidelity. In contrast, we show that higher-order auditory cortical areas represent an attended speech stream separately from, and with significantly higher fidelity than, unattended speech streams. Furthermore, the unattended background streams are represented as a single undivided background object rather than as distinct background objects.
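The stimulus-reconstruction approach referred to above is typically implemented as a backward linear model: a ridge-regularized regression from time-lagged multi-channel neural responses back to the stimulus envelope, with reconstruction fidelity measured by correlation. A minimal sketch under stated assumptions — the lag count, regularizer, and synthetic data below are illustrative placeholders, not the authors' actual pipeline:

```python
import numpy as np

def lagged_matrix(responses, n_lags):
    """Stack time-lagged copies of each sensor channel (T x C) into a (T x C*n_lags) design matrix."""
    T, C = responses.shape
    X = np.zeros((T, C * n_lags))
    for lag in range(n_lags):
        X[lag:, lag * C:(lag + 1) * C] = responses[:T - lag]
    return X

def fit_reconstruction(responses, envelope, n_lags=10, ridge=1.0):
    """Ridge-regression decoder mapping neural responses back to the stimulus envelope."""
    X = lagged_matrix(responses, n_lags)
    XtX = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(XtX, X.T @ envelope)

def reconstruction_fidelity(responses, envelope, weights, n_lags=10):
    """Pearson correlation between the reconstructed and actual envelopes."""
    pred = lagged_matrix(responses, n_lags) @ weights
    return np.corrcoef(pred, envelope)[0, 1]

# Toy demonstration with synthetic data: 16 'sensors' with the envelope linearly mixed in.
rng = np.random.default_rng(0)
env = rng.standard_normal(1000)
sensors = rng.standard_normal((1000, 16)) + 0.5 * env[:, None]
w = fit_reconstruction(sensors, env)
r = reconstruction_fidelity(sensors, env, w)
```

With real MEG data, the decoder would be trained and evaluated on separate trials; the in-sample correlation here only illustrates the mechanics.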

2021 ◽  
Vol 15 ◽  
Author(s):  
Emina Alickovic ◽  
Elaine Hoi Ning Ng ◽  
Lorenz Fiedler ◽  
Sébastien Santurette ◽  
Hamish Innes-Brown ◽  
...  

Objectives
Previous research using non-invasive (magnetoencephalography, MEG) and invasive (electrocorticography, ECoG) neural recordings has demonstrated the progressive and hierarchical representation and processing of complex multi-talker auditory scenes in the auditory cortex. Early responses (<85 ms) in primary-like areas appear to represent the individual talkers with almost equal fidelity and are independent of attention in normal-hearing (NH) listeners. However, late responses (>85 ms) in higher-order non-primary areas selectively represent the attended talker with significantly higher fidelity than unattended talkers in NH and hearing-impaired (HI) listeners. Motivated by these findings, the objective of this study was to investigate the effect of a noise reduction scheme (NR) in a commercial hearing aid (HA) on the representation of complex multi-talker auditory scenes in distinct hierarchical stages of the auditory cortex, using high-density electroencephalography (EEG).

Design
We addressed this issue by investigating early (<85 ms) and late (>85 ms) EEG responses recorded from 34 HI subjects fitted with HAs. The HA noise reduction (NR) was either on or off while the participants listened to a complex auditory scene. Participants were instructed to attend to one of two simultaneous talkers in the foreground while multi-talker babble noise played in the background (+3 dB SNR). After each trial, a two-choice question about the content of the attended speech was presented.

Results
Using a stimulus reconstruction approach, our results suggest that the attention-related enhancement of the neural representations of the target and masker talkers in the foreground, as well as the suppression of the background noise, is significantly affected by the NR scheme in distinct hierarchical stages. In the early responses, the NR scheme enhanced the representation of the foreground and of the entire acoustic scene, an enhancement driven by better representation of the target speech. In the late responses, the target talker was selectively represented in HI listeners, and use of the NR scheme enhanced the representations of both the target and masker speech in the foreground while suppressing the representation of the background noise. The EEG time window also had a significant effect on the strength of the cortical representations of the target and masker.

Conclusion
Together, our analyses of the early and late responses obtained from HI listeners support the existing view of hierarchical processing in the auditory cortex. Our findings demonstrate the benefits of an NR scheme on the representation of complex multi-talker auditory scenes in different areas of the auditory cortex in HI listeners.
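Setting the background babble at +3 dB SNR amounts to scaling the noise so the target-to-noise power ratio hits the desired level before summing. A small sketch of how such a mixture is commonly constructed — the signals and sampling here are synthetic stand-ins, not the study's stimuli:

```python
import numpy as np

def mix_at_snr(target, noise, snr_db):
    """Scale `noise` so the target-to-noise power ratio equals `snr_db` dB, then mix."""
    p_target = np.mean(target ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_target / (p_noise * 10 ** (snr_db / 10)))
    return target + gain * noise

# Synthetic 'speech' and 'babble', mixed at +3 dB SNR as in the Design section.
rng = np.random.default_rng(1)
speech = rng.standard_normal(48000)
babble = rng.standard_normal(48000)
scene = mix_at_snr(speech, babble, 3.0)
```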


2017 ◽  
Author(s):  
Krishna C Puvvada ◽  
Marisel Villafañe-Delgado ◽  
Christian Brodbeck ◽  
Jonathan Z Simon

Abstract
Speech communication in daily listening environments is complicated by the phenomenon of reverberation, wherein any sound reaching the ear is a mixture of the direct component from the source and multiple reflections off surrounding objects and the environment. The brain plays a central role in comprehending speech accompanied by such distortion, which is frequently further complicated by the presence of additional noise sources in the vicinity. Here, using magnetoencephalography (MEG) recordings from human subjects, we investigate the neural representation of speech in noisy, reverberant listening conditions, as measured by phase-locked MEG responses to the slow temporal modulations of speech. Using systems-theoretic linear methods of stimulus encoding, we observe that the cortex maintains both distorted and distortion-free (cleaned) representations of speech. We also show that, while the neural encoding of speech remains robust to additive noise in the absence of reverberation, it is detrimentally affected by noise when noise and reverberation occur together. Further, using linear methods of stimulus reconstruction, we show that theta-band neural responses are a likely candidate for the distortion-free representation of speech, whereas delta-band responses are more likely to carry non-speech-specific information about the listening environment.
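The linear stimulus-encoding methods mentioned here are commonly formulated as a temporal response function (TRF): a regularized regression from lagged copies of the stimulus envelope to a neural channel. A minimal sketch with synthetic data — the lag count, ridge parameter, and kernel below are illustrative assumptions, not the authors' settings:

```python
import numpy as np

def estimate_trf(envelope, response, n_lags=20, ridge=1.0):
    """Forward (encoding) model: ridge estimate of the temporal response function
    mapping the stimulus envelope to one neural channel."""
    T = len(envelope)
    X = np.zeros((T, n_lags))
    for lag in range(n_lags):
        X[lag:, lag] = envelope[:T - lag]
    XtX = X.T @ X + ridge * np.eye(n_lags)
    return np.linalg.solve(XtX, X.T @ response)

# Synthetic check: a 'response' generated by a known 20-tap causal kernel plus noise;
# the estimated TRF should closely match the kernel that produced it.
rng = np.random.default_rng(2)
env = rng.standard_normal(2000)
true_trf = np.exp(-np.arange(20) / 5.0)
resp = np.convolve(env, true_trf)[:2000] + 0.1 * rng.standard_normal(2000)
trf = estimate_trf(env, resp)
```

In practice the same framework extends to multivariate stimuli (e.g., separate clean and reverberant envelopes as competing predictors), which is how distorted and cleaned representations can be compared.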


2011 ◽  
Vol 105 (4) ◽  
pp. 1558-1573 ◽  
Author(s):  
Yu-Ting Mao ◽  
Tian-Miao Hua ◽  
Sarah L. Pallas

Sensory neocortex is capable of considerable plasticity after sensory deprivation or damage to input pathways, especially early in development. Although plasticity can often be restorative, sometimes novel, ectopic inputs invade the affected cortical area. Invading inputs from other sensory modalities may compromise the original function or even take over, imposing a new function and preventing recovery. Using ferrets whose retinal axons were rerouted into auditory thalamus at birth, we were able to examine the effect of varying the degree of ectopic, cross-modal input on reorganization of developing auditory cortex. In particular, we assayed whether the invading visual inputs and the existing auditory inputs competed for or shared postsynaptic targets and whether the convergence of input modalities would induce multisensory processing. We demonstrate that although the cross-modal inputs create new visual neurons in auditory cortex, some auditory processing remains. The degree of damage to auditory input to the medial geniculate nucleus was directly related to the proportion of visual neurons in auditory cortex, suggesting that the visual and residual auditory inputs compete for cortical territory. Visual neurons were not segregated from auditory neurons but shared target space even on individual target cells, substantially increasing the proportion of multisensory neurons. Thus spatial convergence of visual and auditory input modalities may be sufficient to expand multisensory representations. Together these findings argue that early, patterned visual activity does not drive segregation of visual and auditory afferents and suggest that auditory function might be compromised by converging visual inputs. These results indicate possible ways in which multisensory cortical areas may form during development and evolution. 
They also suggest that rehabilitative strategies designed to promote recovery of function after sensory deprivation or damage need to take into account that sensory cortex may become substantially more multisensory after alteration of its input during development.


2020 ◽  
Vol 123 (2) ◽  
pp. 695-706
Author(s):  
Lu Luo ◽  
Na Xu ◽  
Qian Wang ◽  
Liang Li

The central mechanisms underlying binaural unmasking of spectrally overlapping concurrent sounds, which are unresolved in the peripheral auditory system, remain largely unknown. In this study, frequency-following responses (FFRs) to two binaurally presented, independent narrowband noises (NBNs) with overlapping spectra were recorded simultaneously in the inferior colliculus (IC) and auditory cortex (AC) of anesthetized rats. The results showed that for both IC FFRs and AC FFRs, introducing an interaural time difference (ITD) disparity between the two concurrent NBNs enhanced the representation fidelity, reflected by increased coherence between the responses evoked by double-NBN stimulation and the responses evoked by the single NBNs. The ITD disparity effect varied across frequency bands, being more marked for higher frequency bands in the IC and lower frequency bands in the AC. Moreover, the coherence between IC responses and AC responses was also enhanced by the ITD disparity, and this enhancement was most prominent for low-frequency bands when the IC and the AC were on the same side. These results suggest a critical role of the ITD cue in the neural segregation of spectrotemporally overlapping sounds.

NEW & NOTEWORTHY
When two spectrally overlapping narrowband noises are presented at the same time at the same sound-pressure level, they mask each other. Introducing a disparity in interaural time difference between these two narrowband noises improves the accuracy of the neural representation of the individual sounds in both the inferior colliculus and the auditory cortex. The low-frequency signal transformation from the inferior colliculus to the auditory cortex on the same side is also enhanced, demonstrating the effect of binaural unmasking.
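The fidelity measure used here, coherence between two response recordings within a frequency band, can be computed with Welch-averaged spectra. A rough sketch using `scipy.signal.coherence` — the signals, sampling rate, and band edges are synthetic stand-ins, not the study's recordings:

```python
import numpy as np
from scipy.signal import butter, lfilter, coherence

def band_coherence(x, y, fs, band, nperseg=1024):
    """Mean magnitude-squared coherence between two signals within `band` (Hz)."""
    f, cxy = coherence(x, y, fs=fs, nperseg=nperseg)
    mask = (f >= band[0]) & (f <= band[1])
    return cxy[mask].mean()

# Synthetic check: two 'responses' sharing a narrowband (280-320 Hz) component
# are coherent inside that band but not outside it.
rng = np.random.default_rng(3)
fs = 2000
n = fs * 8
b, a = butter(4, [280 / (fs / 2), 320 / (fs / 2)], btype="band")
shared = 5.0 * lfilter(b, a, rng.standard_normal(n))
x = shared + rng.standard_normal(n)  # shared component plus independent noise
y = shared + rng.standard_normal(n)
c_in = band_coherence(x, y, fs, (285, 315))
c_out = band_coherence(x, y, fs, (600, 700))
```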


2007 ◽  
Vol 24 (6) ◽  
pp. 857-874 ◽  
Author(s):  
THOMAS FITZGIBBON ◽  
BRETT A. SZMAJDA ◽  
PAUL R. MARTIN

The thalamic reticular nucleus (TRN) supplies an important inhibitory input to the dorsal thalamus. Previous studies in non-primate mammals have suggested that the visual sector of the TRN has a lateral division, which has connections with first-order (primary) sensory thalamic and cortical areas, and a medial division, which has connections with higher-order (association) thalamic and cortical areas. However, whether the primate TRN is segregated in the same manner remains controversial. Here, we investigated the connections of the TRN in a New World primate, the marmoset (Callithrix jacchus). The topography of labeled cells and terminals was analyzed following iontophoretic injections of tracers into the primary visual cortex (V1) or the dorsal lateral geniculate nucleus (LGNd). The results show that the rostroventral TRN, adjacent to the LGNd, is primarily connected with primary visual areas, while the most caudal parts of the TRN are associated with higher-order visual thalamic areas. A small region of the TRN near the caudal pole of the LGNd (foveal representation) contains connections where first-order (lateral TRN) and higher-order visual areas (medial TRN) overlap. Reciprocal connections between the LGNd and TRN are topographically organized, such that a series of rostrocaudal injections within the LGNd labeled cells and terminals in the TRN in a pattern shaped like rostrocaudally overlapping “fish scales.” We propose that the dorsal areas of the TRN, adjacent to the top of the LGNd, represent the lower visual field (connected with medial LGNd), and the more ventral parts of the TRN contain a map representing the upper visual field (connected with lateral LGNd).


2018 ◽  
Vol 30 (12) ◽  
pp. 3151-3167 ◽  
Author(s):  
Dmitry Krotov ◽  
John Hopfield

Deep neural networks (DNNs) trained in a supervised way suffer from two known problems. First, the minima of the objective function used in learning correspond to data points (also known as rubbish examples or fooling images) that lack semantic similarity with the training data. Second, a clean input can be changed by a small perturbation, often imperceptible to human vision, so that the resulting deformed input is misclassified by the network. These findings emphasize the differences between the ways DNNs and humans classify patterns and raise the question of how to design learning algorithms that mimic human perception more accurately than existing methods. Our article examines these questions within the framework of dense associative memory (DAM) models. These models are defined by an energy function with higher-order (higher than quadratic) interactions between the neurons. We show that in the limit when the power of the interaction vertex in the energy function is sufficiently large, these models have the following three properties. First, the minima of the objective function are free from rubbish images, so that each minimum is a semantically meaningful pattern. Second, artificial patterns poised precisely at the decision boundary look ambiguous to human subjects and share aspects of both classes separated by that boundary. Third, adversarial images constructed by models with a small power of the interaction vertex, which are equivalent to DNNs with rectified linear units, fail to transfer to, and thus fail to fool, models with higher-order interactions. This opens up the possibility of using higher-order models for detecting and stopping malicious adversarial attacks. The results we present suggest that DAMs with higher-order energy functions are more robust to adversarial and rubbish inputs than DNNs with rectified linear units.
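A dense associative memory with a higher-order interaction vertex can be sketched directly from its energy function, E(σ) = −Σ_μ F(ξ_μ · σ) with F(x) = max(x, 0)^n. The toy below — the pattern count, dimension, n = 4, and the greedy single-bit update rule are illustrative assumptions, not the paper's experiments — shows how such an energy supports pattern recall:

```python
import numpy as np

def dam_energy(state, memories, n=4):
    """Dense associative memory energy: E = -sum_mu F(xi_mu . sigma), F(x) = max(x, 0)**n."""
    overlaps = memories @ state
    return -np.sum(np.maximum(overlaps, 0.0) ** n)

def dam_update(state, memories, n=4, sweeps=3):
    """Greedy asynchronous dynamics: set each unit to whichever sign gives lower energy."""
    state = state.copy()
    for _ in range(sweeps):
        for i in range(state.size):
            plus, minus = state.copy(), state.copy()
            plus[i], minus[i] = 1.0, -1.0
            state = plus if dam_energy(plus, memories, n) <= dam_energy(minus, memories, n) else minus
    return state

# Toy check: a corrupted stored pattern is restored by descending the energy.
rng = np.random.default_rng(4)
memories = rng.choice([-1.0, 1.0], size=(5, 64))   # 5 stored +/-1 patterns
probe = memories[0].copy()
flipped = rng.choice(64, size=8, replace=False)    # corrupt 8 of 64 bits
probe[flipped] *= -1
recalled = dam_update(probe, memories)
```

With a large power n, the term for the best-matching memory dominates the energy, so the dynamics snap the probe back to that memory with little interference from the other stored patterns.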


2018 ◽  
Vol 104 (5) ◽  
pp. 774-777 ◽  
Author(s):  
Christian Brodbeck ◽  
Alessandro Presacco ◽  
Samira Anderson ◽  
Jonathan Z. Simon

eNeuro ◽  
2016 ◽  
Vol 3 (3) ◽  
pp. ENEURO.0071-16.2016 ◽  
Author(s):  
Yonatan I. Fishman ◽  
Christophe Micheyl ◽  
Mitchell Steinschneider
