Perceptual grouping in the cocktail party: contributions of voice-feature continuity

2018 ◽  
Author(s):  
Jens Kreitewolf ◽  
Samuel R. Mathias ◽  
Régis Trapeau ◽  
Jonas Obleser ◽  
Marc Schönwiesner

Abstract Cocktail parties pose a difficult yet solvable problem for the auditory system. Previous work has shown that the cocktail-party problem is considerably easier when all sounds in the target stream are spoken by the same talker (the voice-continuity benefit). The present study investigated the contributions of two of the most salient voice features, glottal-pulse rate (GPR) and vocal-tract length (VTL), to the voice-continuity benefit. Twenty young, normal-hearing listeners participated in two experiments. On each trial, listeners heard concurrent sequences of spoken digits from three different spatial locations and reported the digits coming from a target location. Critically, across conditions, GPR and VTL either remained constant or varied across target digits. Additionally, across experiments, the target location either remained constant (Experiment 1) or varied (Experiment 2) within a trial. In Experiment 1, listeners benefited from continuity in either voice feature, but VTL continuity was more helpful than GPR continuity. In Experiment 2, spatial discontinuity greatly hindered listeners’ abilities to exploit continuity in GPR and VTL. The present results suggest that selective attention benefits from continuity in target voice features, and that VTL and GPR play different roles for perceptual grouping and stream segregation in the cocktail party.

2020 ◽  
Author(s):  
Vladimir Jovanovic ◽  
Cory T Miller

Abstract A key challenge for audition is parsing the voice of a single speaker amid a cacophony of other voices, known as the Cocktail Party Problem (CPP). Despite its prevalence, relatively little remains known about how our simian cousins resolve the CPP for active, natural communication. Here we employed an innovative, multi-speaker paradigm comprising five computer-generated Virtual Monkeys (VM) whose respective vocal behavior could be systematically varied to construct marmoset cocktail parties and tested the impact of specific acoustic scene manipulations on subjects’ natural conversations. Results indicate that marmosets not only employ auditory mechanisms – including attention – for speaker stream segregation, but also selectively change their own vocal behavior in response to the dynamics of the acoustic scene. These findings suggest notable parallels with human audition to solve the CPP and highlight the active role that individuals play to optimize communicative efficacy in complex real-world acoustic scenes.


2015 ◽  
Vol 5 (1) ◽  
Author(s):  
Jayaganesh Swaminathan ◽  
Christine R. Mason ◽  
Timothy M. Streeter ◽  
Virginia Best ◽  
Gerald Kidd, Jr ◽  
...  

Abstract Are musicians better able to understand speech in noise than non-musicians? Recent findings have produced contradictory results. Here we addressed this question by asking musicians and non-musicians to understand target sentences masked by other sentences presented from different spatial locations, the classical ‘cocktail party problem’ in speech science. We found that musicians obtained a substantial benefit in this situation, with thresholds ~6 dB better than non-musicians. Large individual differences in performance were noted particularly for the non-musically trained group. Furthermore, in different conditions we manipulated the spatial location and intelligibility of the masking sentences, thus changing the amount of ‘informational masking’ (IM) while keeping the amount of ‘energetic masking’ (EM) relatively constant. When the maskers were unintelligible and spatially separated from the target (low in IM), musicians and non-musicians performed comparably. These results suggest that the characteristics of speech maskers and the amount of IM can influence the magnitude of the differences found between musicians and non-musicians in multiple-talker “cocktail party” environments. Furthermore, considering the task in terms of the EM-IM distinction provides a conceptual framework for future behavioral and neuroscientific studies which explore the underlying sensory and cognitive mechanisms contributing to enhanced “speech-in-noise” perception by musicians.
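To give a sense of scale for the ~6 dB threshold difference reported above, the short sketch below converts a decibel difference into linear ratios. It assumes the thresholds are level ratios expressed in decibels (the abstract does not state the exact measure), and the function names are illustrative, not taken from the study.

```python
def db_to_power_ratio(db):
    """Convert a decibel difference to a ratio of powers (intensities)."""
    return 10 ** (db / 10)


def db_to_amplitude_ratio(db):
    """Convert a decibel difference to a ratio of amplitudes (pressures)."""
    return 10 ** (db / 20)


# Assumed reading: a 6 dB advantage at threshold.
power_ratio = db_to_power_ratio(6.0)
amplitude_ratio = db_to_amplitude_ratio(6.0)
print(f"6 dB corresponds to about {power_ratio:.2f}x in power "
      f"and about {amplitude_ratio:.2f}x in amplitude")
```

Under that assumption, a 6 dB advantage means roughly a fourfold difference in power, or a twofold difference in amplitude.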


2015 ◽  
Vol 9 ◽  
Author(s):  
Chetan Singh Thakur ◽  
Runchun M. Wang ◽  
Saeed Afshar ◽  
Tara J. Hamilton ◽  
Jonathan C. Tapson ◽  
...  

2021 ◽  
Author(s):  
Monika Gupta ◽  
R K Singh ◽  
Sachin Singh

Abstract The COVID-19 pandemic has moved many activities online. People tired of typing prefer to give voice commands, but most voice-based applications and devices are not prepared to handle native languages. Moreover, in a party environment it is difficult to identify a voice command because there are many speakers. The proposed work addresses the cocktail party problem for an Indian language, Gujarati. Voice response systems such as Siri, Alexa, and Google Assistant currently work on a single voice command. The proposed algorithm, G-Cocktail, would help these applications identify a command given in Gujarati even from a mixed voice signal. The benchmark dataset is taken from Microsoft and the Linguistic Data Consortium for Indian Languages (LDC-IL) and comprises single words and phrases. G-Cocktail uses the CatBoost algorithm to classify and identify the voice. Voice prints of the sound files are created using pitch and Mel Frequency Cepstral Coefficients (MFCC). Seventy percent of the voice prints are used to train the network and thirty percent for testing. The proposed work is tested and compared with K-means, Naïve Bayes, and LightGBM.
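The 70/30 partition of voice prints described above could be sketched as follows. This is only an illustration: the abstract does not say whether the split was random, stratified by speaker, or fixed, so the shuffled split, the seed, and the name `split_70_30` are assumptions.

```python
import random


def split_70_30(items, seed=0):
    """Shuffle a copy of items and return (train, test) in a 70/30 proportion."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = (7 * len(shuffled)) // 10        # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


# Hypothetical stand-ins for voice prints (e.g. pitch + MFCC feature vectors).
voice_prints = [f"voice_print_{i}" for i in range(10)]
train_set, test_set = split_70_30(voice_prints)
print(len(train_set), "for training,", len(test_set), "for testing")
```

In practice a stratified split, which keeps each speaker or command class represented in both parts, is often preferable for classification, but whether the authors did this is not stated.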


Author(s):  
Alistair J. Harvey ◽  
C. Philip Beaman

Abstract Rationale: To test the notion that alcohol impairs auditory attentional control by reducing the listener’s cognitive capacity. Objectives: We examined the effect of alcohol consumption and working memory span on dichotic speech shadowing and the cocktail party effect, the ability to focus on one of many simultaneous speakers yet still detect mention of one’s name amidst the background speech. Alcohol was expected either to increase name detection, by weakening the inhibition of irrelevant speech, or to reduce name detection, by restricting auditory attention to the primary input channel. Low-span participants were expected to show larger drug impairments than high-span counterparts. Methods: On completion of the working memory span task, participants (n = 81) were randomly assigned to an alcohol or placebo beverage treatment. After alcohol absorption, they shadowed speech presented to one ear while ignoring the synchronised speech of a different speaker presented to the other. Each participant’s first name was covertly embedded in to-be-ignored speech. Results: The “cocktail party effect” was not affected by alcohol or working memory span, though low-span participants made more shadowing errors and recalled fewer words from the primary channel than high-span counterparts. Bayes factors support a null effect of alcohol on the cocktail party phenomenon, on shadowing errors and on memory for either shadowed or ignored speech. Conclusion: Findings suggest that an alcoholic beverage producing a moderate level of intoxication (M BAC ≈ 0.08%) neither enhances nor impairs the cocktail party effect.


2010 ◽  
pp. 61-79 ◽  
Author(s):  
Tariqullah Jan ◽  
Wenwu Wang

The cocktail party problem is a classical scientific problem that has been studied for decades. Humans have remarkable skills in segregating target speech from a complex auditory mixture obtained in a cocktail party environment. Computational modeling of such a mechanism is, however, extremely challenging. This chapter presents an overview of several recent techniques for the source separation issues associated with this problem, including independent component analysis/blind source separation, computational auditory scene analysis, model-based approaches, non-negative matrix factorization and sparse coding. As an example, a multistage approach for source separation is included. The application areas of cocktail party processing are explored. Potential future research directions are also discussed.
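As a rough illustration of the first technique listed above, the sketch below separates two synthetic, instantaneously mixed signals with a toy independent component analysis: the mixtures are whitened, then rotated to the angle that maximizes non-Gaussianity (absolute excess kurtosis). It is not the chapter's multistage approach. The sources, mixing weights, and function names are assumptions chosen for the demonstration, and real cocktail-party recordings involve reverberant (convolutive) mixing that this sketch ignores.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def excess_kurtosis(values):
    """Excess kurtosis: near 0 for Gaussian data, nonzero otherwise."""
    m = mean(values)
    var = mean([(v - m) ** 2 for v in values])
    fourth = mean([(v - m) ** 4 for v in values])
    return fourth / (var ** 2) - 3.0


def whiten(x1, x2):
    """Center two signals, then decorrelate them to unit variance (Cholesky)."""
    m1, m2 = mean(x1), mean(x2)
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    a = mean([u * u for u in c1])
    b = mean([u * v for u, v in zip(c1, c2)])
    d = mean([v * v for v in c2])
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(d - l21 * l21)
    z1 = [u / l11 for u in c1]
    z2 = [(v - l21 * u) / l22 for u, v in zip(z1, c2)]
    return z1, z2


def separate_two_sources(x1, x2, steps=90):
    """Grid-search the rotation of the whitened mixtures that maximizes
    the summed absolute excess kurtosis of the two outputs."""
    z1, z2 = whiten(x1, x2)
    best_score, best_pair = -1.0, (z1, z2)
    for k in range(steps):
        theta = (math.pi / 2) * k / steps
        c, s = math.cos(theta), math.sin(theta)
        y1 = [c * u + s * v for u, v in zip(z1, z2)]
        y2 = [-s * u + c * v for u, v in zip(z1, z2)]
        score = abs(excess_kurtosis(y1)) + abs(excess_kurtosis(y2))
        if score > best_score:
            best_score, best_pair = score, (y1, y2)
    return best_pair


def correlation(a, b):
    ma, mb = mean(a), mean(b)
    cov = mean([(u - ma) * (v - mb) for u, v in zip(a, b)])
    sa = math.sqrt(mean([(u - ma) ** 2 for u in a]))
    sb = math.sqrt(mean([(v - mb) ** 2 for v in b]))
    return cov / (sa * sb)


# Synthetic, independent, non-Gaussian sources (assumed for the demo).
rng = random.Random(0)
n = 2000
s1 = [rng.uniform(-1.0, 1.0) for _ in range(n)]                       # uniform
s2 = [rng.expovariate(1.0) * rng.choice((-1.0, 1.0)) for _ in range(n)]  # Laplace

# Two "microphones", each hearing a different weighted sum (assumed weights).
x1 = [0.7 * a + 0.3 * b for a, b in zip(s1, s2)]
x2 = [0.4 * a + 0.6 * b for a, b in zip(s1, s2)]

y1, y2 = separate_two_sources(x1, x2)
for name, src in (("s1", s1), ("s2", s2)):
    match = max(abs(correlation(y1, src)), abs(correlation(y2, src)))
    print(f"best |correlation| of a recovered component with {name}: {match:.3f}")
```

The recovered components match the sources only up to order, sign, and scale, which is the usual ambiguity of blind source separation; the demo therefore reports absolute correlations rather than comparing waveforms directly.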


Author(s):  
Gillyanne Kayes

Key structural aspects of the vocal mechanism and the physiology of vocal function are presented and discussed in relation to the singing voice. Details of anatomical structure and physiological function are given for the regions of the vocal tract and respiratory system under the broad headings of respiration, phonation (the larynx), and resonation. Use of voice in singing is examined in terms of breath use, control of pitch and loudness, and shaping of resonance for change of timbre. Key developmental stages during the lifecycle are given, including infancy, childhood, voice mutation in adolescence, and the impact of hormonal change on the voice. Differences between the genders in adulthood are discussed in the light of current research knowledge of voice.


2009 ◽  
Vol 125 (4) ◽  
pp. 2489-2489
Author(s):  
Micheal L. Dent ◽  
Barbara G. Shinn‐Cunningham ◽  
Kamal Sen
