Perceptual grouping in the cocktail party: contributions of voice-feature continuity

Mapping Intimacies ◽

10.1101/379545 ◽

2018 ◽

Author(s):

Jens Kreitewolf ◽

Samuel R. Mathias ◽

Régis Trapeau ◽

Jonas Obleser ◽

Marc Schönwiesner

Keyword(s):

Auditory System ◽

Perceptual Grouping ◽

Target Location ◽

Vocal Tract ◽

Stream Segregation ◽

Cocktail Party ◽

Cocktail Party Problem ◽

Spatial Discontinuity ◽

Spatial Locations ◽

The Voice

Abstract Cocktail parties pose a difficult yet solvable problem for the auditory system. Previous work has shown that the cocktail-party problem is considerably easier when all sounds in the target stream are spoken by the same talker (the voice-continuity benefit). The present study investigated the contributions of two of the most salient voice features, glottal-pulse rate (GPR) and vocal-tract length (VTL), to the voice-continuity benefit. Twenty young, normal-hearing listeners participated in two experiments. On each trial, listeners heard concurrent sequences of spoken digits from three different spatial locations and reported the digits coming from a target location. Critically, across conditions, GPR and VTL either remained constant or varied across target digits. Additionally, across experiments, the target location either remained constant (Experiment 1) or varied (Experiment 2) within a trial. In Experiment 1, listeners benefited from continuity in either voice feature, but VTL continuity was more helpful than GPR continuity. In Experiment 2, spatial discontinuity greatly hindered listeners’ abilities to exploit continuity in GPR and VTL. The present results suggest that selective attention benefits from continuity in target voice features, and that VTL and GPR play different roles for perceptual grouping and stream segregation in the cocktail party.

Mechanisms for Communicating in a Marmoset ‘Cocktail Party’

10.1101/2020.12.08.416693 ◽

2020 ◽

Author(s):

Vladimir Jovanovic ◽

Cory T Miller

Keyword(s):

Real World ◽

Active Role ◽

Stream Segregation ◽

Vocal Behavior ◽

Cocktail Party ◽

Cocktail Party Problem ◽

Single Speaker ◽

The Impact ◽

The Voice ◽

Active Natural

Abstract A key challenge for audition is parsing the voice of a single speaker amid a cacophony of other voices known as the Cocktail Party Problem (CPP). Despite its prevalence, relatively little remains known about how our simian cousins resolve the CPP for active, natural communication. Here we employed an innovative, multi-speaker paradigm comprising five computer-generated Virtual Monkeys (VM) whose respective vocal behavior could be systematically varied to construct marmoset cocktail parties and tested the impact of specific acoustic scene manipulations on subjects’ natural conversations. Results indicate that marmosets not only employ auditory mechanisms – including attention – for speaker stream segregation, but also selectively change their own vocal behavior in response to the dynamics of the acoustic scene. These findings suggest notable parallels with human audition to solve the CPP and highlight the active role that individuals play to optimize communicative efficacy in complex real-world acoustic scenes.

Musical training, individual differences and the cocktail party problem

Scientific Reports ◽

10.1038/srep11628 ◽

2015 ◽

Vol 5 (1) ◽

Cited By ~ 56

Author(s):

Jayaganesh Swaminathan ◽

Christine R. Mason ◽

Timothy M. Streeter ◽

Virginia Best ◽

Gerald Kidd, Jr ◽

...

Keyword(s):

Individual Differences ◽

Musical Training ◽

Spatial Location ◽

Large Individual ◽

Cocktail Party ◽

Cognitive Mechanisms ◽

Cocktail Party Problem ◽

Speech In Noise ◽

Speech Science ◽

Spatial Locations

Abstract Are musicians better able to understand speech in noise than non-musicians? Recent findings have produced contradictory results. Here we addressed this question by asking musicians and non-musicians to understand target sentences masked by other sentences presented from different spatial locations, the classical ‘cocktail party problem’ in speech science. We found that musicians obtained a substantial benefit in this situation, with thresholds ~6 dB better than non-musicians. Large individual differences in performance were noted particularly for the non-musically trained group. Furthermore, in different conditions we manipulated the spatial location and intelligibility of the masking sentences, thus changing the amount of ‘informational masking’ (IM) while keeping the amount of ‘energetic masking’ (EM) relatively constant. When the maskers were unintelligible and spatially separated from the target (low in IM), musicians and non-musicians performed comparably. These results suggest that the characteristics of speech maskers and the amount of IM can influence the magnitude of the differences found between musicians and non-musicians in multiple-talker “cocktail party” environments. Furthermore, considering the task in terms of the EM-IM distinction provides a conceptual framework for future behavioral and neuroscientific studies which explore the underlying sensory and cognitive mechanisms contributing to enhanced “speech-in-noise” perception by musicians.
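To put the reported ~6 dB advantage in perspective, the short sketch below converts a decibel difference into linear ratios. It assumes the thresholds are expressed as target-to-masker level differences in dB, which the abstract does not state explicitly; the function names are illustrative only.

```python
def db_to_power_ratio(db: float) -> float:
    """Convert a level difference in decibels to a power (intensity) ratio."""
    return 10 ** (db / 10)


def db_to_amplitude_ratio(db: float) -> float:
    """Convert a level difference in decibels to an amplitude (pressure) ratio."""
    return 10 ** (db / 20)


# Approximate musician advantage quoted in the abstract (assumed to be a level difference).
advantage_db = 6.0

print(round(db_to_power_ratio(advantage_db), 2))      # 3.98
print(round(db_to_amplitude_ratio(advantage_db), 2))  # 2.0
```

Under that assumption, a 6 dB better threshold means the target could be roughly one quarter as intense (about half the amplitude) relative to the maskers for the same performance.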

Sound stream segregation: a neuromorphic approach to solve the “cocktail party problem” in real-time

Frontiers in Neuroscience ◽

10.3389/fnins.2015.00309 ◽

2015 ◽

Vol 9 ◽

Cited By ~ 10

Author(s):

Chetan Singh Thakur ◽

Runchun M. Wang ◽

Saeed Afshar ◽

Tara J. Hamilton ◽

Jonathan C. Tapson ◽

...

Keyword(s):

Real Time ◽

Stream Segregation ◽

Cocktail Party ◽

Cocktail Party Problem

G-Cocktail: An Algorithm to Address Cocktail Party Problem of Gujarati Language using CatBoost

10.21203/rs.3.rs-305722/v1 ◽

2021 ◽

Author(s):

Monika Gupta ◽

R K Singh ◽

Sachin Singh

Keyword(s):

Indian Languages ◽

Cocktail Party ◽

Mel Frequency Cepstral Coefficients ◽

Indian Language ◽

Native Languages ◽

Voice Command ◽

Cocktail Party Problem ◽

Voice Signal ◽

Gujarati Language ◽

The Voice

Abstract The COVID-19 pandemic has moved many activities online. People tired of typing prefer to give voice commands, yet most voice-based applications and devices are not prepared to handle native languages. Moreover, in a party environment it is difficult to identify a voice command because there are many speakers. The proposed work addresses the cocktail party problem for the Indian language Gujarati. Voice response systems such as Siri, Alexa, and Google Assistant currently work on a single voice command. The proposed algorithm, G-Cocktail, would help these applications identify a command given in Gujarati even from a mixed voice signal. The benchmark dataset is taken from Microsoft and the Linguistic Data Consortium for Indian Languages (LDC-IL) and comprises single words and phrases. G-Cocktail uses the CatBoost algorithm to classify and identify the voice. Voice prints of all the sound files are created using pitch and Mel Frequency Cepstral Coefficients (MFCC). Seventy percent of the voice prints are used to train the network and thirty percent for testing. The proposed work is tested and compared with K-means, Naïve Bayes, and LightGBM.
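The 70/30 partition of voice prints mentioned in the abstract can be sketched as follows. This is a minimal illustration only: the abstract does not say whether the split was random or stratified, and the `split_voice_prints` helper and the placeholder feature vectors are hypothetical. Pitch/MFCC extraction and the CatBoost classifier itself are not shown.

```python
import random


def split_voice_prints(items, train_fraction=0.7, seed=0):
    """Shuffle items reproducibly and split them into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical placeholder voice prints: (feature_vector, label) pairs.
# Real voice prints would hold pitch and MFCC features per sound file.
voice_prints = [([float(i), float(i) * 0.5], f"word_{i % 5}") for i in range(100)]

train, test = split_voice_prints(voice_prints)
print(len(train), len(test))  # 70 30
```

A fixed seed keeps the partition reproducible, which matters when the same split is reused to compare several classifiers.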

Acute alcohol intoxication and the cocktail party problem: do “mocktails” help or hinder?

Psychopharmacology ◽

10.1007/s00213-021-05924-6 ◽

2021 ◽

Author(s):

Alistair J. Harvey ◽

C. Philip Beaman

Keyword(s):

Working Memory ◽

Memory Span ◽

Alcohol Intoxication ◽

Alcoholic Beverage ◽

Cognitive Capacity ◽

Cocktail Party ◽

Primary Input ◽

Cocktail Party Problem ◽

Null Effect ◽

Cocktail Party Effect

Abstract Rationale: To test the notion that alcohol impairs auditory attentional control by reducing the listener’s cognitive capacity. Objectives: We examined the effect of alcohol consumption and working memory span on dichotic speech shadowing and the cocktail party effect, that is, the ability to focus on one of many simultaneous speakers yet still detect mention of one’s name amidst the background speech. Alcohol was expected either to increase name detection, by weakening the inhibition of irrelevant speech, or reduce name detection, by restricting auditory attention on to the primary input channel. Low-span participants were expected to show larger drug impairments than high-span counterparts. Methods: On completion of the working memory span task, participants (n = 81) were randomly assigned to an alcohol or placebo beverage treatment. After alcohol absorption, they shadowed speech presented to one ear while ignoring the synchronised speech of a different speaker presented to the other. Each participant’s first name was covertly embedded in to-be-ignored speech. Results: The “cocktail party effect” was not affected by alcohol or working memory span, though low-span participants made more shadowing errors and recalled fewer words from the primary channel than high-span counterparts. Bayes factors support a null effect of alcohol on the cocktail party phenomenon, on shadowing errors and on memory for either shadowed or ignored speech. Conclusion: Findings suggest that an alcoholic beverage producing a moderate level of intoxication (M BAC ≈ 0.08%) neither enhances nor impairs the cocktail party effect.

Modeling the Cocktail Party Problem

Springer Handbook of Auditory Research - The Auditory System at the Cocktail Party ◽

10.1007/978-3-319-51662-2_5 ◽

2017 ◽

pp. 111-135 ◽

Cited By ~ 1

Author(s):

Mounya Elhilali

Keyword(s):

Cocktail Party ◽

Cocktail Party Problem

Cocktail Party Problem

Machine Audition ◽

10.4018/978-1-61520-919-4.ch003 ◽

2010 ◽

pp. 61-79 ◽

Cited By ~ 1

Author(s):

Tariqullah Jan ◽

Wenwu Wang

Keyword(s):

Sparse Coding ◽

Source Separation ◽

Auditory Scene Analysis ◽

Future Research ◽

Cocktail Party ◽

Analysis Model ◽

Computational Auditory Scene Analysis ◽

Cocktail Party Problem ◽

Auditory Scene ◽

Future Research Directions

Abstract The cocktail party problem is a classical scientific problem that has been studied for decades. Humans have remarkable skills in segregating target speech from a complex auditory mixture obtained in a cocktail party environment. Computational modeling of such a mechanism is, however, extremely challenging. This chapter presents an overview of several recent techniques for the source separation issues associated with this problem, including independent component analysis/blind source separation, computational auditory scene analysis, model-based approaches, non-negative matrix factorization and sparse coding. As an example, a multistage approach for source separation is included. The application areas of cocktail party processing are explored. Potential future research directions are also discussed.

Structure and Function of the Singing Voice

The Oxford Handbook of Singing ◽

10.1093/oxfordhb/9780199660773.013.019 ◽

2015 ◽

pp. 2-30

Author(s):

Gillyanne Kayes

Keyword(s):

Developmental Stages ◽

Vocal Tract ◽

Singing Voice ◽

Hormonal Change ◽

Research Knowledge ◽

Vocal Function ◽

And Function ◽

The Impact ◽

The Voice ◽

Structural Aspects

Abstract Key structural aspects of the vocal mechanism and the physiology of vocal function are presented and discussed in relation to the singing voice. Details of anatomical structure and physiological function are given for the regions of the vocal tract and respiratory system under the broad headings of respiration, phonation (the larynx), and resonation. Use of voice in singing is examined in terms of breath use, control of pitch, and loudness, and shaping of resonance for change of timbre. Key developmental stages during the lifecycle are given, including infancy, childhood, voice mutation in adolescence, and the impact of hormonal change on the voice. Differences between the genders in adulthood are discussed in the light of current research knowledge of voice.
