acoustic representation
Recently Published Documents

TOTAL DOCUMENTS: 17 (five years: 1)
H-INDEX: 2 (five years: 0)
Electronics ◽  
2020 ◽  
Vol 9 (10) ◽  
pp. 1725
Author(s):  
Gintautas Tamulevičius ◽  
Gražina Korvel ◽  
Anil Bora Yayak ◽  
Povilas Treigys ◽  
Jolita Bernatavičienė ◽  
...  

In this research, a study of cross-linguistic speech emotion recognition is performed. For this purpose, emotional data of different languages (English, Lithuanian, German, Spanish, Serbian, and Polish) are collected, resulting in a cross-linguistic speech emotion dataset of more than 10,000 emotional utterances. Despite the bi-modal character of the databases gathered, our focus is on the acoustic representation only. The assumption is that the speech audio signal carries sufficient emotional information to detect and retrieve it. Several two-dimensional acoustic feature spaces, such as cochleagrams, spectrograms, mel-cepstrograms, and a fractal dimension-based space, are employed as representations of speech emotional features. A convolutional neural network (CNN) is used as a classifier. The results show the superiority of cochleagrams over the other feature spaces utilized. In the CNN-based speaker-independent cross-linguistic speech emotion recognition (SER) experiment, an accuracy of over 90% is achieved, which is close to the monolingual case of SER.
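The feature spaces named above are all two-dimensional time-frequency representations. As a generic illustration of the idea (not the authors' cochleagram pipeline; the signal, frame length, and hop size here are assumptions for the sketch), a plain magnitude spectrogram of the kind a CNN would consume can be computed in a few lines of NumPy:

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Frame the signal, window each frame, and take the magnitude FFT,
    yielding a 2-D (frequency x time) feature map suitable for a CNN."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T  # shape: (freq_bins, n_frames)

# one second of a synthetic 440 Hz tone at 16 kHz stands in for an utterance
sr = 16000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # → (129, 124)
```

A cochleagram would replace the uniform FFT bins with a gammatone filterbank on an auditory frequency scale, but the downstream CNN sees the same kind of 2-D array either way.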


Electronics ◽  
2020 ◽  
Vol 9 (7) ◽  
pp. 1125
Author(s):  
Haitao Lang ◽  
Jie Yang

Recently, supervised learning methods, especially deep neural network-based (DNN) methods, have shown promising performance in single-channel speech enhancement. Generally, those approaches extract the acoustic features directly from the noisy speech to train a magnitude-aware target. In this paper, we propose to extract the acoustic features not only from the noisy speech but also from the pre-estimated speech, noise, and phase separately, then fuse them into a new complementary feature in order to obtain a more discriminative acoustic representation. In addition, on the basis of learning a magnitude-aware target, we also utilize the fusion feature to learn a phase-aware target, thereby further improving the accuracy of the recovered speech. We conduct extensive experiments, including performance comparisons with typical existing methods, generalization-ability evaluation on unseen noise, an ablation study, and subjective tests with human listeners, to demonstrate the feasibility and effectiveness of the proposed method. Experimental results show that the proposed method improves both the quality and the intelligibility of the reconstructed speech.
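The fusion step described above amounts to concatenating per-frame features from several sources into one input vector. A minimal sketch of that idea follows; the function name, the log-compression of magnitudes, and the feature dimensions are illustrative assumptions, not the paper's exact complementary feature:

```python
import numpy as np

def fuse_features(noisy_mag, est_speech_mag, est_noise_mag, est_phase):
    """Concatenate per-frame features from the noisy input and the
    pre-estimated speech, noise, and phase into one fused feature
    vector (a stand-in for the paper's complementary feature)."""
    return np.concatenate([np.log1p(noisy_mag),       # noisy magnitude
                           np.log1p(est_speech_mag),  # pre-estimated speech
                           np.log1p(est_noise_mag),   # pre-estimated noise
                           est_phase],                # pre-estimated phase
                          axis=-1)

frames, bins = 100, 129
rng = np.random.default_rng(0)
fused = fuse_features(rng.random((frames, bins)),
                      rng.random((frames, bins)),
                      rng.random((frames, bins)),
                      rng.uniform(-np.pi, np.pi, (frames, bins)))
print(fused.shape)  # → (100, 516)
```

The fused matrix would then be fed to the DNN that learns the magnitude-aware and phase-aware targets.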


2020 ◽  
Vol 6 (s1) ◽  
Author(s):  
Jane Stuart-Smith

Abstract: This paper considers the relative influence on sociophonetic interpretation of /s/ using "static" and "dynamic" acoustic analysis, where dynamic refers to measures which capture the time-varying nature of segmental acoustics, and static to measures which are taken at a single point, or from an average across the sound (Watson and Harrington 1999, Docherty et al. 2015). Static and dynamic Discrete Cosine Transformation (DCT) analyses of spectral Centre of Gravity (CoG) and spectral Slope measures/trajectories were carried out on gendered productions of /s/ and /ʃ/ for a real- and apparent-time 32-speaker sample of spontaneous Glasgow dialect. Results of static CoG measures, reflecting place of articulation, indicate a reduction of gender differentiation over time, such that girls born most recently revert to older vernacular (lower-frequency) norms. Adding static spectral Slope, reflecting articulatory constriction, shows a change in gendered differentiation, whereby boys born most recently show a gestural shift. The DCT analysis both confirms the static results and reveals that dynamic characteristics of both sibilants carry key additional prosodic, linguistic, and social information for this community. Our results demonstrate for the first time the usefulness of changing analytical perspectives on /s/, both in terms of acoustic representation (static and dynamic) and by considering /s/ alongside its partner sibilant /ʃ/ within the context of shifts in the construction of social gender over time.
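The static/dynamic contrast above can be made concrete: a static measure collapses a trajectory to one number, while the first few DCT coefficients summarise its mean and shape. The sketch below, with a hypothetical CoG trajectory and an unnormalised DCT-II, illustrates the relationship (coefficient 0 is proportional to the trajectory mean, i.e. the static measure):

```python
import numpy as np

def dct2(x, n_coef=4):
    """First n_coef unnormalised DCT-II coefficients of a trajectory.
    Coefficient 0 tracks the overall level; higher coefficients
    capture the slope and curvature of the time course."""
    N = len(x)
    k = np.arange(n_coef)[:, None]
    n = np.arange(N)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * N))
    return basis @ x

# hypothetical CoG values (Hz) across nine time slices of an /s/
cog = np.array([5200, 5600, 6100, 6400, 6500, 6400, 6100, 5700, 5300.0])
static = cog.mean()   # single "static" CoG measure
dynamic = dct2(cog)   # "dynamic" DCT shape coefficients
# dynamic[0] equals len(cog) * static for this unnormalised DCT-II
```

In the study's terms, comparing models fitted on `static` alone versus the `dynamic` coefficients is what reveals the extra prosodic, linguistic, and social information carried by the sibilants' time courses.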


2018 ◽  
Vol 33 (80) ◽  
pp. 125-144
Author(s):  
Christian Dahl

Christian Dahl: "Battle scenes in the Elizabethan theater". This article analyses the widespread use of staged battle in Elizabethan theater using data extracted from the Folger Library's Digital Anthology of Early English Drama. Between 1576 and 1616, hundreds of battle scenes were produced on English stages, but although a substantial number are still available for study, only a few scholars have recognized their significance. The many battle scenes attest both to the Elizabethans' vivid interest in history and to the cultural impact of England's increasing military engagement on the Continent and in Ireland at the end of Elizabeth's reign. It is often assumed that histories and battle scenes were particularly popular in the 1590s and then fell out of fashion early in the 17th century, but the article demonstrates that staged war remained a frequent occurrence in the first two decades of the century and never disappeared entirely. The article discusses visual and, in particular, acoustic representation of warfare based on the evidence of surviving plays and other documents. The article also (very) briefly sketches the narrative development of battle scenes that took place in the 1590s.


eLife ◽  
2018 ◽  
Vol 7 ◽  
Author(s):  
Manfred Kössl ◽  
Julio Hechavarría

The acoustic representation of the outside world in the midbrain of a bat becomes more precise as it uses double clicks to locate closer objects.


2018 ◽  
Vol 231 ◽  
pp. 05002 ◽  
Author(s):  
Karolina Marciniuk ◽  
Maciej Blaszke ◽  
Bożena Kostek

This research demonstrates the performance of an automatic acoustic road-monitoring system proposed by the authors. The main goal of the study is to describe road traffic by means of an acoustic representation and to test the effectiveness of traffic-flow sensors. Evaluation metrics of road conditions, such as the velocity of the traffic flow, its structure, and weather conditions, are presented along with acoustic descriptors derived from audio signal analysis. The accuracy of emergency vehicle pass-by detection based on acoustic monitoring is also briefly described.
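Acoustic descriptors of the kind mentioned above are typically simple per-frame statistics of the audio signal. The sketch below computes two generic examples, RMS energy and spectral centroid; the specific descriptors, frame sizes, and test signal are assumptions for illustration, not the authors' feature set:

```python
import numpy as np

def frame_descriptors(signal, sr, frame_len=1024, hop=512):
    """Per-frame RMS energy (loudness proxy) and spectral centroid
    (brightness proxy), two generic acoustic descriptors."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    freqs = np.fft.rfftfreq(frame_len, 1 / sr)
    rms, centroid = [], []
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        rms.append(np.sqrt(np.mean(frame ** 2)))
        mag = np.abs(np.fft.rfft(frame))
        centroid.append(np.sum(freqs * mag) / (np.sum(mag) + 1e-12))
    return np.array(rms), np.array(centroid)

# one second of a synthetic 1 kHz tone at 8 kHz stands in for a road recording
sr = 8000
t = np.arange(sr) / sr
rms, cen = frame_descriptors(np.sin(2 * np.pi * 1000 * t), sr)
```

For a pure 1 kHz tone the centroid sits at 1000 Hz and the RMS at 1/√2; on real roadside audio, trajectories of such descriptors are what a classifier would use to characterise traffic flow or flag a siren pass-by.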

