A new front-end for classification of non-speech sounds: a study on human whistle

Author(s):  
Mahesh Kumar Nandwana ◽  
Hynek Bořil ◽  
John H. L. Hansen

Author(s):  
Gustavo Assunção ◽  
Paulo Menezes ◽  
Fernando Perdigão

The idea of recognizing human emotion through speech, known as speech emotion recognition (SER), has recently received considerable attention from the research community, mostly due to the current machine learning trend. Nevertheless, even the most successful methods still lack adaptation to specific speakers and scenarios, which evidently reduces their performance relative to humans. In this paper, we evaluate a large-scale machine learning model for the classification of emotional states. This model was trained for speaker identification but is instead used here as a front-end for extracting robust features from emotional speech. We aim to verify that SER improves when a speaker's emotional prosody cues are considered. Experiments with various state-of-the-art classifiers, run in the Weka software, are carried out to evaluate the robustness of the extracted features. Considerable improvement is observed when comparing our results with other state-of-the-art SER techniques.
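The approach can be illustrated with a short sketch (an assumption-laden illustration, not the authors' exact pipeline): a pretrained speaker-identification network serves as a fixed front-end, and a conventional classifier is trained on the resulting embeddings. The embedding extractor below is a hypothetical stub, the data are random stand-ins, and scikit-learn's SVM substitutes for the Weka classifiers used in the paper.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def extract_speaker_embedding(waveform: np.ndarray) -> np.ndarray:
    # Hypothetical stub: in practice this would evaluate the penultimate
    # layer of a network trained for speaker identification.
    return rng.standard_normal(512)

# Stand-in corpus: 100 utterances (1 s at 16 kHz) with 4 emotion labels.
waveforms = [rng.standard_normal(16000) for _ in range(100)]
labels = rng.integers(0, 4, size=100)

# Front-end: one fixed speaker embedding per utterance.
X = np.stack([extract_speaker_embedding(w) for w in waveforms])

# Back-end: a conventional classifier trained on the embeddings,
# scored by 5-fold cross-validation.
scores = cross_val_score(SVC(kernel="rbf"), X, labels, cv=5)
print(f"cross-validated accuracy: {scores.mean():.3f}")
```

As the abstract describes, the speaker-ID model acts only as a feature extractor; just the back-end classifier is trained on emotion labels, so the prosody cues captured by the speaker model carry over to the SER task.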


1996 ◽  
Vol 49 (2) ◽  
pp. 295-314 ◽  
Author(s):  
Ruth Campbell ◽  
Barbara Brooks ◽  
Edward de Haan ◽  
Tony Roberts

The separability of different subcomponents of face processing has been regularly affirmed, but not always so clearly demonstrated. In particular, the ability to extract speech from faces (lip-reading) has been shown to dissociate doubly from face identification in neurological but not in other populations. In this series of experiments with undergraduates, the classification of speech sounds (lip-reading) from personally familiar and unfamiliar face photographs was explored using speeded manual responses. The independence of lip-reading from identity-based processing was confirmed. Furthermore, the established pattern of independence of expression-matching from, and dependence of identity-matching on, face familiarity was extended to personally familiar faces and “difficult”-emotion decisions. The implications of these findings are discussed.


1990 ◽  
Vol 4 (3) ◽  
pp. 247-252 ◽  
Author(s):  
Bozydar L. J. Kaczmarek

1968 ◽  
Vol 44 (1) ◽  
pp. 366-366
Author(s):  
William J. Beninghof ◽  
Myron Jay Ross

2021 ◽  
Author(s):  
Lam Pham ◽  
Hieu Tang ◽  
Anahid Jalal ◽  
Alexander Schindler ◽  
Ross King

In this paper, we present a low-complexity deep learning framework for acoustic scene classification (ASC). The proposed framework can be separated into three main steps: front-end spectrogram extraction, back-end classification, and late fusion of predicted probabilities. First, we use Mel filter, Gammatone filter, and Constant-Q Transform (CQT) front-ends to transform the raw audio signal into spectrograms, in which both frequency and temporal features are represented. The three spectrograms are then fed into three individual back-end convolutional neural networks (CNNs), each classifying into ten urban scenes. Finally, a late fusion of the three predicted probabilities obtained from the three CNNs is conducted to achieve the final classification result. To reduce the complexity of our proposed CNN network, we apply two model compression techniques: model restriction and decomposed convolution. Our extensive experiments, conducted on the DCASE 2021 (IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events) Task 1A development dataset, achieve a low-complexity CNN-based framework with 128 KB of trainable parameters and a best classification accuracy of 66.7%, improving on the DCASE baseline by 19.0%.
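As a minimal sketch of this three-branch design, the code below computes Mel and CQT spectrograms with librosa (the Gammatone branch needs a separate filter-bank implementation and is only noted in a comment) and stands in for the three trained back-end CNNs with random 10-class posteriors before applying late fusion. Everything except the librosa calls is a placeholder, not the paper's actual models.

```python
import numpy as np
import librosa

def mel_spectrogram(y, sr):
    # Log-Mel front-end: frequency-by-time representation of the clip.
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
    return librosa.power_to_db(S)

def cqt_spectrogram(y, sr):
    # Constant-Q Transform front-end.
    C = np.abs(librosa.cqt(y, sr=sr))
    return librosa.amplitude_to_db(C)

def late_fusion(prob_vectors):
    # Unweighted mean of per-branch class posteriors; argmax gives the scene.
    return np.mean(np.stack(prob_vectors), axis=0)

sr = 32000
y = np.random.randn(10 * sr).astype(np.float32)  # stand-in for a 10 s scene clip

specs = [mel_spectrogram(y, sr), cqt_spectrogram(y, sr)]
# The Gammatone spectrogram would form the third branch; in the paper each
# of the three spectrograms feeds its own back-end CNN. Here the CNNs'
# 10-class posteriors are faked with random distributions.
rng = np.random.default_rng(0)
branch_probs = [rng.dirichlet(np.ones(10)) for _ in range(3)]

fused = late_fusion(branch_probs)
print("predicted scene index:", int(np.argmax(fused)))
```

Averaging posteriors is one common late-fusion rule; the abstract does not specify the exact fusion scheme, so the unweighted mean here is an assumption.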


1981 ◽  
Vol 52 (3) ◽  
pp. 1003-1006 ◽  
Author(s):  
W. G. Snow ◽  
S. Sheese

This study attempted to cross-validate the classification accuracy of Golden and Anderson's (1977) abbreviated version of the Halstead Speech Sounds Perception Test. A relatively high correlation was obtained between their short form and the standard long form for a sample of 150 patients aged 15 to 70 yr. However, the long form was more accurate in discriminating between 31 brain-damaged and 31 non-brain-damaged patients. These results suggest that use of the short form of this test may reduce classification accuracy.

