Comparison of Parametric representations of Birdcall in Gaussian Mixture model

Author(s):  
Ricky Mohanty ◽  
Sandeep Singh Solanki

This paper addresses the automatic classification of birds into species from audio recordings of their calls, comparing two feature extraction methods. The recognition system uses a Gaussian mixture model (GMM) to model the calls of 14 poultry bird species. Mel frequency cepstral coefficient (MFCC) parameters and wavelet parameters are used for feature vector extraction. The paper briefly explains both methods and evaluates their performance in GMM classification. The results show that GMM classification using wavelet features was more accurate, at around 80%, and was also faster to compute.
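As a rough illustration of what a wavelet feature vector can look like, the sketch below computes log-energies of Haar wavelet subbands for one frame of audio. The abstract does not say which wavelet family, decomposition depth, or feature definition was used, so the Haar wavelet, the `levels` parameter, and the function names are all assumptions made for this example.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)  # a trailing odd sample is dropped
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail

def wavelet_energy_features(signal, levels=3):
    """Log-energy of each detail band, followed by the final approximation band.

    Hypothetical feature definition; the small constant avoids log(0).
    """
    features = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        features.append(math.log(sum(d * d for d in detail) + 1e-12))
    features.append(math.log(sum(a * a for a in approx) + 1e-12))
    return features
```

Each frame yields `levels + 1` numbers, which could then serve as the feature vector passed to a GMM in place of MFCCs.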

2017 ◽  
Vol 10 (13) ◽  
pp. 140
Author(s):  
Kumari Piu Gorai ◽  
Thomas Abraham

A human being has many unique features, and one of them is the voice. Speaker recognition is the use of a system to distinguish and identify a person from his or her voice. A speaker recognition system (SRS) can be used as an authentication technique in addition to conventional authentication methods. This paper presents an overview of voice signal characteristics and speaker recognition techniques. It also discusses the advantages and problems of current SRSs. Because voice-based SRS is the only biometric system that allows users to authenticate remotely, a robust SRS is needed.


2021 ◽  
Vol 11 (2) ◽  
pp. 35-41
Author(s):  
Thurgeaswary Rokanatnam ◽  
Hazinah Kutty Mammi

Speaker recognition is the ability to identify a speaker's characteristics from spoken language. The purpose of this study is to identify the gender of speakers from audio recordings. Its objectives are to evaluate how accurately this technique differentiates gender and to determine how well it classifies self-acquired recordings. Audio forensics uses voice recordings as evidence to solve cases, and this study aims to provide an easier technique for identifying the characteristics of unknown speakers in the forensic field. The experiment trains a pattern classifier on gender-dependent data. To train the model, a speech database is obtained from an online speech corpus comprising both male and female speakers. During the testing phase, audio recordings of UTM students are used in addition to data from the speech corpus to determine the accuracy of the speaker identification experiment. The Mel Frequency Cepstrum Coefficient (MFCC) algorithm is used to extract features from the speech data, and a Gaussian Mixture Model (GMM) is used to model the gender identifier. No noise removal was applied to any speech data in this experiment. Python is used to extract the MFCC features and to model them with GMMs. The experimental results show that the GMM-MFCC technique can identify gender regardless of language, but with varying accuracy.
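The train-then-score scheme described above can be sketched as follows: fit one GMM per class on that class's feature frames, then label a test recording with the class whose model gives the highest average log-likelihood. The abstract does not name the Python libraries, the number of mixture components, or the covariance type, so this standard-library version with diagonal covariances and two components is an assumption for illustration only. Feature frames are taken as given (plain lists of floats); MFCC extraction is not shown.

```python
import math
import random

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def component_logs(x, gmm):
    """Log of weight * density for every mixture component."""
    return [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]

def fit_gmm(frames, k=2, iters=20, seed=0):
    """Fit a k-component diagonal GMM with EM; returns [(weight, mean, var), ...]."""
    rng = random.Random(seed)
    dim = len(frames[0])
    # Initialise means from randomly chosen frames, with unit variances.
    gmm = [(1.0 / k, list(m), [1.0] * dim) for m in rng.sample(frames, k)]
    for _ in range(iters):
        # E-step: responsibility of each component for each frame.
        resp = []
        for x in frames:
            logs = component_logs(x, gmm)
            norm = logsumexp(logs)
            resp.append([math.exp(l - norm) for l in logs])
        # M-step: re-estimate weights, means and (floored) variances.
        updated = []
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-9
            mean = [sum(r[j] * x[d] for r, x in zip(resp, frames)) / nj
                    for d in range(dim)]
            var = [max(sum(r[j] * (x[d] - mean[d]) ** 2
                           for r, x in zip(resp, frames)) / nj, 1e-3)
                   for d in range(dim)]
            updated.append((nj / len(frames), mean, var))
        gmm = updated
    return gmm

def avg_log_likelihood(frames, gmm):
    return sum(logsumexp(component_logs(x, gmm)) for x in frames) / len(frames)

def classify(frames, models):
    """Return the label whose GMM best explains the frames."""
    return max(models, key=lambda label: avg_log_likelihood(frames, models[label]))
```

In use, `models` would be a dictionary such as `{"female": fit_gmm(female_frames), "male": fit_gmm(male_frames)}`, where the label names and frame variables are hypothetical, and `classify(test_frames, models)` returns the predicted label for one recording.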


2013 ◽  
Vol 38 (4) ◽  
pp. 457-463 ◽  
Author(s):  
Chengwei Huang ◽  
Guoming Chen ◽  
Hua Yu ◽  
Yongqiang Bao ◽  
Li Zhao

Speakers' emotional states are recognized from speech signals corrupted by additive white Gaussian noise (AWGN). The influence of white noise on a typical emotion recognition system is studied. The emotion classifier is implemented with a Gaussian mixture model (GMM). A Chinese speech emotion database, which includes nine emotion classes (happiness, sadness, anger, surprise, fear, anxiety, hesitation, confidence and neutral state), is used for training and testing. Two speech enhancement algorithms are introduced for improved emotion classification. In the experiments, the Gaussian mixture model is trained on clean speech data and tested under AWGN at various signal-to-noise ratios (SNRs). Both the emotion class model and the dimension space model are adopted for the evaluation of the emotion recognition system. Under the emotion class model, the nine emotion classes are classified. Under the dimension space model, the arousal dimension and the valence dimension are each classified into positive or negative regions. The experimental results show that the speech enhancement algorithms consistently improve the performance of the emotion recognition system under various SNRs, and that positive emotions are more likely to be misclassified as negative emotions in a white noise environment.
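Testing under AWGN at a chosen SNR amounts to scaling the noise variance to the power of the clean signal. The sketch below shows one common way to do this; the abstract gives no implementation details, so the function names and the power-based SNR definition used here are assumptions.

```python
import math
import random

def add_awgn(signal, snr_db, seed=None):
    """Return the signal plus white Gaussian noise scaled to the requested SNR (dB)."""
    rng = random.Random(seed)
    signal_power = sum(s * s for s in signal) / len(signal)
    # SNR_dB = 10 * log10(signal_power / noise_power), solved for noise_power.
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]

def measured_snr_db(clean, noisy):
    """SNR in dB computed from a clean signal and its noisy version."""
    signal_power = sum(c * c for c in clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy))
    return 10 * math.log10(signal_power / noise_power)
```

Calling `add_awgn(clean, snr_db)` for several `snr_db` values on the clean test utterances gives the kind of mismatched condition described above, with the model trained on clean data only.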


2016 ◽  
Vol 25 (3) ◽  
pp. 387-399
Author(s):  
P. Mahesha ◽  
D.S. Vinod

The classification of dysfluencies is one of the important steps in the objective measurement of stuttering disorder. In this work, the focus is on investigating the applicability of the automatic speaker recognition (ASR) method to stuttering dysfluency recognition. The system designed for this task relies on the Gaussian mixture model (GMM), which is the most widely used probabilistic modeling technique in ASR. The GMM parameters are estimated from Mel frequency cepstral coefficients (MFCCs). This statistical speaker-modeling technique represents the fundamental characteristic sounds of the speech signal. Using this model, we build a dysfluency recognizer that is capable of recognizing dysfluencies regardless of the speaker and of what is being said. The performance of the system is evaluated for different types of dysfluencies, such as syllable repetition, word repetition, prolongation, and interjection, using speech samples from the University College London Archive of Stuttered Speech (UCLASS).


2007 ◽  
Vol 121 (3) ◽  
pp. 1737-1748 ◽  
Author(s):  
Marie A. Roch ◽  
Melissa S. Soldevilla ◽  
Jessica C. Burtenshaw ◽  
E. Elizabeth Henderson ◽  
John A. Hildebrand
