Automatic Speaker Recognition for Mobile Forensic Applications

2017 ◽  
Vol 2017 ◽  
pp. 1-6 ◽  
Author(s):  
Mohammed Algabri ◽  
Hassan Mathkour ◽  
Mohamed A. Bencherif ◽  
Mansour Alsulaiman ◽  
Mohamed A. Mekhtiche

Presently, lawyers, law enforcement agencies, and judges in courts use speech and other biometric features to recognize suspects. In general, speaker recognition is used for discriminating people based on their voices. The process of determining whether a suspected speaker is the source of a trace is called forensic speaker recognition. In such applications, the voice samples are most probably noisy, the recording sessions might mismatch each other, the sessions might not contain sufficient recording for recognition purposes, and the suspect's voice is recorded through a mobile channel. Identifying a person by voice in a forensic-quality context is therefore challenging. In this paper, we propose a method for forensic speaker recognition for the Arabic language; the King Saud University Arabic Speech Database is used for obtaining experimental results. The advantage of this database is that each speaker's voice is recorded in both clean and noisy environments, through a microphone and a mobile channel; this diversity facilitates its use in forensic experimentation. Mel-Frequency Cepstral Coefficients (MFCCs) are used for feature extraction, and the Gaussian mixture model-universal background model (GMM-UBM) is used for speaker modeling. Our approach achieves low equal error rates (EER) in noisy environments and with very short test samples.
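As a sketch of the GMM-UBM scoring this abstract relies on, the snippet below computes an average per-frame log-likelihood ratio between a speaker model and a universal background model built from diagonal-covariance Gaussians. The 2-D features and hand-set mixture parameters are toy illustrations, not values from the paper.

```python
import numpy as np

def gmm_logpdf(X, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM."""
    # X: (n_frames, dim); weights: (k,); means/variances: (k, dim)
    diff = X[:, None, :] - means[None, :, :]                 # (n, k, d)
    exponent = -0.5 * np.sum(diff**2 / variances, axis=2)    # (n, k)
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)
    log_comp = np.log(weights) + log_norm + exponent
    m = log_comp.max(axis=1, keepdims=True)                  # log-sum-exp
    return (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True))).ravel()

def llr_score(X, speaker_gmm, ubm):
    """Average per-frame log-likelihood ratio: speaker model vs. UBM."""
    return float(np.mean(gmm_logpdf(X, *speaker_gmm) - gmm_logpdf(X, *ubm)))

# Toy 2-D example: a concentrated "speaker" GMM against a broad UBM.
rng = np.random.default_rng(0)
ubm = (np.array([0.5, 0.5]),
       np.array([[0.0, 0.0], [1.0, 1.0]]),
       np.ones((2, 2)) * 4.0)
spk = (np.array([0.5, 0.5]),
       np.array([[3.0, 3.0], [4.0, 4.0]]),
       np.ones((2, 2)) * 0.5)

target_frames = rng.normal(3.5, 0.5, size=(200, 2))    # matches the speaker
impostor_frames = rng.normal(-2.0, 1.0, size=(200, 2))
target_score = llr_score(target_frames, spk, ubm)
impostor_score = llr_score(impostor_frames, spk, ubm)
```

Thresholding such LLR scores is what produces the EER trade-off the abstract reports: lowering the threshold raises false accepts, raising it raises false rejects.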

Author(s):  
DEBASHISH DEV MISHRA ◽  
UTPAL BHATTACHARJEE ◽  
SHIKHAR KUMAR SARMA

The performance of an automatic speaker recognition (ASR) system degrades drastically in the presence of noise and other distortions, especially when there is a noise-level mismatch between the training and testing environments. This paper explores the problem of speaker recognition in noisy conditions, assuming that speech signals are corrupted by noise. A major problem of most speaker recognition systems is their unsatisfactory performance in noisy environments. In this experimental research, we study a combination of Mel Frequency Cepstral Coefficients (MFCC) for feature extraction and Cepstral Mean Normalization (CMN) for speech enhancement. Our system uses a Gaussian Mixture Model (GMM) classifier and is implemented in the MATLAB®7 programming environment. Speaker data are used for both training and testing: the test data are matched against a speaker model trained on the training data using GMM modeling. Finally, experiments test the new model for ASR given limited training data and differing levels and types of realistic background noise. The results demonstrate the robustness of the new system.
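Cepstral Mean Normalization as used here is a one-line operation. The sketch below shows why it cancels a stationary linear channel: convolution in the time domain becomes a constant additive offset in the cepstral domain, so subtracting the utterance mean removes it (the simulated offset and feature sizes are illustrative).

```python
import numpy as np

def cepstral_mean_normalization(cepstra):
    """Subtract the per-utterance mean from each cepstral coefficient.

    A stationary convolutional channel adds a constant offset in the
    cepstral domain, so removing the utterance mean cancels it.
    cepstra: (n_frames, n_coeffs) array of e.g. MFCC vectors.
    """
    return cepstra - cepstra.mean(axis=0, keepdims=True)

# A fixed channel offset applied to every frame disappears after CMN.
rng = np.random.default_rng(1)
clean = rng.normal(size=(100, 13))
channel_offset = rng.normal(size=13)      # simulated stationary channel
corrupted = clean + channel_offset
restored = cepstral_mean_normalization(corrupted)
reference = cepstral_mean_normalization(clean)
```

After CMN, the corrupted and clean utterances yield identical features, which is exactly the train/test-mismatch robustness the paper is after.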


Author(s):  
Musab T. S. Al-Kaltakchi ◽  
Haithem Abd Al-Raheem Taha ◽  
Mohanad Abd Shehab ◽  
Mohamed A.M. Abdullah

In this paper, different feature extraction and feature normalization methods are investigated for speaker recognition. To give a good representation of acoustic speech signals, Power Normalized Cepstral Coefficients (PNCCs) and Mel Frequency Cepstral Coefficients (MFCCs) are employed for feature extraction. Then, to mitigate the effect of the linear channel, Cepstral Mean-Variance Normalization (CMVN) and feature warping are utilized. The paper investigates a text-independent speaker identification system using 16 coefficients from each of the MFCC and PNCC features. Eight speakers, two female and six male, are selected from the GRID audiovisual database. The speakers are modeled by coupling the Universal Background Model with Gaussian Mixture Models (GMM-UBM) to obtain a fast scoring technique and better performance. The system achieves 100% speaker identification accuracy. The results illustrate that PNCC features perform better than MFCC features at identifying female speakers compared with male speakers. Furthermore, feature warping reported better performance than CMVN.
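A minimal sketch of the two normalizations compared above. This assumes a whole-utterance variant of feature warping (rank-map each coefficient onto a standard normal); the usual formulation warps over a sliding window of a few hundred frames.

```python
import numpy as np
from scipy.stats import norm

def cmvn(features):
    """Cepstral mean-variance normalization: zero mean, unit variance per coefficient."""
    return (features - features.mean(axis=0)) / features.std(axis=0)

def feature_warp(features):
    """Warp each coefficient's empirical distribution onto a standard normal.

    Whole-utterance variant for brevity; practical systems warp over a
    sliding window (e.g. ~3 s of frames).
    """
    n = features.shape[0]
    ranks = features.argsort(axis=0).argsort(axis=0)   # 0..n-1 per column
    return norm.ppf((ranks + 0.5) / n)                 # mid-rank quantiles

rng = np.random.default_rng(2)
feats = rng.exponential(size=(500, 13))    # deliberately skewed "features"
warped = feature_warp(feats)
normalized = cmvn(feats)
```

The design difference is visible on skewed data: CMVN only shifts and scales, so the skew survives, whereas warping forces a Gaussian shape, which is why it copes better with nonlinear channel effects.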


2019 ◽  
Vol 33 (35) ◽  
pp. 1950438 ◽  
Author(s):  
Manish Gupta ◽  
Shambhu Shankar Bharti ◽  
Suneeta Agarwal

Speech is a convenient medium for communication among human beings. Speaker recognition is the process of automatically recognizing the speaker from the information included in the speech signal. In this paper, a new two-level approach for speaker recognition is proposed. In the first level, the gender of the speaker is recognized; in the second level, the speaker is recognized based on the gender identified at the first level. Once the gender is known, the search space for the second level is halved, since the system searches only the speech signals belonging to the identified gender. Gender is identified using gender-specific features: Mel Frequency Cepstral Coefficients (MFCC) and pitch. The speaker is then recognized using speaker-specific features: MFCC, pitch, and RASTA-PLP. Support Vector Machine (SVM) and Gaussian Mixture Model (GMM) classifiers are used for identifying the gender and recognizing the speaker, respectively. Experiments are performed on speech signals from two databases: "IIT-Madras speech synthesis and recognition" (containing English speech samples from eight male and eight female speakers of eight different regions) and "ELSDSR" (containing English speech samples from five male and five female speakers). Experimentally, the two-level approach reduces the time taken for speaker recognition by 30–32% compared with the single-level approach, in which the speaker is recognized without identifying the gender. The accuracy of speaker recognition also improves from 99.7% to 99.9% over the single-level approach. The experiments further indicate that a speech signal of minimum duration 1.12 (after neglecting silent parts) is sufficient for recognizing the speaker.
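The two-level idea can be sketched as a pitch gate followed by a search restricted to one gender. Everything below is hypothetical scaffolding: the speaker names, 2-D centroids, and the 165 Hz pitch threshold are illustrative stand-ins for the paper's SVM/GMM stages.

```python
import numpy as np

# Hypothetical enrolment data: per-speaker feature centroid and gender label.
speakers = {
    "f1": {"gender": "F", "centroid": np.array([1.0, 2.0])},
    "f2": {"gender": "F", "centroid": np.array([2.0, 1.0])},
    "m1": {"gender": "M", "centroid": np.array([-1.0, -2.0])},
    "m2": {"gender": "M", "centroid": np.array([-2.0, -1.0])},
}
PITCH_THRESHOLD_HZ = 165.0   # rough male/female boundary; an assumption

def identify(mean_pitch_hz, feature_centroid):
    """Level 1: gate on pitch to pick a gender; level 2: search only that half."""
    gender = "F" if mean_pitch_hz >= PITCH_THRESHOLD_HZ else "M"
    candidates = {n: s for n, s in speakers.items() if s["gender"] == gender}
    # nearest-centroid match within the identified gender only
    return min(candidates,
               key=lambda n: np.linalg.norm(candidates[n]["centroid"]
                                            - feature_centroid))

who = identify(mean_pitch_hz=210.0, feature_centroid=np.array([1.1, 1.9]))
```

The speed-up the paper measures follows directly from the structure: level 2 scores only half the enrolled models, at the cost of an occasional level-1 gender error.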


2016 ◽  
Vol 25 (3) ◽  
pp. 387-399
Author(s):  
P. Mahesha ◽  
D.S. Vinod

Abstract: The classification of dysfluencies is one of the important steps in the objective measurement of stuttering disorder. In this work, the focus is on investigating the applicability of automatic speaker recognition (ASR) methods to stuttering dysfluency recognition. The system designed for this task relies on the Gaussian mixture model (GMM), the most widely used probabilistic modeling technique in ASR. The GMM parameters are estimated from Mel frequency cepstral coefficients (MFCCs). This statistical speaker-modeling technique represents the fundamental characteristic sounds of the speech signal. Using this model, we build a dysfluency recognizer capable of recognizing dysfluencies irrespective of who is speaking and of what is being said. The performance of the system is evaluated for different types of dysfluencies, such as syllable repetition, word repetition, prolongation, and interjection, using speech samples from the University College London Archive of Stuttered Speech (UCLASS).


2021 ◽  
Vol 39 (1B) ◽  
pp. 30-40
Author(s):  
Ahmed M. Ahmed ◽  
Aliaa K. Hassan

Speaker recognition is the process of recognizing a person by his/her voice through specific features extracted from his/her voice signal. An automatic speaker recognition (ASP) system is a biometric authentication system. In the last decade, many advances in the speaker recognition field have been attained, along with many techniques in the feature extraction and modeling phases. In this paper, we present an overview of the most recent works in ASP technology. The study discusses several ASP modeling techniques, such as the Gaussian Mixture Model (GMM), Vector Quantization (VQ), and clustering algorithms. Several feature extraction techniques, such as Linear Predictive Coding (LPC) and Mel frequency cepstral coefficients (MFCC), are also examined. Finally, as a result of this study, we find that MFCC and GMM can be considered the most successful techniques in the field of speaker recognition so far.
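Of the surveyed modeling techniques, Vector Quantization is the easiest to sketch: each speaker gets a k-means codebook, and a test utterance is attributed to the speaker whose codebook quantizes it with the least average distortion. The toy features below stand in for real MFCC frames.

```python
import numpy as np

def train_codebook(frames, k, iters=20, seed=0):
    """Build a k-entry VQ codebook with plain k-means (Lloyd's algorithm)."""
    rng = np.random.default_rng(seed)
    codebook = frames[rng.choice(len(frames), k, replace=False)]
    for _ in range(iters):
        # assign every frame to its nearest codeword, then recentre
        d = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                codebook[j] = frames[labels == j].mean(axis=0)
    return codebook

def distortion(frames, codebook):
    """Average distance from each frame to its nearest codeword."""
    d = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    return float(d.min(axis=1).mean())

# Two toy "speakers" occupying different regions of feature space.
rng = np.random.default_rng(3)
books = {"A": train_codebook(rng.normal(0.0, 1.0, size=(300, 5)), 8),
         "B": train_codebook(rng.normal(5.0, 1.0, size=(300, 5)), 8)}

test_utt = rng.normal(0.0, 1.0, size=(100, 5))   # really speaker A
best = min(books, key=lambda s: distortion(test_utt, books[s]))
```

GMMs generalize exactly this picture: they replace hard codeword assignment with soft, probabilistic component memberships, which is one reason the survey finds GMM the stronger model.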


Author(s):  
AMITA PAL ◽  
SMARAJIT BOSE ◽  
GOPAL K. BASAK ◽  
AMITAVA MUKHOPADHYAY

For solving speaker identification problems, the approach proposed by Reynolds [IEEE Signal Process. Lett. 2 (1995) 46–48], using Gaussian Mixture Models (GMMs) based on Mel Frequency Cepstral Coefficients (MFCCs) as features, is one of the most effective available in the literature. The use of GMMs for modeling speaker identity is motivated by the interpretation that the Gaussian components represent some general speaker-dependent spectral shapes, and also by the capability of Gaussian mixtures to model arbitrary densities. In this work, we first illustrate, with the help of a new bilingual speech corpus, how the well-known principal component transformation, in conjunction with the principle of classifier combination, can be used to enhance the performance of MFCC-GMM speaker recognition systems significantly. Subsequently, we rigorously establish the same on the benchmark speech corpus NTIMIT. A significant outcome of this work is that the proposed approach has the potential to enhance the performance of any speaker recognition system based on correlated features.
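A small numpy sketch of the principal component transformation on correlated features, as applied here ahead of the GMM stage. The data are synthetic, and all 13 components are kept purely to show the decorrelation effect; a real system would typically also truncate.

```python
import numpy as np

def pca_transform(X, n_components):
    """Project centred features onto the leading eigenvectors of their covariance."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)        # ascending eigenvalues
    top = eigvecs[:, ::-1][:, :n_components]      # leading components first
    return Xc @ top

rng = np.random.default_rng(4)
# 13-dim features with correlated coordinates, as MFCCs often are
feats = rng.normal(size=(400, 13)) @ rng.normal(size=(13, 13))
decorrelated = pca_transform(feats, n_components=13)
cov_after = np.cov(decorrelated, rowvar=False)
```

Decorrelation matters to the paper's thesis because diagonal-covariance GMMs implicitly assume independent coefficients; rotating the features toward that assumption is what the PCA step buys.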


Author(s):  
Gizachew Belayneh Gebre et al.

In this era of artificial intelligence, speaker recognition is a highly useful biometric recognition technique. Security is a big issue that needs careful attention, because many activities have become automated and internet-based. For security purposes, unique features of the authorized user are needed, and the voice is one such unique biometric feature. Developing speaker recognition on a scientific basis is therefore a pressing concern. Nowadays, criminal activities increase day by day in ever more clever ways, so every country should strengthen forensic investigation using such technologies. This study was inspired by contextualizing this concept for our country. A text-independent Amharic-language speaker recognition model was developed, using Mel-Frequency Cepstral Coefficients (MFCCs) to extract features from preprocessed speech signals and an Artificial Neural Network (ANN) to model the MFCC feature vectors and classify them during testing. The researcher used 20 speech samples from each of 10 speakers (200 speech samples in total) for training and testing separately. By setting the number of hidden neurons to 15, 20, and 25, three different models were developed and evaluated for accuracy. MATLAB, a fourth-generation high-level programming language and interactive environment, was used for the overall implementation. In the end, very promising findings were obtained: the study achieved better performance than related research using Vector Quantization and Gaussian Mixture Model modeling techniques. A deployable result could be obtained in the future by increasing the number of speakers and speech samples and including the four Amharic accents.


Author(s):  
Amara Fethi ◽  
Fezari Mohamed

In this paper, we investigate the properties of automatic speaker recognition (ASR) to develop a system for voice pathology detection, where a model corresponds not to a speaker but to a group of patients who share the same diagnosis. An essential part of this work is the database (described later): the voice samples (healthy and pathological) are drawn from a German database covering many diseases, and spasmodic dysphonia is selected for this study. The problem is addressed with statistical pattern recognition techniques: Mel frequency cepstral coefficients (MFCCs) are modeled first with a Gaussian mixture model (GMM), massively used in ASR, and then with a support vector machine (SVM). The obtained results are compared in order to determine the better-performing classifier, with the performance of each method evaluated in terms of accuracy, sensitivity, and specificity. The best performance is obtained with 12 MFCC coefficients, energy, and the second derivative, together with an SVM using a polynomial kernel function; the classification rate is 90% for the normal class and 93% for the pathological class. This work was developed under MATLAB.
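The three evaluation metrics named above are simple counts over a confusion matrix. A sketch with hypothetical predictions (not the paper's actual results) makes the definitions concrete:

```python
def classification_metrics(y_true, y_pred, positive="pathological"):
    """Accuracy, sensitivity and specificity from paired label lists."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),   # pathological cases correctly flagged
        "specificity": tn / (tn + fp),   # healthy cases correctly cleared
    }

# Hypothetical outcome of a pathology-vs-normal classifier on 10 samples:
# one pathological sample missed, one healthy sample falsely flagged.
truth = ["pathological"] * 5 + ["normal"] * 5
preds = ["pathological"] * 4 + ["normal"] + ["normal"] * 4 + ["pathological"]
metrics = classification_metrics(truth, preds)
```

Reporting sensitivity and specificity separately, as the paper does with its 93%/90% split, matters in a clinical setting where missing a pathology and flagging a healthy voice carry different costs.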


2012 ◽  
Vol 22 (03) ◽  
pp. 1250053 ◽  
Author(s):  
AYYOOB JAFARI ◽  
FARSHAD ALMASGANJ

This paper introduces a combinational feature extraction approach to improve speech recognition systems. The main idea is to simultaneously benefit from features obtained by nonlinear modeling of the speech reconstructed phase space (RPS) and from the typical Mel frequency cepstral coefficients (MFCCs), which have a proven role in the speech recognition field. With an appropriate dimension, the reconstructed phase space of the speech signal is assured to be topologically equivalent to the dynamics of the speech production system, and can therefore include information that may be absent in linear analysis approaches. In the first part of this paper, the application of Lyapunov exponents (LE) and fractal dimension, two commonly used chaotic features, to speech recognition is tested, followed by a short discussion of the weakness of these features for speech recognition. Next, a statistical modeling approach based on Gaussian mixture models (GMMs) is applied to the speech RPS. A final pruned feature set is obtained by applying an efficient feature selection approach to the combination of the GMM parameters and the MFCC-based features. A hidden Markov model (HMM) based speech recognition system and the TIMIT speech database are used to evaluate the performance of the proposed feature set in isolated and continuous speech recognition experiments. In the final Continuous Speech Recognition (CSR) experiments, using tri-phone models, a 3.7% absolute improvement in phoneme recognition accuracy over MFCC features alone was obtained.
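The reconstructed phase space is a time-delay embedding of the signal. A minimal sketch, with an assumed embedding dimension and delay (the paper's actual settings are not given here):

```python
import numpy as np

def reconstructed_phase_space(signal, dim, delay):
    """Time-delay embedding: row i is (x[i], x[i+delay], ..., x[i+(dim-1)*delay])."""
    n = len(signal) - (dim - 1) * delay
    return np.stack([signal[i * delay: i * delay + n] for i in range(dim)],
                    axis=1)

t = np.arange(1000)
x = np.sin(2 * np.pi * t / 50.0)   # a periodic stand-in for a speech frame
rps = reconstructed_phase_space(x, dim=3, delay=7)
```

By Takens' embedding theorem, with a large enough `dim` this point cloud is topologically equivalent to the underlying dynamical attractor, which is the equivalence the abstract appeals to; the paper then fits GMMs to such clouds rather than to spectra.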


2021 ◽  
Vol 10 (4) ◽  
pp. 2310-2319
Author(s):  
Duraid Y. Mohammed ◽  
Khamis Al-Karawi ◽  
Ahmed Aljuboori

Automatic speaker recognition may achieve remarkable performance in matched training and test conditions; conversely, results drop significantly in mismatched noisy conditions. Furthermore, feature extraction significantly affects performance. Mel-frequency cepstral coefficients (MFCCs) are the features most commonly used in this field of study, and the literature reports that performance is highly correlated with how well the training and testing conditions match. Taken together, these facts support strong recommendations for using MFCC features under similar environmental conditions (train/test) for speaker recognition. However, with noise and reverberation present, MFCC performance is not reliable. To address this, we propose a new feature, 'entrocy', for accurate and robust speaker recognition, which we mainly employ to support MFCC coefficients in noisy environments. Entrocy is the Fourier transform of the entropy, a measure of the fluctuation of the information in sound segments over time. Entrocy features are combined with MFCCs to generate a composite feature set, which is tested using the Gaussian mixture model (GMM) speaker recognition method. The proposed method shows improved recognition accuracy over a range of signal-to-noise ratios.
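One plausible reading of the entrocy definition, sketched below: frame the signal, take the Shannon entropy of each frame's normalized magnitude spectrum, then take the Fourier transform of that entropy sequence over time. The frame size, hop, and number of retained coefficients are assumptions, not the paper's settings.

```python
import numpy as np

def entrocy(signal, frame_len=256, hop=128, n_coeffs=8):
    """Fourier transform of the per-frame spectral-entropy sequence.

    Each frame's magnitude spectrum is normalized to a probability mass
    function; its Shannon entropy measures how spread out the information
    is, and the DFT of that sequence summarizes its fluctuation over time.
    """
    entropies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spectrum = np.abs(np.fft.rfft(signal[start:start + frame_len]))
        p = spectrum / (spectrum.sum() + 1e-12)        # spectral PMF
        entropies.append(-np.sum(p * np.log2(p + 1e-12)))
    return np.abs(np.fft.rfft(np.array(entropies)))[:n_coeffs]

rng = np.random.default_rng(5)
tone = np.sin(2 * np.pi * 0.05 * np.arange(4096))   # concentrated spectrum
noise = rng.normal(size=4096)                        # flat, high-entropy spectrum
f_tone, f_noise = entrocy(tone), entrocy(noise)
```

A pure tone yields low per-frame entropy and a white-noise segment high entropy, so the two signals separate already in the first entrocy coefficient; in the paper these coefficients are appended to the MFCC vector before GMM modeling.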

